Commit b2b6f21

Merge pull request #243 from hackalog/dev: Release to Main

2 parents 411a53f + ad6ead4

33 files changed: +725 −390 lines

.circleci/config.yml

Lines changed: 15 additions & 16 deletions

```diff
@@ -8,7 +8,7 @@ jobs:
     docker:
       # specify the version you desire here
       # use `-browsers` prefix for selenium tests, e.g. `3.6.1-browsers`
-      - image: cimg/python:3.8.0
+      - image: continuumio/miniconda3

       # Specify service dependencies here if necessary
       # CircleCI maintains a library of pre-built images
@@ -19,39 +19,38 @@ jobs:
     steps:
       - checkout

       - run:
-          name: Set up Anaconda
+          name: Set up Conda
           command: |
-            wget -q http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh;
-            chmod +x ~/miniconda.sh;
-            ~/miniconda.sh -b -p ~/miniconda;
-            export PATH=~/miniconda/bin:$PATH
-            echo "export PATH=~/miniconda/bin:$PATH" >> $BASH_ENV;
-            conda update --yes --quiet conda;
             conda init bash
-            sed -ne '/>>> conda initialize/,/<<< conda initialize/p' ~/.bashrc >> $BASH_ENV
-
+            conda update --yes --quiet conda;
+            export CONDA_EXE=/opt/conda/bin/conda
+            sed -ne '/>>> conda initialize/,/<<< conda initialize/p' ~/.bashrc >> $BASH_ENV

       - run:
           name: Build cookiecutter environment and test-env project
           command: |
-            conda create -n cookiecutter --yes python=3.8
+            conda create -n cookiecutter --yes python=3.8 make
             conda activate cookiecutter
             pip install cookiecutter
             pip install ruamel.yaml
-            mkdir /home/circleci/.cookiecutter_replay
-            cp circleci-cookiecutter-easydata.json /home/circleci/.cookiecutter_replay/cookiecutter-easydata.json
+            mkdir -p /root/repo/.cookiecutter_replay
+            cp circleci-cookiecutter-easydata.json /root/repo/.cookiecutter_replay/cookiecutter-easydata.json
             pwd
+            which make
             cookiecutter --config-file .cookiecutter-easydata-test-circleci.yml . -f --no-input
-            conda deactivate

       - run:
           name: Create test-env environment and contrive to always use it
           command: |
+            conda activate cookiecutter
             cd test-env
-            export CONDA_EXE=/home/circleci/miniconda/bin/conda
+            export CONDA_EXE=/opt/conda/bin/conda
             make create_environment
+            python scripts/tests/add-extra-channel-dependency.py
             conda activate test-env
+            conda install -c anaconda make
             touch environment.yml
             make update_environment
             echo "conda activate test-env" >> $BASH_ENV;
```
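The `sed -ne '/>>> conda initialize/,/<<< conda initialize/p'` line above copies the shell hook that `conda init bash` writes into `~/.bashrc` over to CircleCI's `$BASH_ENV`, which CircleCI sources at the start of every subsequent step, so `conda activate` works in later `run` blocks. A minimal sketch of that extraction, using a stand-in `.bashrc` (the file contents and `/tmp` paths here are illustrative, not from the CI image):

```shell
# Stand-in ~/.bashrc containing the marker block that `conda init bash` writes.
cat > /tmp/fake_bashrc <<'EOF'
# unrelated shell setup
# >>> conda initialize >>>
__conda_setup="$('/opt/conda/bin/conda' 'shell.bash' 'hook' 2>/dev/null)"
eval "$__conda_setup"
# <<< conda initialize <<<
# more unrelated shell setup
EOF

# Extract only the conda-initialize block, exactly as the CI config
# appends it to $BASH_ENV; surrounding lines are left behind.
sed -ne '/>>> conda initialize/,/<<< conda initialize/p' /tmp/fake_bashrc > /tmp/fake_bash_env
cat /tmp/fake_bash_env
```

The `-n` flag suppresses default printing, so only the address range between the two markers is emitted.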

.travis.yml

Lines changed: 0 additions & 51 deletions
This file was deleted.

README.md

Lines changed: 18 additions & 0 deletions

```diff
@@ -51,6 +51,24 @@ python -m pip install -f requirements.txt

 cookiecutter https://github.com/hackalog/easydata

+### To find out more
+------------
+A good place to start is with reproducible environments. We have a tutorial here: [Getting Started with EasyData Environments](https://github.com/hackalog/easydata/wiki/Getting-Started-with-EasyData-Environments).
+
+The next place to look is the customized documentation included in any EasyData-created repo, tailored to the settings you put in your template. These reference documents live under `references/easydata` and cover:
+* more on conda environments
+* more on paths
+* git configuration (including setting up ssh with GitHub)
+* git workflows
+* tricks for using Jupyter notebooks in an EasyData environment
+* troubleshooting
+* recommendations for how to share your work
+
+Furthermore, see:
+* [The EasyData documentation on Read the Docs](https://cookiecutter-easydata.readthedocs.io/en/latest/?badge=latest): up-to-date working examples of how to use EasyData for reproducible datasets, and some ways to use notebooks reproducibly
+* [Talks and Tutorials based on EasyData](https://github.com/hackalog/easydata/wiki/EasyData-Talks-and-Tutorials)
+* [Catalog of EasyData Documentation](https://github.com/hackalog/easydata/wiki/Catalog-of-EasyData-Documentation)
+* [The EasyData wiki](https://github.com/hackalog/easydata/wiki): further troubleshooting and how-to guides for problems not covered in the `references/easydata` docs (including a `git` tutorial)

 ### The resulting directory structure
 ------------
```

cookiecutter.json

Lines changed: 3 additions & 3 deletions

```diff
@@ -1,12 +1,12 @@
 {
     "project_name": "project_name",
     "repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}",
-    "default_branch": ["master", "main"],
+    "default_branch": ["main", "master"],
     "module_name": "src",
-    "author_name": "Your name (or your organization/company/team)",
+    "author_name": "Your name (or the copyright holder)",
     "description": "A short description of this project.",
     "open_source_license": ["MIT", "BSD-2-Clause", "Proprietary"],
-    "python_version": ["3.7", "3.6", "latest", "3.8"],
+    "python_version": ["latest", "3.11", "3.10", "3.9", "3.8", "3.7"],
     "conda_path": "~/anaconda3/bin/conda",
     "upstream_location": ["github.com", "gitlab.com", "bitbucket.org", "your-custom-repo"]
 }
```
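In `cookiecutter.json`, a list-valued field is rendered as a choice prompt, and the first entry is the default. That is why reordering `default_branch` to `["main", "master"]` changes the default branch without removing the `master` option, and why `python_version` now defaults to `latest`. A minimal sketch of how those defaults fall out of the file above:

```python
import json

# Excerpt of the list-valued fields from cookiecutter.json above.
config_text = """
{
    "default_branch": ["main", "master"],
    "python_version": ["latest", "3.11", "3.10", "3.9", "3.8", "3.7"]
}
"""

config = json.loads(config_text)

# For a list-valued field, cookiecutter prompts the user to pick one
# of the entries, offering the first entry as the default.
defaults = {key: value[0] if isinstance(value, list) else value
            for key, value in config.items()}
print(defaults)  # {'default_branch': 'main', 'python_version': 'latest'}
```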

docs/00-xyz-sample-notebook.ipynb

Lines changed: 1 addition & 1 deletion

```diff
@@ -150,7 +150,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(ds.DESCR)"
+    "print(ds.README)"
    ]
   },
   {
```

docs/Add-csv-template.ipynb

Lines changed: 7 additions & 7 deletions

```diff
@@ -83,7 +83,7 @@
    "* `csv_path`: The desired path to your .csv file (in this case `epidemiology.csv`) relative to paths['raw_data_path']\n",
    "* `download_message`: The message to display to indicate to the user how to manually download your .csv file.\n",
    "* `license_str`: Information on the license for the dataset\n",
-    "* `descr_str`: Information on the dataset itself"
+    "* `readme_str`: Information on the dataset itself"
    ]
   },
   {
@@ -123,7 +123,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "descr_str = \"\"\"\n",
+    "readme_str = \"\"\"\n",
     "The epidemiology table from Google's [COVID-19 Open-Data dataset](https://github.com/GoogleCloudPlatform/covid-19-open-data). \n",
     "\n",
     "The full dataset contains datasets of daily time-series data related to COVID-19 for over 20,000 distinct locations around the world. The data is at the spatial resolution of states/provinces for most regions and at county/municipality resolution for many countries such as Argentina, Brazil, Chile, Colombia, Czech Republic, Mexico, Netherlands, Peru, United Kingdom, and USA. All regions are assigned a unique location key, which resolves discrepancies between ISO / NUTS / FIPS codes, etc. The different aggregation levels are:\n",
@@ -170,7 +170,7 @@
    " csv_path=csv_path,\n",
    " download_message=download_message,\n",
    " license_str=license_str,\n",
-    " descr_str=descr_str,\n",
+    " readme_str=readme_str,\n",
    " overwrite_catalog=True)"
    ]
   },
@@ -206,9 +206,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "By default, the workflow helper function also created a `covid-19-epidemiology_raw` dataset that has an empty `ds.data`, but keeps a record of the location of the final `epidemiology.csv` file relative to in `ds.EXTRA`.\n",
+    "By default, the workflow helper function also created a `covid-19-epidemiology_raw` dataset that has an empty `ds.data`, but keeps a record of the location of the final `epidemiology.csv` file relative to in `ds.FILESET`.\n",
     "\n",
-    "The `.EXTRA` functionality is covered in other documentation."
+    "The `.FILESET` functionality is covered in other documentation."
    ]
   },
@@ -236,7 +236,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "ds_raw.EXTRA"
+    "ds_raw.FILESET"
    ]
   },
@@ -246,7 +246,7 @@
    "outputs": [],
    "source": [
     "# fq path to epidemiology.csv file\n",
-    "ds_raw.extra_file('epidemiology.csv')"
+    "ds_raw.fileset_file('epidemiology.csv')"
    ]
   },
   {
```

docs/Add-derived-dataset.ipynb

Lines changed: 5 additions & 5 deletions

```diff
@@ -85,7 +85,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(ds.DESCR)"
+    "print(ds.README)"
    ]
   },
   {
@@ -219,7 +219,7 @@
    " source_dataset_name\n",
    " dataset_name\n",
    " data_function\n",
-    " added_descr_txt\n",
+    " added_readme_txt\n",
    "\n",
    "We'll want our `data_function` to be defined in the project module (in this case `src`) for reproducibility reasons (which we've already done with `subselect_by_key` above)."
    ]
@@ -250,7 +250,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "added_descr_txt = f\"\"\"The dataset {dataset_name} is the subselection \\\n",
+    "added_readme_txt = f\"\"\"The dataset {dataset_name} is the subselection \\\n",
     "to the {key} dataset.\"\"\""
    ]
   },
@@ -281,7 +281,7 @@
    " source_dataset_name=source_dataset_name,\n",
    " dataset_name=dataset_name,\n",
    " data_function=data_function,\n",
-    " added_descr_txt=added_descr_txt,\n",
+    " added_readme_txt=added_readme_txt,\n",
    " overwrite_catalog=True)"
    ]
   },
@@ -318,7 +318,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(ds.DESCR)"
+    "print(ds.README)"
    ]
   },
   {
```

docs/New-Dataset-Template.ipynb

Lines changed: 6 additions & 6 deletions

```diff
@@ -167,7 +167,7 @@
    "metadata": {},
    "source": [
     "### Create a process function\n",
-    "By default, we recommend that you use the `process_extra_files` functionality and then use a transformer function to create a derived dataset, but you can optionally create your own."
+    "By default, we recommend that you use the `process_fileset_files` functionality and then use a transformer function to create a derived dataset, but you can optionally create your own."
    ]
   },
   {
@@ -176,11 +176,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from src.data.extra import process_extra_files\n",
-    "process_function = process_extra_files\n",
+    "from src.data.fileset import process_fileset_files\n",
+    "process_function = process_fileset_files\n",
     "process_function_kwargs = {'file_glob':'*.csv',\n",
     " 'do_copy': True,\n",
-    " 'extra_dir': ds_name+'.extra',\n",
+    " 'fileset_dir': ds_name+'.fileset',\n",
     " 'extract_dir': ds_name}"
    ]
   },
@@ -355,7 +355,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "ds.EXTRA"
+    "ds.FILESET"
    ]
   },
@@ -364,7 +364,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "ds.extra_file('epidemiology.csv')"
+    "ds.fileset_file('epidemiology.csv')"
    ]
   },
   {
```
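The kwargs dict in the hunk above (`file_glob`, `do_copy`, `fileset_dir`, `extract_dir`) configures the renamed `process_fileset_files` helper. The real implementation in `src.data.fileset` is not shown in this diff; the following is only a toy illustration of what a function taking those keyword arguments might do, assuming it gathers matching files and optionally copies them into the fileset directory:

```python
import shutil
from pathlib import Path

def process_fileset_files_sketch(extract_dir, fileset_dir,
                                 file_glob="*.csv", do_copy=True):
    """Toy stand-in mirroring the kwargs above: find files matching
    `file_glob` under `extract_dir` and, if `do_copy`, copy them into
    `fileset_dir`. Not the real EasyData implementation."""
    fileset_dir = Path(fileset_dir)
    fileset_dir.mkdir(parents=True, exist_ok=True)
    matched = sorted(Path(extract_dir).glob(file_glob))
    if do_copy:
        for path in matched:
            shutil.copy(path, fileset_dir / path.name)
    return [p.name for p in matched]

# Usage: stage a fake extract dir with one CSV, then process it.
src_dir = Path("/tmp/ds_extract")
src_dir.mkdir(exist_ok=True)
(src_dir / "epidemiology.csv").write_text("date,cases\n")
names = process_fileset_files_sketch(src_dir, "/tmp/ds.fileset")
copied = Path("/tmp/ds.fileset/epidemiology.csv")
print(names, copied.exists())
```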

docs/New-Edge-Template.ipynb

Lines changed: 2 additions & 2 deletions

```diff
@@ -88,7 +88,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "source_ds.EXTRA"
+    "source_ds.FILESET"
    ]
   },
   {
@@ -178,7 +178,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(ds.DESCR)"
+    "print(ds.README)"
    ]
   },
   {
```

docs/test_docs.py

Lines changed: 3 additions & 0 deletions

```diff
@@ -9,6 +9,8 @@
 import requests

 from src import paths
+from src.log import logger
+

 CCDS_ROOT = Path(__file__).parents[1].resolve()
 DOCS_DIR = CCDS_ROOT / "docs"
@@ -35,6 +37,7 @@ def test_notebook_csv(self):
         csv_url = "https://storage.googleapis.com/covid19-open-data/v2/epidemiology.csv"
         csv_dest = paths['raw_data_path'] / "epidemiology.csv"
         if not csv_dest.exists():
+            logger.debug("Downloading epidemiology.csv")
             csv_file = requests.get(csv_url)
             with open(csv_dest, 'wb') as f:
                 f.write(csv_file.content)
```
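The test change above adds a debug log inside an idempotent download: the CSV is fetched only when the destination file is missing, so repeated test runs reuse the cached copy. A minimal sketch of that pattern, with the fetch function injected so it can be exercised without network access (the helper name and fake fetcher are hypothetical, not from the repo):

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def ensure_downloaded(url, dest, fetch):
    """Download `url` to `dest` only if `dest` does not already exist.

    `fetch` is a callable returning the response body as bytes
    (e.g. `lambda u: requests.get(u).content` in the real test).
    """
    dest = Path(dest)
    if not dest.exists():
        logger.debug("Downloading %s", dest.name)
        dest.write_bytes(fetch(url))
    return dest

# Exercise the pattern with a fake fetcher instead of requests.
calls = []
def fake_fetch(url):
    calls.append(url)
    return b"date,cases\n2020-01-01,0\n"

target = Path("/tmp/epidemiology_example.csv")
if target.exists():
    target.unlink()
ensure_downloaded("https://example.org/epidemiology.csv", target, fake_fetch)
ensure_downloaded("https://example.org/epidemiology.csv", target, fake_fetch)
print(len(calls))  # second call finds the file and skips the fetch -> 1
```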
