hackalog
diff --git a/README.md b/README.md
Lines changed: 40 additions & 23 deletions
diff --git a/cookiecutter.json b/cookiecutter.json
Lines changed: 3 additions & 1 deletion
diff --git a/{{ cookiecutter.repo_name }}/.post-create-environment.txt b/{{ cookiecutter.repo_name }}/.post-create-environment.txt
Lines changed: 5 additions & 0 deletions
diff --git a/{{ cookiecutter.repo_name }}/Makefile b/{{ cookiecutter.repo_name }}/Makefile
Lines changed: 2 additions & 8 deletions
diff --git a/{{ cookiecutter.repo_name }}/Makefile.envs b/{{ cookiecutter.repo_name }}/Makefile.envs
Lines changed: 8 additions & 11 deletions
diff --git a/{{ cookiecutter.repo_name }}/README.md b/{{ cookiecutter.repo_name }}/README.md
Lines changed: 11 additions & 13 deletions
diff --git a/{{ cookiecutter.repo_name }}/environment.yml b/{{ cookiecutter.repo_name }}/environment.yml
Lines changed: 1 addition & 0 deletions
diff --git a/{{ cookiecutter.repo_name }}/framework-docs/conda-environments.md b/{{ cookiecutter.repo_name }}/framework-docs/conda-environments.md
Lines changed: 40 additions & 16 deletions
@@ -4,15 +4,30 @@
 
 # Cookiecutter EasyData
 
-_A flexible (but opinionated) toolkit for doing and sharing reproducible data science._
+_A python framework and git template for data scientists, teams, and workshop organizers
+aimed at making your data science **reproducible**_
 
-EasyData started life as an experimental fork of
-[cookiecutter-data-science](http://drivendata.github.io/cookiecutter-data-science/)
-where we could try out ideas before proposing them as fixes to the upstream branch. It has grown into its own toolkit for implementing a reproducible data science workflow, and is the basis of our [Bus Number](https://github.com/hackalog/bus_number/) tutorial on **Reproducible Data Science**.
+For most of us, data science is 5% science, 60% data cleaning, and 35%
+IT hell.  Easydata focuses on delivering
+* reproducible python environments,
+* reproducible datasets, and
+* reproducible workflows
+in order to get you up and running with your data science quickly, and reproducibly.
 
-### Tutorial
-For a tutorial on making use of a previous version of this framework (available via the `bus_number` branch), visit:
-  https://github.com/hackalog/bus_number/
+## What is Easydata?
+
+Easydata is a python cookiecutter for building custom data science git repos that provides:
+* An **opinionated workflow** for collaboration and storytelling,
+* A **python framework** to support this workflow
+* A **makefile wrapper** for conda and pip environment management
+* prebuilt **dataset recipes**, and
+* a vast library of training materials and documentation around doing reproducible data science.
+
+Easydata is **not**
+* an ETL toolkit
+* a data analysis pipeline
+* a containerization solution, or
+* a prescribed data format.
 
 
 ### Requirements to use this cookiecutter template:
@@ -22,15 +37,12 @@ For a tutorial on making use of a previous version of this framework (available
 
  - [Cookiecutter Python package](http://cookiecutter.readthedocs.org/en/latest/installation.html) >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
 
-``` bash
-$ pip install cookiecutter
-```
-
-or
+Once you've installed Anaconda, you can install the remaining requirements (including cookiecutter) by running:
 
-``` bash
-$ conda config --add channels conda-forge
-$ conda install cookiecutter
+```bash
+conda create -n easydata python=3
+conda activate easydata
+python -m pip install -r requirements.txt
 ```
 
 
@@ -54,6 +66,8 @@ The directory structure of your new project looks like this:
 * `catalog`
   * Data catalog. This is where config information such as data sources
     and data transformations are saved
+  * `catalog/config.ini`
+     * Local Data Store. This configuration file is for local data only, and is never checked into the repo.
 * `data`
     * Data directory. often symlinked to a filesystem with lots of space
     * `data/raw`
@@ -64,6 +78,8 @@ The directory structure of your new project looks like this:
         * The final, canonical data sets for modeling.
 * `docs`
     * A default Sphinx project; see sphinx-doc.org for details
+* `framework-docs`
+    * Markdown documentation for using Easydata
 * `models`
     * Trained and serialized models, model predictions, or model summaries
     * `models/trained`
@@ -86,6 +102,8 @@ The directory structure of your new project looks like this:
         * Generated summary information to be used in reporting
 * `environment.yml`
     * (if using conda) The YAML file for reproducing the analysis environment
+* `environment.(platform).lock.yml`
+    * resolved versions, result of processing `environment.yml`
 * `setup.py`
     * Turns contents of `MODULE_NAME` into a
     pip-installable python module  (`pip install -e .`) so it can be
@@ -95,15 +113,9 @@ The directory structure of your new project looks like this:
     * `MODULE_NAME/__init__.py`
         * Makes MODULE_NAME a Python module
     * `MODULE_NAME/data`
-        * Scripts to fetch or generate data. In particular:
-        * `MODULE_NAME/data/make_dataset.py`
-            * Run with `python -m MODULE_NAME.data.make_dataset fetch`
-            or  `python -m MODULE_NAME.data.make_dataset process`
+        * code to fetch raw data and generate Datasets from them
     * `MODULE_NAME/analysis`
-        * Scripts to turn datasets into output products
-    * `MODULE_NAME/models`
-        * Scripts to train models and then use trained models to make predictions.
-        e.g. `predict_model.py`, `train_model.py`
+        * code to turn datasets into output products
 * `tox.ini`
     * tox file with settings for running tox; see tox.testrun.org
 
@@ -128,3 +140,8 @@ In case you need to delete the environment later:
 conda deactivate
 make delete_environment
 ```
+
+
+## History
+Early versions of Easydata were based on
+[cookiecutter-data-science](http://drivendata.github.io/cookiecutter-data-science/).
@@ -1,10 +1,12 @@
 {
     "project_name": "project_name",
     "repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}",
+    "default_branch": ["master", "main"],
     "module_name": "src",
     "author_name": "Your name (or your organization/company/team)",
     "description": "A short description of this project.",
     "open_source_license": ["MIT", "BSD-2-Clause", "Proprietary"],
     "python_version": ["3.7", "3.6", "latest", "3.8"],
-    "conda_path": "~/anaconda3/bin/conda"
+    "conda_path": "~/anaconda3/bin/conda",
+    "upstream_location": ["github.com", "gitlab.com", "bitbucket.org", "your-custom-repo"]
 }
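
As a rough illustration of how these template variables are expected to resolve (a sketch, not part of this change): the `repo_name` default is a Jinja expression that lowercases `project_name` and replaces spaces with underscores, and for list-valued entries such as `default_branch` and `upstream_location`, cookiecutter presents the items as choices with the first one as the default. The helper names and the example project name below are hypothetical; the Python only mirrors that behavior.

```python
import json

# Subset of cookiecutter.json from this diff (values copied from the diff).
COOKIECUTTER_JSON = """
{
    "project_name": "project_name",
    "default_branch": ["master", "main"],
    "upstream_location": ["github.com", "gitlab.com", "bitbucket.org", "your-custom-repo"]
}
"""


def derive_repo_name(project_name: str) -> str:
    """Hypothetical helper: same string operations as the Jinja default for repo_name."""
    return project_name.lower().replace(" ", "_")


def default_for(value):
    """Hypothetical helper: a list is a choice variable whose first item is the default."""
    return value[0] if isinstance(value, list) else value


config = json.loads(COOKIECUTTER_JSON)
defaults = {key: default_for(value) for key, value in config.items()}
# "My Data Project" is an example name chosen for illustration only.
defaults["repo_name"] = derive_repo_name("My Data Project")

print(defaults["repo_name"])
print(defaults["default_branch"])
print(defaults["upstream_location"])
```

Because the first list item is the default, reordering `default_branch` to `["main", "master"]` would be one way to change what a user gets when they accept the prompts unchanged.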
@@ -1,3 +1,8 @@
+
+Now would be a good time to initialize a git repo; i.e.
+>>> git init
+>>> git add .
+>>> git commit -m 'initial import'
 >>> git branch easydata    # tag for future easydata upgrades
 
 NOTE: By default, raw data is installed and unpacked in the
 
@@ -27,11 +27,9 @@ unfinished:
 #
 
 .PHONY: data
-## convert raw datasets into fully processed datasets
 data: transform_data
 
 .PHONY: sources
-## Fetch, Unpack, and Process raw DataSources
 sources: process_sources
 
 .PHONY: fetch_sources
@@ -56,7 +54,6 @@ process_sources: .make.process_sources
 	touch .make.process_sources
 
 .PHONY: transform_data
-## Apply Transformations to produce fully processed Datsets
 transform_data: .make.transform_data
 
 .make.transform_data: .make.process_sources
@@ -71,17 +68,14 @@ clean:
 	rm -f .make.*
 
 .PHONY: clean_interim
-## Delete all interim (DataSource) files
 clean_interim:
 	rm -rf data/interim/*
 
 .PHONY: clean_raw
-## Delete the raw downloads directory
 clean_raw:
 	rm -f data/raw/*
 
 .PHONY: clean_processed
-## Delete all processed datasets
 clean_processed:
 	rm -f data/processed/*
 
@@ -103,7 +97,7 @@ lint:
 	flake8 $(MODULE_NAME)
 
 .PHONY: debug
-## Give a report on current status
+## dump useful debugging information to $(DEBUG_FILE)
 debug:
 	@echo "\n\n======================"
 	@echo "\nPlease include the contents $(DEBUG_FILE) when submitting an issue or support request.\n"
@@ -155,7 +149,7 @@ debug:
 
 print-%  : ; @echo $* = $($*)
 
-HELP_VARS := PROJECT_NAME
+HELP_VARS := PROJECT_NAME DEBUG_FILE
 
 help-prefix:
 	@echo "To get started:"
 
@@ -14,17 +14,14 @@ else
 endif
 
 .PHONY: create_environment
-## Set up virtual environment for this project
+## Set up virtual (conda) environment for this project
 create_environment: environment.$(ARCH).lock.yml
 ifeq (conda,$(VIRTUALENV))
-	$(CONDA_EXE) env update -n $(PROJECT_NAME) -f environment.$(ARCH).lock.yml
+	@touch environment.yml
+	@echo
 	@echo "New conda env created. Activate with:"
 	@echo ">>> conda activate $(PROJECT_NAME)"
-	@echo
-	@echo "Now would be a good time to initialize a git repo; i.e."
-	@echo ">>> git init"
-	@echo ">>> git add ."
-	@echo ">>> git commit -m 'initial import'"
+	@echo ">>> make update_environment"
 ifneq ("X$(wildcard .post-create-environment.txt)","X")
 	@cat .post-create-environment.txt
 endif
@@ -33,11 +30,12 @@ else
 endif
 
 .PHONY: delete_environment
-## Delete the virtual environment for this project
+## Delete the virtual (conda) environment for this project
 delete_environment:
 ifeq (conda,$(VIRTUALENV))
 	@echo "Deleting conda environment."
 	$(CONDA_EXE) env remove -n $(PROJECT_NAME)
+	rm environment.$(ARCH).lock.yml
 ifneq ("X$(wildcard .post-delete-environment.txt)","X")
 	@cat .post-delete-environment.txt
 endif
@@ -46,17 +44,16 @@ else
 endif
 
 .PHONY: update_environment
-## Install or update Python Dependencies
+## Install or update Python Dependencies in the virtual (conda) environment
 update_environment: test_environment environment.$(ARCH).lock.yml
 ifneq ("X$(wildcard .post-update-environment.txt)","X")
 	@cat .post-update-environment.txt
 endif
 
 .PHONY: test_environment
-## Test python environment is set-up correctly
 test_environment:
 ifeq (conda,$(VIRTUALENV))
-ifneq (${CONDA_DEFAULT_ENV}, $(PROJECT_NAME))
+ifneq ($(notdir ${CONDA_DEFAULT_ENV}), $(PROJECT_NAME))
 	$(error Must activate `$(PROJECT_NAME)` environment before proceeding)
 endif
 else
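
The switch to `$(notdir ...)` appears intended to cover the case where `CONDA_DEFAULT_ENV` holds a full path rather than a bare environment name (for example, when environments live in a custom `envs_dirs` location). A minimal Python sketch of the same comparison follows; the function name and the example paths are made up for illustration, and the paths are assumed to contain no spaces.

```python
import posixpath


def env_matches_project(conda_default_env: str, project_name: str) -> bool:
    """Compare only the last path component, as GNU make's $(notdir ...) does."""
    return posixpath.basename(conda_default_env) == project_name


# A bare name and a full path to the same environment both pass the check.
print(env_matches_project("my_project", "my_project"))
print(env_matches_project("/home/user/.conda/envs/my_project", "my_project"))
# A different active environment (e.g. base) still fails.
print(env_matches_project("base", "my_project"))
```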
 
@@ -1,6 +1,6 @@
 {{cookiecutter.project_name}}
 ==============================
-_Author: {{ cookiecutter.author_name }}
+_Author: {{ cookiecutter.author_name }}_
 
 {{cookiecutter.description}}
 
@@ -20,18 +20,16 @@ REQUIREMENTS
 
 GETTING STARTED
 ---------------
-### Checking out the repo
-Note: These instructions assume you are using SSH keys (and not HTTPS authentication) with github.
-If you haven't set up SSH access to GitHub, see [Configuring SSH Access to Github](https://github.com/hackalog/cookiecutter-easydata/wiki/Configuring-SSH-Access-to-Github). This also includes instuctions for using more than one account with SSH keys.
-
-1. Fork the repo (on GitHub) to your personal account
-1. Clone your fork to your local machine
-  `git clone git@github.com:<your github handle>/{{cookiecutter.project_name}}.git`
-1. Add the main source repo as a remote branch called `upstream` (to make syncing easier):
-  `cd {{cookiecutter.project_name}}`
-  `git remote add upstream git@github.com:<upstream-repo>/{{cookiecutter.project_name}}.git`
-
-You're all set for staying up-to-date with the project repo. Follow the instructions in this handy [Github Workflow Cheat Sheet](https://github.com/hackalog/cookiecutter-easydata/wiki/Github-Workflow-Cheat-Sheet) for keeping your working copy of the repo in sync.
+### Git Configuration and Checking Out the Repo
+
+If you haven't yet done so, please follow the instructions
+in [Setting up git and Checking Out the Repo](framework-docs/git-configuration.md) in
+order to check out the code and set up your remote branches.
+
+Note: These instructions assume you are using SSH keys (and not HTTPS authentication) with {{ cookiecutter.upstream_location }}.
+If you haven't set up SSH access to {{ cookiecutter.upstream_location }}, see [Configuring SSH Access to {{cookiecutter.upstream_location}}](https://github.com/hackalog/cookiecutter-easydata/wiki/Configuring-SSH-Access-to-Github). This also includes instructions for using more than one account with SSH keys.
+
+Once you've got your local, `origin`, and `upstream` branches configured, you can follow the instructions in this handy [Git Workflow Cheat Sheet](framework-docs/git-workflow.md) to keep your working copy of the repo in sync with the others.
 
 ### Setting up your environment
 **WARNING**: If you have conda-forge listed as a channel in your `.condarc` (or any other channels other than defaults), remove it during the course of the workshop. Even better, don't use a `.condarc` for managing channels, as it overrides the `environment.yml` instructions and makes things less reproducible. Make the changes to the `environment.yml` file if necessary. We've had some conda-forge related issues with version conflicts. We also recommend [setting your channel priority to 'strict'](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html) to reduce package incompatibility problems.
 
@@ -34,4 +34,5 @@ dependencies:
   - pandas
   - requests
   - pathlib
+  - fsspec
 {{ pyver()|indent(2, true) }}
@@ -1,40 +1,64 @@
 # Setting up and Maintaining your Conda Environment (Reproducibly)
 
-The `{{ cookiecutter.repo_name }}` repo is set up with template code to make managing your conda environments easy and reproducible. Not only will future you appreciate this, but everyone else who tries to run your code will thank you.
+The `{{ cookiecutter.repo_name }}` repo is set up with template code to make managing your conda environments easy and reproducible. Not only will _future you_ appreciate this, but so will anyone else who needs to work with your code after today.
 
-If you haven't yet, get your initial environment set up.
+If you haven't yet, configure your conda environment.
 
-### Quickstart Instructions
-**WARNING FOR EXISTING CONDA USERS**: If you have conda-forge listed as a channel in your `.condarc` (or any other channels other than defaults), remove it during the course of the project. Even better, don't use a `.condarc` for managing channels, as it overrides the `environment.yml` instructions and makes things less reproducible. Make the changes to the `environment.yml` file if necessary. We've had some conda-forge related issues with version conflicts.
+## Configuring your python environment
+Easydata uses conda to manage python packages installed by both conda **and pip**.
 
-We also recommend [setting your channel priority to 'strict'](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html) to reduce package incompatibility problems. This will be default in future conda releases, but it is being rolled out gently.
+### Adjust your `.condarc`
+**WARNING FOR EXISTING CONDA USERS**: If you have `conda-forge` listed as a channel in your `.condarc` (or any channels other than `defaults`), **remove them**. These channels should be specified in `environment.yml` instead.
 
+We also recommend [setting your channel priority to 'strict'](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html) to reduce package incompatibility problems. This will be the default in conda 5.0, but in order to assure reproducibility, we need to use this behavior now.
+
+```
+conda config --set channel_priority strict
+```
+Whenever possible, re-order your channels so that `defaults` is first.
+
+```
+conda config --prepend channels defaults
+```
+
+**Note for JupyterHub Users**: You will need to store your conda environments in your **home directory** so that they will be persisted across JupyterHub sessions.
+```
+conda config --prepend envs_dirs ~/.conda/envs   # Store environments in local dir for JupyterHub
+```
+
+### Fix the CONDA_EXE path
 * Make note of the path to your conda binary:
 ```
    $ which conda
    ~/miniconda3/bin/conda
 ```
-* ensure your `CONDA_EXE` environment variable is set to this value (or edit `Makefile.include` directly)
+* ensure your `CONDA_EXE` environment variable is set correctly in `Makefile.include`
 ```
     export CONDA_EXE=~/miniconda3/bin/conda
 ```
+### Create the conda environment
 * Create and switch to the virtual environment:
 ```
 cd {{ cookiecutter.repo_name }}
 make create_environment
 conda activate {{ cookiecutter.repo_name }}
 make update_environment
 ```
-Note: you need to run `make update_environment` for the `{{ cookiecutter.module_name }}` module to install correctly.
+**Note**: When creating the environment the first time, you really do need to run **both** `make create_environment` and `make update_environment` for the `{{ cookiecutter.module_name }}` module to install correctly.
+
+To activate the environment, simply `conda activate {{ cookiecutter.repo_name }}`
+
+To deactivate it and return to your base environment, use `conda deactivate`
+
+## Maintaining your Python environment
 
-From here on, to use the environment, simply `conda activate {{ cookiecutter.repo_name }}` and `conda deactivate` to go back to the base environment.
+### Updating your conda and pip environments
+The `make` commands `make create_environment` and `make update_environment` are wrappers that allow you to easily manage your conda and pip environments using the `environment.yml` file.
 
-### Further Instructions
+(If you ever forget which `make` command to run, you can run `make` by itself and it will provide a list of commands that are available.)
 
-#### Updating your environment
-The `make` commands, `make create_environment` and `make update_environment` are wrappers that allow you to easily manage your environment using the `environment.yml` file. If you want to make changes to your environment, do so by editing the `environment.yml` file and then running `make update_environment`.
 
-If you ever forget which make command to run, you can run `make` and it will list a magic menu of which make commands are available.
+When adding packages to your python environment, **do not `pip install` or `conda install` directly**. Always edit `environment.yml` and run `make update_environment` instead.
 
 Your `environment.yml` file will look something like this:
 ```
@@ -64,12 +88,12 @@ name: {{ cookiecutter.repo_name }}
 ```
 To add any package available from conda, add it to the end of the list. If you have a PyPI dependency that's not available via conda, add it to the list of pip-installable dependencies under `  - pip:`.
 
-You can include any GitHub python-based project in the `pip` section via `git+https://github.com/<github handle>/<package>`.
+You can include any {{ cookiecutter.upstream_location }} python-based project in the `pip` section via `git+https://{{ cookiecutter.upstream_location }}/<my_git_handle>/<package>`.
 
-In particular, if you're working off of a fork or a work in progress branch of a repo in GitHub (say, your personal version of <package>), you can change `git+https://github.com/<github handle>/<package>` to
+In particular, if you're working off of a fork or a work in progress branch of a repo in {{ cookiecutter.upstream_location }} (say, your personal version of <package>), you can change `git+https://{{ cookiecutter.upstream_location }}/<my_git_handle>/<package>` to
 
-* `git+https://github.com/<my github handle>/<package>.git` to point to the master branch of your fork and
-* `git+https://github.com/<my github handle>/<package>.git@<my branch>` to point to a specific branch.
+* `git+https://{{ cookiecutter.upstream_location }}/<my_git_handle>/<package>.git` to point to the {{cookiecutter.default_branch}} branch of your fork and
+* `git+https://{{ cookiecutter.upstream_location }}/<my_git_handle>/<package>.git@<my branch>` to point to a specific branch.
 
 Once you're done with your edits, run `make update_environment` and voilà, you're updated.
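
To make the URL patterns above concrete, here is a small sketch that assembles a pip `git+https` requirement string from its parts. The function name and the example host, handle, package, and branch values are placeholders for illustration, not values taken from this project.

```python
from typing import Optional


def pip_git_requirement(host: str, handle: str, package: str,
                        branch: Optional[str] = None) -> str:
    """Build a pip VCS requirement; with no branch, pip uses the repo's default branch."""
    url = f"git+https://{host}/{handle}/{package}.git"
    if branch:
        # Appending @<ref> pins the install to a specific branch, tag, or commit.
        url += f"@{branch}"
    return url


# Placeholder values: substitute your own host, handle, package, and branch.
print(pip_git_requirement("github.com", "my_git_handle", "some_package"))
print(pip_git_requirement("gitlab.com", "my_git_handle", "some_package", "my-feature"))
```

Each resulting string would go on its own line under `  - pip:` in `environment.yml`, followed by `make update_environment`.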