- Manually create a new repo from this template in github; github directions are here. (If you prefer the command line, there is a `gh` sketch after this checklist.)
- Update repo settings in github (manual process)
  - Update Settings/Options/Repository name
    - Name follows the `<team (optional)>-<type (optional)>-<one-to-three-word-description>-<initials (optional)>` pattern in lowercase-dash-format. Examples:
      - `icgm-sensitivity-analysis` is used by all of Tidepool so no team is needed, and is considered production code so no type is needed
      - `data-science-donor-data-pipeline` is only used by Data Science
      - `data-science-template-repository` is a template (type) used by the Data Science Team
      - `data-science-explore-<short-description>` type of work is exploratory
      - `data-science-explore-<short-description>-etn` exploratory solo work has initials at the end
  - Update Settings/Options/Manage access
    - Invite data-science-admins team and give admin access
    - Invite Data Science team and give write access
  - Update Settings/Options/Manage access/Branch protection rules
    - Set Branch name pattern to `master`
    - Check Require pull request reviews before merging
    - Set Required approving reviews: to 1 for non-production code and 2 for production code
    - Check Dismiss stale pull request approvals when new commits are pushed
    - TODO: add in travis ci instructions via Require status checks to pass before merging
- Fill in this readme. Everything in [ ]'s should be changed and/or filled in.
- After completing this checklist, move the completed checklist to the bottom of the readme
- Delete everything above the [Project Name]
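If you would rather script the repo creation and branch protection than click through the github UI, a rough sketch with the GitHub CLI might look like the following. This is an assumption-laden example, not part of the official checklist: the repo name `data-science-explore-example` is a placeholder, and the JSON fields should be checked against the current GitHub REST docs.

```sh
# Hypothetical sketch: create a new private repo from this template
# (repo name is a placeholder).
gh repo create tidepool-org/data-science-explore-example \
  --template tidepool-org/data-science-template-repository --private

# Protect master: require pull request reviews (1 approval for
# non-production code, 2 for production) and dismiss stale approvals.
gh api --method PUT \
  repos/tidepool-org/data-science-explore-example/branches/master/protection \
  --input - <<'EOF'
{
  "required_pull_request_reviews": {
    "required_approving_review_count": 1,
    "dismiss_stale_reviews": true
  },
  "required_status_checks": null,
  "enforce_admins": false,
  "restrictions": null
}
EOF
```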
The purpose of this project is to [___].
This phase of the project will be done when [___].
(Add a short paragraph with some details: Why? How? Link to Jira and/or Confluence.) In order to learn/do [___], we did [___].
- Python (99% of the time)
- Anaconda for our virtual environments
- Pandas for working with data (99% of the time)
- Google Colab for sharing examples
- Plotly for visualization
- Pytest for testing
- Travis for continuous integration testing
- Black for code style
- Flake8 for linting
- Sphinx for documentation
- Numpy docstring format
- pre-commit for githooks
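For day-to-day reference, invoking the style, lint, and test tools above from a terminal looks roughly like this (a sketch; exact flags and configuration live in each repo):

```sh
# Auto-format the codebase with Black.
black .

# Lint with Flake8.
flake8 .

# Run the test suite with Pytest.
pytest

# Run all configured pre-commit githooks against every file.
pre-commit run --all-files
```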
- Install Miniconda. CAUTION for python virtual env users: Anaconda will automatically update your .bash_profile so that conda is launched automatically when you open a terminal. You can deactivate with the command `conda deactivate`, or you can edit your .bash_profile.
- If you are new to Anaconda, check out their getting started docs.
- If you want the pre-commit githooks to install automatically, then follow these directions.
- Clone this repo (for help see this tutorial).
- In a terminal, navigate to the directory where you cloned this repo.
- Run `conda update -n base -c defaults conda` to update to the latest version of conda.
- Run `conda env create -f conda-environment.yml --name [input-your-env-name-here]`. This will download all of the package dependencies and install them in a conda (python) virtual environment. (Insert your conda env name in the brackets; do not include the brackets.)
- Run `conda env list` to get a list of conda environments, and select the environment that was created from the conda-environment.yml file (hint: the environment name is at the top of the file).
- Run `conda activate <conda-env-name>` or `source activate <conda-env-name>` to start the environment.
- If you did not set up your global git-template to automatically install the pre-commit githooks, then run `pre-commit install` to enable the githooks.
- Run `conda deactivate` to stop the environment (the full sequence is consolidated in the sketch below).
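Putting the steps above together, a first-time setup session looks roughly like this (a sketch; the repo URL and the environment name `my-env` are placeholders):

```sh
# Clone the repo and move into it (URL is a placeholder).
git clone https://github.com/tidepool-org/data-science-template-repository.git
cd data-science-template-repository

# Update conda, then create and activate the project environment.
conda update -n base -c defaults conda
conda env create -f conda-environment.yml --name my-env
conda activate my-env

# Enable the githooks, and deactivate when you are done working.
pre-commit install
conda deactivate
```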
This may seem counterintuitive, but when you are loading new packages into your conda virtual environment, load them in using `pip`, and export your environment using `pip-chill > requirements.txt`. We take this approach to make our code compatible with people who prefer to use venv or virtualenv. It may also make it easier to convert existing packages into pypi packages. We only install packages directly in conda, via the conda-environment.yml file, when they are not available through pip (e.g., R and plotly-orca).
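For example, adding a new dependency might look like this (a sketch; `scikit-learn` is just a placeholder package):

```sh
# Install the new package into the active conda environment via pip.
pip install scikit-learn

# Re-export only top-level dependencies (pip-chill omits transitive ones).
pip-chill > requirements.txt
```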
- Raw Data is being kept [here](Repo folder containing raw data) within this repo. (If using offline data, mention that here and explain how others may obtain the data from the group.)
- Data processing/transformation scripts are being kept [here](Repo folder containing data processing scripts/notebooks)
- (Finish filling out this list)
- All are welcome to contribute to this project.
- Naming convention for notebooks is `[short_description]-[initials]-[date_created]-[version]`, e.g. `initial_data_exploration-jqp-2020-04-25-v-0-1-0.ipynb`: a short `_`-delimited description, the creator's initials, the date of creation, and a version number.
- Naming convention for data files, figures, and tables is `[PHI (if applicable)]-[short_description]-[date created or downloaded]-[code_version]`, e.g. `raw_project_data_from_mnist-2020-04-25-v-0-1-0.csv` or `project_data_figure-2020-04-25-v-0-1-0.png`.

NOTE: PHI data is never stored in github, and the .gitignore file includes this requirement as well.
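For instance, stamping new file names with today's date per these conventions could look like this (a hypothetical sketch; the descriptions, initials, and version are placeholders):

```sh
# A new notebook by contributor "jqp", dated today:
touch "initial_data_exploration-jqp-$(date +%Y-%m-%d)-v-0-1-0.ipynb"

# A new (non-PHI) figure following the same pattern:
touch "project_data_figure-$(date +%Y-%m-%d)-v-0-1-0.png"
```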
| Name (with github link) | Tidepool Slack |
|---|---|
| Ed Nykaza | @ed |
| Jason Meno | @jason |
| Cameron Summers | @Cameron Summers |
- automate the process of finding all of the TODO: comments in the code and put a link here.
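Until that automation exists, a quick way to surface them is (a sketch; widen the `--include` glob as needed):

```sh
# List every TODO comment in Python files, with file name and line number.
grep -rn "TODO:" --include="*.py" .
```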