/template

Template repository for scaffolding data science projects with a sensible configuration.

Primary language: Python. License: MIT.


Getting Started

Most of the dependencies for the template are written to environment.yml by the bootstrap script. Before you can run it, however, there are a few things you need to install and set up.

  1. Install Anaconda (or Miniconda) if you don't have it already, and make sure to run conda init.
  2. Install the Git Large File Storage extension.

Once you have both Anaconda and git-lfs, simply run:

python bootstrap.py
conda activate [env-name] # <-- this defaults to the GH user/repo name during bootstrap.py
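
If bootstrap.py fails early, it can help to confirm that both prerequisites are actually on your PATH. The following is a minimal sketch of such a check, not part of the template itself; missing_tools is a hypothetical helper, and it assumes the executables are named conda and git-lfs on your system.

```python
import shutil


def missing_tools(tools=("conda", "git-lfs")):
    """Return the names of the given command-line tools not found on PATH.

    Hypothetical helper for illustration; bootstrap.py may perform its own checks.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before running bootstrap.py:", ", ".join(missing))
    else:
        print("conda and git-lfs found; ready to run bootstrap.py")
```

Depending on how conda was installed, it may only be available after conda init and a new shell session, so a "missing" result for conda is worth re-checking in a fresh terminal.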

Structure

This is simply a recommended starting scaffold. None of the tooling in this template requires any specific naming or folder conventions beyond the standard git/GitHub conventions (like .gitignore and the .github folder) and Python/conda requirements (like setup.cfg and environment.yml). Feel free to rearrange it in whatever way works best for your project.

├── LICENSE
├── README.md
├── data
│
├── models             <- Serialized trained models and model artifacts
│
├── notebooks          <- Jupyter notebooks
│
├── reports            <- Any output/presentation artifacts (like HTML, PDF, LaTeX, etc.)
│
├── environment.yml    <- `conda` environment file to configure package dependencies
│                           (created by `bootstrap.py`)
│
├── setup.cfg          <- makes `src/` pip installable so classes/modules can be imported
│                           (created by `bootstrap.py`)
│
├── src                <- Source code meant to be imported as modules in notebooks or scripts
│  
├── scripts            <- Python files intended to be run from the command line (and not imported)
│  
└── tests              <- (optional)

How do I?

Use a specific version of Python

Update the Python version pinned in environment.yml, then update the environment with: conda env update --file environment.yml --prune

name: tribe
dependencies:
    - python=3.9 # <------ specify Python version
    - ipython
    - jupyter
    - pip:
          - pre-commit
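
After updating the environment, you may want to confirm that the active interpreter matches the pin. The sketch below is one way to do that using only the standard library; pinned_python and matches_running_interpreter are hypothetical helpers, and the regular expression assumes the pin is written as a line like `- python=3.9`, as in the example above.

```python
import re
import sys

# Matches a conda dependency line such as "- python=3.9" (assumed format).
PYTHON_PIN = re.compile(r"^\s*-\s*python\s*=+\s*(\d+)\.(\d+)", re.MULTILINE)


def pinned_python(environment_text):
    """Return the (major, minor) version pinned in environment.yml text, or None."""
    match = PYTHON_PIN.search(environment_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def matches_running_interpreter(environment_text):
    """Return True if the running Python matches the pinned major.minor version."""
    pin = pinned_python(environment_text)
    return pin is not None and tuple(sys.version_info[:2]) == pin


example = """name: tribe
dependencies:
    - python=3.9 # <------ specify Python version
    - ipython
"""
print(pinned_python(example))  # (3, 9)
```

In a real project you would pass the contents of environment.yml (for example, via pathlib.Path("environment.yml").read_text()) instead of the inline example string.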

Update package requirements

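Same as above: edit the dependencies in environment.yml, then run conda env update --file environment.yml --prune.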

Work with large data files

Git will warn you about any pushed file larger than 50 MB, and GitHub blocks pushes containing any file larger than 100 MB.

However, this template repository is already configured in .gitattributes to use Git Large File Storage for common data file formats. If you want to commit one of these file types normally (without LFS) or track additional formats with LFS, simply remove or add the corresponding entries in that file.
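
To see which formats are currently routed through LFS, you can read the patterns out of .gitattributes. This is a minimal sketch, assuming entries follow the usual `pattern filter=lfs diff=lfs merge=lfs -text` form that git lfs track writes; lfs_patterns is a hypothetical helper, the sample entries are illustrative rather than the template's actual list, and quoted patterns containing spaces are not handled.

```python
def lfs_patterns(gitattributes_text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


# Illustrative content only; the template's real .gitattributes may differ.
sample = """# hypothetical entries
*.csv filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.py text
"""
print(lfs_patterns(sample))  # ['*.csv', '*.parquet']
```

Removing a line from .gitattributes stops new files of that type from being tracked by LFS; adding a line in the same form starts tracking that pattern.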

Remove a large file I accidentally committed

You won't. We use a pre-commit hook that checks for large files to make sure that you never add them in the first place.
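
For intuition, the core of a large-file check is just a size comparison over the files about to be committed. The sketch below illustrates that idea only; it is not the hook this template uses, and files_over_limit and the 1000-byte limit are hypothetical choices for the demonstration.

```python
import tempfile
from pathlib import Path


def files_over_limit(paths, limit_bytes):
    """Return the paths of existing files whose size exceeds limit_bytes."""
    oversized = []
    for path in map(Path, paths):
        if path.is_file() and path.stat().st_size > limit_bytes:
            oversized.append(str(path))
    return oversized


# Demonstrate on two temporary files of different sizes.
with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "small.csv"
    large = Path(tmp) / "large.csv"
    small.write_bytes(b"x" * 10)
    large.write_bytes(b"x" * 2000)
    flagged = files_over_limit([small, large], limit_bytes=1000)
    print([Path(p).name for p in flagged])  # ['large.csv']
```

A real hook would receive the staged file names from git and exit with a non-zero status when the list is non-empty, which is what aborts the commit.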

Resources

LICENSE

MIT License

Copyright (c) 2022 Tribe AI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.