Project layout for efficient AI for science

AI Project Template

A template for organizing projects that use AI to solve scientific problems. The organization is designed with a few goals in mind:

  1. Reproducibility: Ensure that all work towards a project is easy to capture.
  2. Development Best Practices: Make it easy to develop complex code using established tools and practices.
  3. Teamwork: Create good entry points for others to join your project.
  4. Publication: Minimal effort should be required to archive the project.
  5. Efficiency: None of the above goals should interfere with executing the science.

This repository accompanies a "How to Be Reproducible without Trying" presentation.

How to use this repository

There are two easy ways to get started: create your own GitHub repository and mirror the layout here as you progress, or download this repository and delete or edit files to make it your own.

We offer a few routes to understanding how to use this structure:

  1. A step-by-step walkthrough
  2. An overview of the project layout
  3. A dive into the rationale behind it

What do I expect you to know already?

This guide is targeted at people who already have some exposure to scientific computing and, in some places, machine learning. Specifically, you should be familiar with:

  1. Using the command line. If you aren't, check out DjangoGirls' tutorial. Also, be prepared to use a command-line text editor like vim or nano.
  2. Python basics. Know how to make a script and execute it from the command line.
  3. Jupyter. I think Jupyter is a fantastic tool, if used effectively. Jupyter's website does a good job of explaining what you can do with notebooks.
  4. Pandas. A great tool for manipulating tabular data. Among my favorite Python libraries, and one you can learn in 10 minutes.

We are going to teach you the basics of:

  1. Version control, and how it helps you organize.
  2. Python environments, and why they are necessary for reproducibility.
  3. CLI scripts, and how to use them to record experiments.
  4. Data publication, because publishing papers is not enough.
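
To give a flavor of item 3, here is a minimal sketch of a CLI script that records the settings of each run. It is only an illustration, not the script used in this template: the option names (`--learning-rate`, `--epochs`, `--run-dir`), the `runs/` directory and the `params.json` file name are all assumptions made for this example.

```python
import argparse
import json
from datetime import datetime, timezone
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options; replace them with whatever your experiment needs
    parser = argparse.ArgumentParser(description="Run one (hypothetical) experiment")
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--run-dir", type=Path, default=Path("runs"))
    return parser


def record_run(args: argparse.Namespace) -> Path:
    """Create a timestamped run directory and save the arguments as JSON."""
    # UTC timestamp with microseconds, so repeated runs get distinct directories
    start = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    out_dir = args.run_dir / start
    out_dir.mkdir(parents=True, exist_ok=False)

    # Path objects are not JSON-serializable, so store them as strings
    params = {k: str(v) if isinstance(v, Path) else v for k, v in vars(args).items()}
    (out_dir / "params.json").write_text(json.dumps(params, indent=2))
    return out_dir


if __name__ == "__main__":
    args = build_parser().parse_args()
    out_dir = record_run(args)
    print(f"Recorded run settings in {out_dir}")
    # The actual experiment would run here and write its outputs to out_dir
```

Saved as, say, `run_experiment.py` (a hypothetical name), it could be called as `python run_experiment.py --epochs 3`. Each call leaves behind a directory holding exactly the settings that were used, which is the part that makes the experiment easy to capture later.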