
Hooks for setting up data-analysis projects

Primary language: Python

data_buddy

This repository contains some scripts I use when running a data-analysis project.

My data-analysis projects contain:

  • details of a project-specific conda environment that should be created and activated before running anything

  • some non-conda packages and scripts that I use in multiple projects. These are either:

    • copied in from local directories / files (in which case the current project should keep them under version control); or

    • (preferably) cloned from a GitHub or Bitbucket repository, with an explicit package version pinned by specifying a git commit SHA and branch (in which case the current project does not keep the included package under version control).

  • packages & scripts that are developed specifically for the current project

  • a Snakefile (a Snakemake workflow) that controls how the project scripts are run

  • links to data

  • subjobs (nested copies of the project structure that are version-controlled and environment-defined within the main project)
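Pinning a cloned package to a branch and commit SHA, as described above, can be sketched with plain git commands. This is a minimal illustration, not data_buddy's own code: the function names `pinned_clone_commands` and `pinned_clone` and their arguments are hypothetical, and the real config format data_buddy uses for these entries may differ.

```python
from __future__ import annotations

import subprocess


def pinned_clone_commands(url: str, branch: str, sha: str, dest: str) -> list[list[str]]:
    """Return the git commands that clone `url` and pin it to commit `sha`.

    Hypothetical helper: the argument names are illustrative only.
    """
    return [
        # Clone the repository and check out the named branch.
        ["git", "clone", "--branch", branch, url, dest],
        # Move HEAD to the exact commit so the package version is fixed.
        ["git", "-C", dest, "checkout", sha],
    ]


def pinned_clone(url: str, branch: str, sha: str, dest: str) -> None:
    """Run the clone-and-checkout commands, stopping on the first failure."""
    for cmd in pinned_clone_commands(url, branch, sha, dest):
        subprocess.run(cmd, check=True)
```

Because the pinned package is not kept under the project's version control, the destination directory would typically also be listed in the project's .gitignore.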


Since data_buddy is still evolving, it should be copied into each new project (for the moment, at least).
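One way to make that copy is sketched below. It assumes data_buddy is copied as a single directory into the project root and that its git metadata should be left behind, so the copy becomes part of the new project's history; the helper name `copy_data_buddy` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_data_buddy(src: str, project_dir: str) -> Path:
    """Copy the data_buddy checkout at `src` into `project_dir`.

    Hypothetical helper. Git metadata and byte-code caches are skipped so
    the copy can be committed as part of the new project.
    """
    dest = Path(project_dir) / Path(src).name
    shutil.copytree(
        src,
        dest,
        ignore=shutil.ignore_patterns(".git", "__pycache__"),
        dirs_exist_ok=True,  # allow refreshing an earlier copy (Python 3.8+)
    )
    return dest
```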

All config files used by data_buddy should be stored in ./.sidekick/setup.
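Since pyyaml is a required package, the config files in ./.sidekick/setup are presumably YAML. The sketch below assumes exactly that (files ending in .yaml or .yml) and loads them into one dictionary keyed by file name; the function `load_setup_configs` is a hypothetical illustration, not part of data_buddy.

```python
from pathlib import Path

import yaml  # provided by the pyyaml package


def load_setup_configs(root: str = ".") -> dict:
    """Load every YAML file in <root>/.sidekick/setup, keyed by file stem.

    Hypothetical helper; assumes configs use a .yaml or .yml extension.
    Returns an empty dict if the setup directory does not exist.
    """
    setup_dir = Path(root) / ".sidekick" / "setup"
    if not setup_dir.is_dir():
        return {}
    configs = {}
    for path in sorted(setup_dir.iterdir()):
        if path.suffix not in (".yaml", ".yml"):
            continue  # ignore non-YAML files
        with path.open() as handle:
            # safe_load only constructs standard YAML types, not arbitrary Python objects.
            configs[path.stem] = yaml.safe_load(handle)
    return configs
```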


To run ./sidekick setup, your environment should contain the following packages:

sh
pyyaml
# and for R-based projects
r-base
r-desc
r-devtools
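A quick pre-flight check for these requirements might look like the sketch below. It assumes the sh and pyyaml packages are importable as `sh` and `yaml`, and that r-base puts `Rscript` on the PATH; the function names are hypothetical, and the R packages desc and devtools are not checked.

```python
import importlib.util
import shutil


def missing_python_modules(modules):
    """Return the top-level modules (by import name) that cannot be found."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def check_setup_environment(r_project: bool = False) -> list:
    """List requirements for ./sidekick setup that appear to be missing.

    Hypothetical helper: sh and pyyaml are checked via their import names,
    and for R-based projects only the presence of Rscript is checked.
    """
    missing = missing_python_modules(["sh", "yaml"])
    if r_project and shutil.which("Rscript") is None:
        missing.append("Rscript (r-base)")
    return missing
```

Calling `check_setup_environment(r_project=True)` before ./sidekick setup would return an empty list when everything it checks is present.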