This repository contains materials for a workshop on Exploratory Data Analysis and Predictive Modeling hosted at CODE University, Berlin, Germany.
To re-run the workshop materials, we encourage you to use the conda package manager. Once conda is installed, create an environment with all required dependencies by typing
conda env create -f environment.yml
into your console. You activate your new environment by typing
source activate data-science
(on Linux and macOS) or
activate data-science
(on Windows). With conda 4.4 or later, conda activate data-science is the recommended form on all platforms.
Then you are ready to go (if you get stuck, consult the conda documentation site). Alternatively, you may launch Binder to get a reproducible, executable environment directly in your browser: click the launch binder icon in the upper left corner, or go here.
The workshop focuses on data science with Python. We will introduce libraries such as numpy, scipy, statsmodels, pandas, scikit-learn, matplotlib, and seaborn, among others.
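Once the environment is active, a quick way to confirm that the libraries named above are installed is a small availability check. The following is only a minimal sketch: the helper name check_modules and the list WORKSHOP_MODULES are illustrative and not part of the workshop code, and the check only tests whether each module can be located, not which version is installed.

```python
from importlib.util import find_spec

# Import names for the libraries named above. Note that scikit-learn
# is imported under the name `sklearn`.
WORKSHOP_MODULES = [
    "numpy", "scipy", "statsmodels", "pandas",
    "sklearn", "matplotlib", "seaborn",
]


def check_modules(names=WORKSHOP_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per module.
    for name, available in check_modules().items():
        print(f"{name:<12} {'ok' if available else 'MISSING'}")
```

If a module is reported as missing, the data-science environment is most likely not active or was not created from environment.yml.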
All data sets, code snippets, Jupyter notebooks, and the environment.yml file for reproducibility are available in this self-contained repository.
The structure of this repository is outlined below:
data-science
│   .git                 # git internals
│   .gitignore           # specify files/folders to be ignored by git
└───datasets
│   │   ...              # find all the raw data files
└───figures
│   │   ...              # saved figures go here
└───notebooks
│   └───_img
│   │   │   ...          # rendered images are placed here
│   │   ...              # find all Jupyter notebooks here
│
│   README.md
│   LICENSE
│   environment.yml      # conda environment specifications for reproducibility
└───src
    │   ...              # here go the code snippets and scripts
    └───_solutions
        │   ...          # solutions for coding challenges (don't cheat yourself ;-))
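To verify that a local copy matches the outline above, the top-level folders can be checked with a few lines of Python. This is a hedged sketch rather than part of the repository: the names EXPECTED_DIRS and missing_dirs are hypothetical, and the script assumes it is started from the repository root.

```python
from pathlib import Path

# Top-level folders taken from the outline above.
EXPECTED_DIRS = ("datasets", "figures", "notebooks", "src")


def missing_dirs(repo_root):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    print(missing_dirs(Path.cwd()) or "all expected folders found")
```

An empty result means all four folders are present. Note that git does not track empty directories, so a folder such as figures may legitimately be absent from a fresh clone.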