/descriptive-stats-using-ci

Descriptive Statistics with Continuous Integration.

Primary language: Jupyter Notebook
License: Creative Commons Zero v1.0 Universal (CC0-1.0)

python-data-science-template-v2

Badges: Codespaces Prebuilds, Install, Format, Lint, Test


This project demonstrates the benefits of using Continuous Integration for data science projects. The CI pipeline ensures that code pushed to the main branch meets quality standards in both functionality and formatting. It also provides a hosted environment equivalent to the local one.

Code

The repo has the following code files:

  • main.py: Generates the descriptive statistics and plots, and stores them in the /output folder.
  • lib.py: Holds the library functions that are shared between main.py and desc_stats.ipynb.
  • desc_stats.ipynb: Prints out the descriptive statistics and plots using the functions in lib.py.
  • test_lib.py: Test cases for the functions in lib.py.
  • test_main.py: Test cases for the functions in main.py.

Build

Run the following commands to set up the environment and run the code.

make install
make lint
make format
make test
make run

make run executes main.py and stores the results in the outputs directory.
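
As a rough sketch of what a run step like this could do, the snippet below computes a small summary and writes it to an output directory. The function name `write_summary`, the file name `summary.json`, and the default directory argument are assumptions made for illustration. The real main.py may produce different files and formats.

```python
import json
import statistics
from pathlib import Path


def write_summary(values, out_dir="outputs"):
    """Write a small statistics summary as JSON and return the file path.

    Hypothetical: the file name and the keys below are illustrative only.
    """
    out_path = Path(out_dir)
    # Create the directory on first run; do nothing if it already exists.
    out_path.mkdir(parents=True, exist_ok=True)

    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

    target = out_path / "summary.json"
    target.write_text(json.dumps(summary, indent=2))
    return target
```

Calling `write_summary([20, 30, 40, 50])` would create the directory if needed and return the path of the written file. Returning the path also makes the function easy to test against a temporary directory.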

Results

Distribution of Victim Ages

Descriptive statistics can be found here.

Demo

The demo for this project can be found here.