This project demonstrates the benefits of using Continuous Integration for Data Science projects. The CI pipeline ensures that code pushed to the main branch upholds code quality in terms of both functionality and formatting. It also provides a hosted environment that is equivalent to the local one.
The repo has the following code files:
main.py: Generates the descriptive statistics and plots, and stores them in the /output folder.
lib.py: Holds the library functions that are shared between main.py and desc_stats.ipynb.
desc_stats.py: Prints out the descriptive stats and plots using the functions in lib.py.
test_lib.py: Test cases for the functions in lib.py.
test_main.py: Test cases for the functions in main.py.
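To illustrate how a shared helper in lib.py and a test case in test_lib.py could relate, here is a minimal sketch. The function name `describe` and its return shape are hypothetical and not taken from the repo. The sketch uses only the Python standard library, whereas the actual project may rely on other packages.

```python
# Hypothetical sketch: a shared helper (as lib.py might hold) and a
# pytest-style test for it (as test_lib.py might hold).
import statistics


def describe(values):
    """Return basic descriptive statistics for a non-empty list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def test_describe():
    """Check the helper against a small, hand-verifiable input."""
    stats = describe([1, 2, 3, 4])
    assert stats["count"] == 4
    assert stats["mean"] == 2.5
    assert stats["median"] == 2.5
    assert stats["min"] == 1
    assert stats["max"] == 4
```

Keeping the computation in one shared function means the script and the notebook report the same numbers, and a single test covers both.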
Run the following commands to set up the environment and run the code.
make install
make lint
make format
make test
make run
make run executes main.py and stores the results in the outputs directory.
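As a rough illustration of the "compute, then store in an output directory" step, the sketch below writes a small summary to disk. The function name `save_summary`, the file name `summary.json`, and the default directory name are assumptions made for this example. The real main.py may store different artifacts, such as plots, in a different layout.

```python
# Hypothetical sketch of storing results in an output directory.
import json
import statistics
from pathlib import Path


def save_summary(values, out_dir="outputs"):
    """Write the mean and median of `values` to <out_dir>/summary.json."""
    out_path = Path(out_dir)
    # Create the directory if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)
    summary = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
    target = out_path / "summary.json"
    target.write_text(json.dumps(summary, indent=2))
    return target
```

Calling `save_summary([1, 2, 3, 4])` would create the directory if needed and return the path of the written file.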
Descriptive statistics can be found here.
The demo for this project can be found here.