This submission consists of the following sections:
The following shows the folder structure of the repository.
```
<base>
├── .env # Required for the Mapbox API in Jupyter
├── dyson.ipynb # Jupyter Notebook
├── conda-env.yaml # conda environment file for the Jupyter Notebook
├── readme.md
├── data
│   └── cali_dyson_households.csv # Provided data file
├── docker
│   ├── ces.DockerFile # Dockerfile
│   ├── docker-compose.yml # docker-compose file
│   └── requirements.txt # Python requirements for the Docker image
├── models
│   └── lin_reg_pipe.pkl # Pickled model pipeline used by the frontend
└── src
    └── app.py # Frontend (Streamlit) script
```
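The frontend in `src/app.py` uses the pickled pipeline in `models/lin_reg_pipe.pkl`. A minimal sketch of how a frontend might load such a pickle once and reuse it is shown below; the helper name `load_pipeline` and the default path constant are hypothetical, and the actual `app.py` may do this differently. In a Streamlit app the same loader would more typically be decorated with `st.cache_resource` rather than `functools.lru_cache`.

```python
import functools
import pickle
from pathlib import Path

# Hypothetical default path, mirroring the repository layout above.
DEFAULT_MODEL_PATH = Path("models") / "lin_reg_pipe.pkl"


@functools.lru_cache(maxsize=None)
def load_pipeline(path: str = str(DEFAULT_MODEL_PATH)):
    """Unpickle a fitted pipeline on the first call and return the cached object afterwards."""
    # Only unpickle trusted files: pickle can execute arbitrary code on load.
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

Because the cache is keyed on the path string, overwriting the pickle with a retrained model requires restarting the app or calling `load_pipeline.cache_clear()`.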
Unless otherwise stated, run the following commands from the command line at `<base>`. See the folder structure above for more information.
1.1 To run the other sections (the Jupyter Notebook), create and activate the conda environment defined in conda-env.yaml:
```bash
conda env create -f conda-env.yaml
conda activate ces
```
2.1 To run the Streamlit frontend via Docker, use:
```bash
docker compose -f docker/docker-compose.yml up -d
```
2.2 Once the image has been successfully built and the container is running, the frontend is available in your browser at http://localhost:6006.
2.3 To stop the container, use:
```bash
docker compose -f docker/docker-compose.yml down
```
The diagram below shows the overall flow of the machine learning pipeline.
```mermaid
graph TD
    A[Preprocessing Pipeline] -- RandomizedSearchCV --> B(Histogram Gradient<br/>Boosting Regressor)
    A -- GridSearchCV --> C(Linear Regression)
    B -- Best params --> D{Prediction<br/>on test set}
    C -- Best params --> D -- Final model choice --> E(Model in Production)
```