Repository for the final project of the AI for Actuarial Science course (2nd Semester, final year, ENSAE Paris).
- The code used to train the machine learning models can be found in the `src/models/` folder (the parameters used for the grid search are stored in the `conf` folder, and the `main.py` script orchestrates it all).
- The code used to train the deep learning model can be found in the `Neural_Network.ipynb` notebook.
- Then, on the one hand, `📚_Presentation.py` and `pages/` contain the user interface code for the three pages of our `Streamlit` application (a minimal page sketch is given right after this list). On the other hand, `src/app/` contains the code for our application's backend (as well as some useful frontend components). The `static/` folder contains some of the app's content and the `css` styles.
- Finally, in order to deploy the app, we built a `Docker` image (with the `run.sh` script as its entrypoint). We automated the image delivery through the configuration stored in the `deployment/` and `argocd/` folders, so a new image is pushed to `DockerHub` at every new version of the app.
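To make the frontend/backend split concrete, here is a minimal sketch of what a page under `pages/` could look like. The backend module path and the `load_data` helper are hypothetical illustrations, not the project's actual names.

```python
# Hypothetical sketch of a Streamlit page under pages/.
# The backend import path and helper name are illustrative assumptions.
import streamlit as st

from src.app.backend import load_data  # assumed helper living in src/app/

st.set_page_config(page_title="Exploration", page_icon="📊")
st.title("Data exploration")

df = load_data()         # backend fetches the dataset (e.g. from the S3 bucket)
st.dataframe(df.head())  # frontend component rendering a preview of the data
```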
NB1: The data comes from a public Kaggle repository, and can also be directly downloaded from this site. We have also added the data to an S3 bucket, accessible through the SSP Cloud's storage solution (`MinIO`). Hence, in our code, we directly use the data stored in our bucket.
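For reference, reading a file straight from such a bucket can be done with `pandas` and `s3fs`, along the lines of the sketch below; the bucket path shown is a placeholder, and the MinIO endpoint is an assumption about the SSP Cloud setup.

```python
# Hedged sketch: read the dataset directly from an S3 (MinIO) bucket.
# The bucket name, object key and endpoint URL are placeholders.
import pandas as pd

df = pd.read_csv(
    "s3://my-bucket/diffusion/insurance.csv",  # placeholder path
    storage_options={"client_kwargs": {"endpoint_url": "https://minio.lab.sspcloud.fr"}},
)
```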
NB2: When our application runs, many static elements are displayed. However, we also load the machine learning models we have trained (6 models in total). The trained models are stored in the same S3 bucket as our data, and when the application runs, we keep them in cache memory to avoid reloading them and to keep the app efficient.
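As an illustration of this caching idea, a model loader could look like the following sketch; `st.cache_resource` is Streamlit's caching decorator, while the bucket path and the use of `joblib` are assumptions about how the artifacts are serialized.

```python
# Hedged sketch: fetch a trained model from S3 once and cache it across reruns.
# The bucket path, file format (joblib) and endpoint are assumptions.
import joblib
import s3fs
import streamlit as st

@st.cache_resource  # Streamlit keeps the returned object in memory between reruns
def load_model(name: str):
    fs = s3fs.S3FileSystem(client_kwargs={"endpoint_url": "https://minio.lab.sspcloud.fr"})
    with fs.open(f"my-bucket/models/{name}.joblib", "rb") as f:  # placeholder path
        return joblib.load(f)

model = load_model("random_forest")  # hypothetical model name
```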
From the command line, follow these steps to set up the project:
- Clone this repository:
`git clone https://github.com/JulesBrable/ai_insurance.git`
- Go to the project folder:
`cd ai_insurance`
- Create and activate the conda environment:
`conda create -n ai_insurance python=3.9 -y`
`conda activate ai_insurance`
- Install the listed dependencies:
`pip install -r requirements.txt`
To train the model, you can run the following commands:
`cd src`
`python main.py`
Note that `main.py` can take multiple arguments: `--methods` and `--model`. See the script for more information about the values that can be entered.
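Purely as an illustration (the real definitions live in `src/main.py`), these two options could be wired roughly as follows; the default values and accepted choices shown here are placeholders.

```python
# Illustrative sketch of how --methods and --model might be parsed;
# see src/main.py for the actual argument definitions and accepted values.
import argparse

parser = argparse.ArgumentParser(description="Train a model with grid-search cross-validation")
parser.add_argument("--methods", nargs="+", default=["encoding"],
                    help="preprocessing methods to apply (placeholder values)")
parser.add_argument("--model", default="random_forest",
                    help="model to train, e.g. random_forest or logistic_regression")
args = parser.parse_args()
```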
By default, we train either a Random Forest or a Logistic Regression, using `GridSearchCV` with `StratifiedKFold` (K=5) cross-validation. You can change the parameters of the grid in the `conf/params.yaml` file.
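The cross-validation setup amounts to something like the sketch below; the grid key and the scoring metric are assumptions, and the actual grid is the one defined in `conf/params.yaml`.

```python
# Hedged sketch of the default training loop: GridSearchCV over a parameter grid
# (read from conf/params.yaml) with 5-fold stratified cross-validation.
import yaml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

with open("conf/params.yaml") as f:
    param_grid = yaml.safe_load(f)["random_forest"]  # the key name is an assumption

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=cv, scoring="accuracy")
# search.fit(X_train, y_train)  # X_train / y_train come from the preprocessing step
```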
In this project, we also built a simple `Streamlit` web app.
To access the app, one can simply click here, as the app is deployed on a `Kubernetes` cluster hosted by the SSP Cloud.
Alternatively, you can also run the app locally. To do so, after following the setup instructions described above, run the following command:
`streamlit run 📚_Presentation.py --server.port=5010`
By default, we are using port 5010, so once you have run the last command, you will be able to access the app at the following link: http://localhost:5010/.