This project explores and clusters Italian wines based on their characteristics, using the KMeans clustering algorithm. The data exploration and clustering are carried out in main.ipynb, located in the notebooks folder.
- data/: This directory contains all the data used in this project. It is divided into four subdirectories:
  - external/: Any external data sources.
  - interim/: Intermediate data that has been transformed.
  - processed/: The final, canonical data sets for modeling.
  - raw/: The original, immutable data dump.
- models/: This directory contains the trained and serialized models, model predictions, or model summaries.
- notebooks/: This directory contains Jupyter notebooks for exploration and testing. The main notebook is main.ipynb.
- reports/: This directory contains generated analysis as HTML, PDF, LaTeX, etc. It also includes any figures generated by the notebooks. Go to this directory to see the full ProfileReport made in the main.ipynb notebook, which contains important information about the data distribution.
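The layout above can be checked with a short script. This is a minimal sketch, not part of the repository: the function name missing_dirs and the constant EXPECTED_DIRS are hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Directory names taken from the layout described above.
EXPECTED_DIRS = [
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "models",
    "notebooks",
    "reports",
]


def missing_dirs(project_root):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

An empty result means every directory named above is present; anything listed can be created by hand before running the notebook.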
To get started with this project, you need to have Python 3.11 installed on your machine. You can then install the required packages using the following command:
pip install -r requirements.txt
You can run the main notebook (main.ipynb) to see the exploration and clustering process.
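For readers who want to see what the KMeans step does before opening the notebook, the procedure can be sketched in plain Python. This is a standalone illustration of the assign-and-update loop, not the notebook's code: the function names are hypothetical, the sample values are made up and do not come from the wine dataset, and the initial centroids are chosen by hand so the run is deterministic.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid in place
            continue
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids


if __name__ == "__main__":
    # Made-up two-feature samples (not taken from the wine dataset).
    samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
    # Initial centroids chosen by hand so the run is deterministic.
    labels, centers = kmeans(samples, centroids=[(0.0, 0.0), (6.0, 6.0)])
    print(labels)
    print(centers)
```

A library implementation typically adds smarter initialization and multiple restarts; the loop above only shows the core idea of grouping samples around the nearest centroid.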
- Open your terminal.
- Navigate to the pipe directory, where the Dockerfile is located.
- Build the Docker image by running the following command:
docker build -t image-name .
- Run the Docker container with the following command:
docker run -p 8000:8000 image-name
In the above commands, replace image-name with the name you want to give to your Docker image.
The data extraction API, data exploration, and clustering analysis are implemented in data_analysis_and_model.py, located in the pipe/src directory. To access the data, make a GET request to the following endpoint:
http://localhost:8000/data
Here, the data is retrieved from the original URL: Original dataset
The exploration of the data retrieved from the API, implemented in the same script, is available at the following URL:
http://localhost:8000/data-exploration
The clustering model applied to the data, along with the key information about it, is available at the following URL:
http://localhost:8000/clustering-analysis
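The three endpoints above can also be called from Python once the container is running. The following is a minimal sketch using only the standard library; the helper names endpoint_url and fetch are hypothetical, and the response format is an assumption (JSON is tried first, with the raw text returned otherwise), since it is not specified here.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Base URL matching the port published by the docker run command above.
BASE_URL = "http://localhost:8000/"

# Endpoint paths as listed in this README.
ENDPOINTS = ["data", "data-exploration", "clustering-analysis"]


def endpoint_url(name, base_url=BASE_URL):
    """Build the full URL for one of the documented endpoints."""
    if name not in ENDPOINTS:
        raise ValueError(f"Unknown endpoint: {name}")
    return urljoin(base_url, name)


def fetch(name, base_url=BASE_URL, timeout=10):
    """Send a GET request and return the decoded response body.

    Assumption: the response may be JSON; if parsing fails, the raw text is returned.
    """
    with urlopen(endpoint_url(name, base_url), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    # Requires the container to be running locally on port 8000.
    for name in ENDPOINTS:
        result = fetch(name)
        print(name, "->", type(result).__name__)
```

If the container is published on a different port, pass a different base_url rather than editing the endpoint list.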
The dependencies for the scripts are listed in requirements_scripts.txt, located in the pipe directory. You can install them with pip:
pip install -r requirements_scripts.txt
For more detailed information about the project, including the graphs and the full analysis, refer to the main.ipynb notebook in the notebooks directory.