Wine exploration and clustering: Unveiling the stories within Italian wines

This project explores Italian wines and clusters them by their characteristics using the KMeans algorithm. Both the data exploration and the clustering live in main.ipynb, inside the notebooks folder.
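As a rough illustration of the clustering step, here is a minimal KMeans sketch with scikit-learn. It is not the code from main.ipynb: the dataset (scikit-learn's bundled wine data, used as a stand-in), the scaling step and the choice of three clusters are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled wine dataset.
# The project itself loads its own data in main.ipynb.
X = load_wine().data

# KMeans relies on distances, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=3 is an illustrative choice, not a value taken from the notebook.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X_scaled)

print("Cluster of the first ten wines:", labels[:10])
print("Within-cluster sum of squares:", model.inertia_)
```

Comparing the inertia across several values of n_clusters (the "elbow" method) is one common way to pick the number of clusters.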

Project Structure

  • data/: This directory contains all the data used in this project. It is divided into four subdirectories:

    • external/: Any external data sources.
    • interim/: Intermediate data that has been transformed.
    • processed/: The final, canonical data sets for modeling.
    • raw/: The original, immutable data dump.
  • models/: This directory contains the trained and serialized models, model predictions, or model summaries.

  • notebooks/: This directory contains Jupyter notebooks for exploration and testing. The main notebook is main.ipynb.

  • reports/: This directory contains generated analysis as HTML, PDF, LaTeX, etc., along with any figures generated by the notebooks. The full ProfileReport produced in main.ipynb, which summarizes the data distributions, is also stored here.

Getting Started

To get started with this project, you need to have Python 3.11 installed on your machine. You can then install the required packages using the following command:

pip install -r requirements.txt

Usage

You can run the main notebook (main.ipynb) to see the exploration and clustering process.

Setting up Docker

  1. Open your terminal.
  2. Navigate to the pipe directory where the Dockerfile is located.
  3. Build the Docker image by running the following command:
docker build -t image-name .
  4. Run the Docker container with the following command:
docker run -p 8000:8000 image-name

In the above commands, replace image-name with the name you want to give to your Docker image.

Accessing the data through the API

The data extraction API, data exploration and clustering analysis are implemented in data_analysis_and_model.py located in the pipe/src directory. To access the data, make a GET request to the following endpoint:

http://localhost:8000/data

Here, the data is retrieved from the original URL: Original dataset
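A minimal sketch of calling the endpoint from Python using only the standard library. The helper names endpoint_url and fetch are hypothetical, and the response format is not stated here, so the body is returned as plain text rather than parsed.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Port 8000 matches the `docker run -p 8000:8000` command above.
BASE_URL = "http://localhost:8000/"


def endpoint_url(path: str) -> str:
    """Build the full URL for an endpoint path such as "data"."""
    return urljoin(BASE_URL, path.lstrip("/"))


def fetch(path: str, timeout: float = 10.0) -> str:
    """Send a GET request to the endpoint and return the body as text."""
    with urlopen(endpoint_url(path), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

With the container running, print(fetch("data")[:500]) shows the beginning of the response. If the server is not up, urlopen raises urllib.error.URLError.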

Accessing the data exploration

The exploration of the data retrieved from the API, implemented in the same script, is available at the following URL:

http://localhost:8000/data-exploration

Accessing the clustering model implementation

The clustering model applied to the data, along with the key information about it, is available at the following URL:

http://localhost:8000/clustering-analysis

Dependencies

The dependencies for the scripts are listed in requirements_scripts.txt located in the pipe directory. You can install them with pip:

pip install -r requirements_scripts.txt

Additional Information

For more detailed information about the project, including the graphs and the full analysis, refer to the main.ipynb notebook located in the notebooks directory.
