Sales Conversion Optimization Project 📈

Sales Conversion Optimization MLOps: boost revenue with AI-powered insights. Features H2O AutoML, ZenML pipelines, Neptune.ai tracking, data validation, drift analysis, CI/CD, a Streamlit app, Docker, and GitHub Actions. Includes e-mail alerts, Discord/Slack integration, and SHAP interpretability. Streamline the ML workflow and enhance sales performance.

Table of Contents 📑

  1. Project Description 📝
  2. Project Structure 🏗️
  3. Necessary Installations 🛠️
  4. Train Pipeline 🚂
  5. Continuous Integration Pipeline 🔁
  6. Alert Reports 📧
  7. Prediction App 🎯
  8. Neptune.ai Dashboard 🌊
  9. Docker Configuration 🐳
  10. GitHub Actions 🛠️
  11. Running the Project 🚀

Project Description 🚀

Welcome to the Sales Conversion Optimization Project! 📈 This project focuses on enhancing sales conversion rates through meticulous data handling and efficient model training. The goal is to optimize conversions using a structured pipeline and predictive modeling.

We've structured this project to streamline the process from data ingestion and cleaning to model training and evaluation. To support efficient decision-making, our pipelines incorporate quality validation tests, drift analysis, and rigorous model performance evaluations.

The result is a sales conversion workflow that surfaces insights and predictions to drive impactful business decisions! 📊✨

Project Structure 🏗️

Let's dive into the project structure! 📁 Here's a breakdown of the directory:

  • steps Folder 📂

    • ingest_data
    • clean_data
    • train_model
    • evaluation
    • production_batch_data
    • predict_prod_data
  • src Folder 📁

    • clean_data
    • train_models
  • pipelines Folder 📂

    • training_pipeline
    • ci_cd_pipeline
  • models Folder 📁

    • Saved best H2O.ai model
  • reports Folder 📂

    • Evidently.ai HTML reports for failed tests
  • production data Folder 📁

    • Production batch dataset
  • Other Files 📄

    • requirements.txt
    • run_pipeline.py
    • ci-cd.py

This organized structure ensures a clear separation of concerns and smooth pipeline execution. 🚀

Necessary Installations 🛠️

To ensure the smooth functioning of this project, several installations are required:

  1. Clone this repository to your local machine.

    git clone https://github.com/VishalKumar-S/Sales_Conversion_Optimization_MLOps_Project
    cd Sales_Conversion_Optimization_MLOps_Project
  2. Install the necessary Python packages.

    pip install -r requirements.txt
  3. ZenML Integration

    pip install "zenml[server]"
    zenml init      # initialise the ZenML repository
    zenml up
  4. Neptune Integration

    zenml experiment-tracker register neptune_experiment_tracker --flavor=neptune \
    --project="$NEPTUNE_PROJECT" --api_token="$NEPTUNE_API_TOKEN"
    
    zenml stack register neptune_stack \
    -a default \
    -o default \
    -e neptune_experiment_tracker \
    --set

Make sure to install these dependencies to execute the project functionalities seamlessly! 🌟 For setup details, see the ZenML documentation on Neptune.ai integration with ZenML.

Train Pipeline 🚂

In this pipeline, we embark on a journey through various steps to train our models! 🛤️ Here's the process breakdown:

  1. run_pipeline.py: Initiates the training pipeline.
  2. steps/ingest_data: Ingests the data, sending it to the data_validation step.
  3. data_validation step: Conducts validation tests and transforms values.
  4. steps/clean_data: Carries out the data preprocessing logic.
  5. data_Drift_validation step: Conducts data drift tests.
  6. steps/train_model.py: Uses H2O AutoML for model selection.
  7. src/train_models.py: Fits the best model on the cleaned dataset.
  8. model_performance_Evaluation.py: Assesses model performance on a split dataset.
  9. steps/alert_report.py: If any validation test suite fails to meet its threshold condition, an email is sent to the user along with the failed Evidently.ai HTML reports.

Each step is crucial in refining and validating our model. All aboard the train pipeline! 🌟🚆 A minimal sketch of how these steps wire together is shown below.
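For orientation, here is a minimal sketch of the pipeline wiring with ZenML; the step imports and names are illustrative assumptions, not the exact project code:

```python
# Hypothetical wiring of the training pipeline; step names are illustrative.
from zenml import pipeline

from steps.ingest_data import ingest_data
from steps.clean_data import clean_data
from steps.train_model import train_model
from steps.evaluation import evaluation


@pipeline
def training_pipeline():
    # Each stage feeds the next; a failed validation step halts the run,
    # which is what triggers the alert e-mails described later.
    raw_df = ingest_data()
    clean_df = clean_data(raw_df)
    model = train_model(clean_df)
    evaluation(model, clean_df)
```

Running python run_pipeline.py would then amount to calling training_pipeline().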

Training Pipeline
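Under the hood, step 6's model selection boils down to an H2O AutoML search. A minimal sketch, assuming a cleaned CSV and an Approved_Conversion target column (the path and column name are illustrative):

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Illustrative path and target; the real pipeline passes the cleaned frame in.
train = h2o.import_file("data/cleaned_sales_data.csv")
target = "Approved_Conversion"

# Let AutoML search over candidate models and rank them on the leaderboard.
aml = H2OAutoML(max_models=10, seed=42)
aml.train(y=target, training_frame=train)

print(aml.leaderboard.head())
h2o.save_model(aml.leader, path="models/")  # persist the best model
```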

Continuous Integration Pipeline ⚙️

The continuous integration pipeline focuses on the production environment and streamlined processes for deployment. 🔄

Here's how it flows:

  1. ci-cd.py: Triggered to initiate the CI/CD pipeline.
  2. steps/production_batch_data: Accesses production batch data from the Production_data folder.
  3. pipelines/ci_cd_pipeline.py: Runs the same data quality and data drift checks as the training pipeline; if a threshold fails, email reports are sent.
  4. steps/predict_production_Data.py: Uses the pre-trained best model to make predictions on new production data (see the sketch below), then runs the same model performance validation; if a threshold fails, email reports are sent.

This pipeline is crucial for maintaining a continuous and reliable deployment process. 🔁✨
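A minimal sketch of the batch prediction step, assuming the best model was persisted under models/ (both paths are illustrative placeholders):

```python
import h2o

h2o.init()

# Load the persisted best model and score the production batch.
# Both paths are illustrative placeholders.
model = h2o.load_model("models/best_model")
batch = h2o.import_file("production data/production_batch.csv")

predictions = model.predict(batch)
print(predictions.head())
```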

Continuous Integration Pipeline Part-1

Continuous Integration Pipeline Part-2

Alert Reports 📧

In our project, email reports are a vital part of the pipeline to notify users when certain tests fail. These reports are triggered by specific conditions during the pipeline execution. Here's how it works:

E-mail Details

When data quality, data drift, or model performance validation tests fail, an email is generated detailing:

  • Number of total tests performed.
  • Number of passed and failed tests.
  • Failed test reports attached in HTML format.

Integration with Pipeline Steps

This email functionality is integrated into the pipeline steps via Python scripts (steps/alert_report.py). If a particular test threshold fails, the execution pauses and an email is dispatched. Successful test completions proceed to the next step in the pipeline.

This notification system helps ensure the integrity and reliability of the data processing and model performance at each stage of the pipeline.
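A minimal sketch of the alert logic in steps/alert_report.py, assuming SMTP credentials come from environment variables (the sender address, host, and variable names are illustrative):

```python
import os
import smtplib
from email.message import EmailMessage


def send_failure_report(recipient: str, subject: str, report_path: str) -> None:
    """Email a failed Evidently.ai report as an HTML attachment."""
    msg = EmailMessage()
    msg["From"] = os.environ["ALERT_SENDER"]  # illustrative env vars
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Validation tests failed; the full report is attached.")

    # Attach the failed Evidently report so the user can inspect details.
    with open(report_path, "rb") as f:
        msg.add_attachment(f.read(), maintype="text", subtype="html",
                           filename=os.path.basename(report_path))

    with smtplib.SMTP(os.environ["SMTP_HOST"], 587) as server:
        server.starttls()
        server.login(os.environ["ALERT_SENDER"], os.environ["ALERT_PASSWORD"])
        server.send_message(msg)
```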

Data Quality e-mail report

Data Drift e-mail report

Model Performance e-mail report

We also send failure alert reports via Discord and Slack.

Discord: #failed-alerts

Discord Alert:

Slack: #sales-conversion-test-failures

Slack Alert:
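Both integrations boil down to a webhook POST; a minimal sketch, with placeholder webhook URLs read from environment variables:

```python
import os

import requests


def notify_platforms(message: str) -> None:
    """Push a failure summary to the Discord and Slack alert channels."""
    # Webhook URLs are placeholders; real ones come from each platform's settings.
    discord_url = os.environ["DISCORD_WEBHOOK_URL"]
    slack_url = os.environ["SLACK_WEBHOOK_URL"]

    requests.post(discord_url, json={"content": message}, timeout=10)  # Discord payload
    requests.post(slack_url, json={"text": message}, timeout=10)       # Slack payload
```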

Prediction App 🚀

The Prediction App is the user-facing interface that leverages the trained models to make predictions based on user input. 🎯 To run the Streamlit application:

    streamlit run app.py

Functionality:

  • ๐ŸŒ Streamlit Application: User-friendly interface for predictions and monitoring.
  • ๐Ÿš€ Prediction App: Input parameters for prediction with a link to Neptune.ai for detailed metrics.
  • ๐Ÿ“Š Interpretability Section: Explore detailed interpretability plots, including SHAP global and local plots.
  • ๐Ÿ“ˆ Data and Model Reports: View reports on data quality, data drift, target drift, and model performance.
  • ๐Ÿ› ๏ธ Test Your Batch Data Section: Evaluate batch data quality with 67 validation tests, receive alerts on failures.

This app streamlines the process of making predictions, interpreting model outputs, monitoring data, and validating batch data.

Prediction App 🚀

User Input Data

  • Fields: Impressions, Clicks, Spent, Total_Conversion, CPC.
  • The Predict button generates approved conversion predictions (see the sketch below).
  • 🔗 Neptune.ai Metrics

Streamlit Prediction App
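A minimal sketch of the input form, assuming app.py collects these fields and hands them to the saved model (the widget labels mirror the dataset columns; the scoring call is elided):

```python
import pandas as pd
import streamlit as st

st.title("Sales Conversion Prediction")

# Input widgets mirroring the listed fields.
impressions = st.number_input("Impressions", min_value=0)
clicks = st.number_input("Clicks", min_value=0)
spent = st.number_input("Spent", min_value=0.0)
total_conversion = st.number_input("Total_Conversion", min_value=0)
cpc = st.number_input("CPC", min_value=0.0)

if st.button("Predict"):
    row = pd.DataFrame([{
        "Impressions": impressions, "Clicks": clicks, "Spent": spent,
        "Total_Conversion": total_conversion, "CPC": cpc,
    }])
    # The real app scores `row` with the saved H2O model and shows the
    # approved-conversion prediction; here we just echo the input.
    st.write(row)
```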

Interpretability Section

  • ๐Ÿ“ Detailed Interpretability Report: View global interpretability metrics.
  • ๐ŸŒ SHAP Global Plot: Explore SHAP values at a global level.
  • ๐ŸŒ SHAP Local Plot: Visualize SHAP values for user-input data.

SHAP Report:

LIME Report:

Data and Model Reports

  • 📉 Data Quality Report: Assess data quality between reference and current data.
  • 📊 Data Drift Report: Identify drift in data distribution.
  • 📈 Target Drift Report: Monitor changes in target variable distribution.
  • 📉 Model Performance Report: Evaluate the model's performance.
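These reports map onto Evidently's metric presets; a minimal sketch using the Evidently 0.4-style API, assuming reference_df and current_df are already-loaded pandas DataFrames:

```python
from evidently.report import Report
from evidently.metric_preset import (
    DataDriftPreset,
    DataQualityPreset,
    TargetDriftPreset,
)

# reference_df / current_df are assumed pandas DataFrames.
report = Report(metrics=[DataQualityPreset(), DataDriftPreset(), TargetDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)
report.save_html("Evidently_Reports/data_quality_suite.html")
```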

Choose Reports

  • Check options to generate specific reports.
  • Click 'Submit' to view generated reports.

Data Quality Report:

Test Your Batch Data

  1. 📂 Dataset Upload: Upload your batch dataset for validation.
  2. 📧 Email Alerts: Provide an email for failure alerts.
  3. 🔄 Data Validation Progress: 67 tests to ensure data quality (see the sketch after this list).
  4. 📊 Visualizations: Scatter plot and residuals plot for validation results.
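The validation run itself can be expressed with Evidently test suites; a minimal sketch, assuming batch_df is the uploaded DataFrame and reference_df the training reference (the presets here are illustrative, not the project's exact 67-test configuration):

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset, DataStabilityTestPreset

suite = TestSuite(tests=[DataQualityTestPreset(), DataStabilityTestPreset()])
suite.run(reference_data=reference_df, current_data=batch_df)

# Inspect the outcome and hand failures off to the alert helpers above.
results = suite.as_dict()
failed = [t["name"] for t in results["tests"] if t["status"] == "FAIL"]
if failed:
    print(f"{len(failed)} tests failed: {failed}")
```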
Step 1: Upload Your Batch Dataset

Upload Batch Data

Step 2: Provide Email Address for Alerts

E-mail address

Step 3: Data Validation Progress

Successful tests validation:

Failed tests validation:

For more details, check the respective sections in the Streamlit app.

This application provides an intuitive interface for users to make predictions and monitor data effortlessly. 📊✨ Explore the power of data-driven insights with ease and confidence! 🚀🔍

Neptune.ai Dashboard 🌊

Leveraging the Power of Neptune.ai for Enhanced Insights and Management 🚀

Neptune.ai offers an intuitive dashboard for comprehensive tracking and management of experiments, model metrics, and pipeline performance. Let's dive into its features:

  1. Visual Metrics: Visualize model performance metrics with interactive charts and graphs for seamless analysis. 📈📊
  2. Experiment Management: Track experiments, parameters, and results in a structured and organized manner. 🧪📋
  3. Integration Capabilities: Easily integrate Neptune.ai with pipeline steps for automated tracking and reporting. 🤝🔗
  4. Collaboration Tools: Facilitate teamwork with collaborative features and easy sharing of experiment results. 🤝💬
  5. Code and Environment Tracking: Monitor code versions and track environments used during experimentation for reproducibility. 🛠️📦

Necessary Commands:

  1. Necessary imports:

    import neptune
    from neptune.types import File
    from zenml.integrations.neptune.experiment_trackers.run_state import get_neptune_run
  2. Initiate the Neptune run:

    neptune_run = get_neptune_run()
  3. To track a pandas DataFrame:

    neptune_run["data/Training_data"].upload(File.as_html(df))
  4. Track HTML reports:

    neptune_run["html/Data Quality Test"].upload("Evidently_Reports/data_quality_suite.html")
  5. Track plot and graph visualisations:

    neptune_run["visuals/scatter_plot"].upload(File.as_html(fig1))
  6. Track model metrics:

    # Assumptions: `model` is a Neptune namespace handle
    # (e.g. model = neptune_run["model"]) and `perf` is an
    # H2O model-performance object from model_performance().
    model["r2"].log(perf.r2())
    model["mse"].log(perf.mse())
    model["rmse"].log(perf.rmse())
    model["rmsle"].log(perf.rmsle())
    model["mae"].log(perf.mae())

Neptune.ai Dashboard runs:

Neptune.ai Dashboard Code files:

Neptune.ai Dashboard Datasets:

Neptune.ai Dashboard visualisations:

Neptune.ai Dashboard HTML reports:

Neptune.ai Dashboard models:

Neptune.ai Dashboard model metrics:

Access my Neptune.ai Dashboard here

Neptune.ai enhances the project by providing a centralized platform for managing experiments and gaining deep insights into model performance, contributing to informed decision-making. 📊✨

Docker Configuration 🐳

Docker is an essential tool for packaging and distributing applications. Here's how to set up and use Docker for this project:

Running the Docker Container: Follow these steps to build the Docker image and run a container:

```bash
docker build -t my-streamlit-app .
docker run -p 8501:8501 my-streamlit-app
```

Best Practices: Consider best practices such as data volume management, security, and image optimization.

GitHub Actions 🛠️

  • Configured CI/CD workflow for automated execution

Continuous Machine Learning (CML) Reports 📊

CML Reports Integration 🚀

  • 🎯 Predictions Scatter Plot: Visualizes model predictions against actual conversions (see the sketch below).
  • 📈 Residuals Plot: Illustrates the differences between predicted and actual values.
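A minimal sketch of how these two visuals can be generated for the CML report, assuming y_true and y_pred are arrays of actual and predicted conversions:

```python
import matplotlib.pyplot as plt

# y_true / y_pred are assumed arrays of actual and predicted conversions.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.scatter(y_true, y_pred, alpha=0.5)
ax1.set(xlabel="Actual conversions", ylabel="Predicted", title="Predictions")

residuals = y_true - y_pred
ax2.scatter(y_pred, residuals, alpha=0.5)
ax2.axhline(0, linestyle="--", color="gray")
ax2.set(xlabel="Predicted", ylabel="Residual", title="Residuals")

fig.savefig("cml_plots.png")  # CML attaches this image to the commit report
```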

GitHub Actions Workflow 🛠️

Integrated into CI/CD pipeline:

  • Automatic generation on every push event.
  • Visual insights available directly in the repository.

Predictions Scatter Plot

Residuals Plot

🌟 These reports enhance transparency and provide crucial insights into model performance! 🌟

Running the Project 🚀

Follow these steps to run different components of the project:

  1. Training Pipeline:

    • To initiate the training pipeline, execute:

      python run_pipeline.py
  2. Continuous Integration Pipeline:

    • To execute the CI/CD pipeline for continuous integration, run:

      python ci-cd.py
  3. Streamlit Application:

    • To start the Streamlit app and access the prediction interface, run:

      streamlit run app.py

Each command triggers a specific part of the project. Ensure dependencies are installed before executing these commands. Happy running! 🏃‍♂️✨