Hass Avocado Prices Prediction

Introduction

This project analyzes and forecasts the prices of Hass avocados using several machine learning models. We performed extensive Exploratory Data Analysis (EDA) and compared multiple regression models to find the one that best predicts avocado prices.

Dataset

The data for this project comes from the Hass Avocado Board and contains information about avocado prices and sales volume in different regions of the United States. The dataset includes both conventional and organic avocados with different Price Look-Up (PLU) codes.

Contact

LinkedIn: Henrique Baptista
GitHub: henriquebap
Email: henriquebaptista2003@gmail.com

Exploratory Data Analysis (EDA)

Key Insights

Conventional Avocados: Generally cheaper with higher sales volume. Consumers likely prefer conventional avocados due to their cost-effectiveness.
Organic Avocados: Pricier with lower sales volume. The significant price difference suggests cost concerns might drive consumer preferences.
Seasonal Trends: Sales peaks in February and May 2016 and in February 2017 coincide with the lowest prices, which points to seasonal effects on demand.
Regional Sales Dynamics: Highest sales in the West and California regions, followed by South Central and North East.
PLU and Bag Types: Avocados with PLU 4046 and 4225 see higher sales, and small bags are the best-selling bag type.
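
Insights like the ones above usually come from simple group-by aggregations. The sketch below shows one way to compute them with pandas. The column names (`Date`, `type`, `AveragePrice`, `Total Volume`) and the four sample rows are assumptions made up for illustration; they are not taken from the project's notebook or dataset.

```python
import pandas as pd

# Tiny made-up sample; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-02-07", "2016-02-07", "2016-05-08", "2016-05-08"]),
    "type": ["conventional", "organic", "conventional", "organic"],
    "AveragePrice": [0.90, 1.50, 1.00, 1.60],
    "Total Volume": [1000.0, 100.0, 1200.0, 120.0],
})

# Mean price and total volume per avocado type (conventional vs. organic).
by_type = df.groupby("type").agg(
    mean_price=("AveragePrice", "mean"),
    total_volume=("Total Volume", "sum"),
)

# Mean price per calendar month, useful for spotting seasonal patterns.
by_month = df.groupby(df["Date"].dt.to_period("M"))["AveragePrice"].mean()

print(by_type)
print(by_month)
```

Grouping by `region` or by bag-type columns would follow the same pattern, provided the dataset exposes them under those names.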

Plots

Average Price Over Time

Total Volume Over Time

Total Bags Over Time

Count of Observations per Region

Avocado Price Trend Across All Dates

Average Price over months

Average Price over time

Models

We trained the following regression models:

Linear Regression
Decision Tree Regressor
Random Forest Regressor
XGBoost Regressor
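
A comparison like this can be set up as a loop over scikit-learn estimators. The following is a minimal sketch, not the notebook's actual code: the synthetic features, the train/test split, and the default hyperparameters are all assumptions. XGBoost is left out only to keep the example dependent on scikit-learn alone; `xgboost.XGBRegressor` follows the same `fit`/`predict` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): two features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0.0, 0.05, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: MAE={scores['mae']:.4f}, R2={scores['r2']:.4f}")
```

Because the synthetic target is linear, Linear Regression will look much stronger here than it does on the real avocado data; the sketch illustrates the workflow, not the reported ranking.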

Model Performance

The Random Forest Regressor performed best, with the lowest MAE (0.110) and the highest R² (0.848). XGBoost was a close second, while Linear Regression trailed well behind both tree ensembles.

Model	Mean Absolute Error (MAE)	R² Score
Linear Regression	0.240946	0.392978
Decision Tree	0.144929	0.690714
Random Forest	0.110448	0.847877
XGBoost	0.116721	0.842305
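
For readers unfamiliar with the two metrics in the table, the sketch below spells out their definitions in plain Python. The sample values are made up for illustration. In practice, scikit-learn's `mean_absolute_error` and `r2_score` compute the same quantities.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance the model explains."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices in dollars, for illustration only.
actual = [1.0, 1.5, 2.0]
predicted = [1.1, 1.4, 2.2]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(f"MAE={mae:.4f}, R2={r2:.4f}")
```

A lower MAE means predictions are closer to actual prices on average, and an R² closer to 1 means the model accounts for more of the price variation.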

Results Visualization

MAE and R² Score Comparison

Random Forest and XGBoost: Actual vs. Predicted Prices

Usage

Prerequisites

Python 3.x
Required libraries: pandas, seaborn, matplotlib, scikit-learn, xgboost

Installation

pip install pandas seaborn matplotlib scikit-learn xgboost

Running the Project

Clone the Repository

git clone https://github.com/henriquebap/Avocado-Prices-EDA-Model-Traning.git
cd Avocado-Prices-EDA-Model-Traning

Then open Notebook/Avocado_Price_Prediction_EDA.ipynb in Jupyter or another notebook environment.

Load Dataset

Place the dataset in the project directory. Ensure it's named appropriately (e.g., avocado.csv).
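
As a rough sketch of the loading step, the snippet below reads the CSV with pandas and parses the date column. The helper `load_avocado` is hypothetical, and the `Date` column name is an assumption about the file's layout. An in-memory sample with made-up rows stands in for `avocado.csv` so the example runs on its own.

```python
import io

import pandas as pd


def load_avocado(source):
    """Read the CSV, parse the assumed `Date` column, and sort chronologically."""
    df = pd.read_csv(source, parse_dates=["Date"])
    return df.sort_values("Date").reset_index(drop=True)


# In-memory stand-in for avocado.csv; rows and column names are made up.
sample = io.StringIO(
    "Date,AveragePrice,type,region\n"
    "2016-05-08,1.00,conventional,West\n"
    "2016-02-07,1.50,organic,California\n"
)

df = load_avocado(sample)
print(df.dtypes)
print(df.head())
```

With the real file in place, the call would be `load_avocado("avocado.csv")`, adjusted to whatever filename and column names the dataset actually uses.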

EDA and Model Training

The [notebook](https://github.com/henriquebap/Avocado-Prices-EDA-Model-Traning/blob/main/Notebook/Avocado_Price_Prediction_EDA.ipynb) performs EDA, trains and evaluates the models, and generates prediction plots. Ensure your dataset is correctly formatted.

Acknowledgments

The Hass Avocado Board for providing the dataset.