This project analyzes and forecasts the prices of Hass avocados using several machine learning models. We performed extensive Exploratory Data Analysis (EDA) and compared multiple regression models to find the one that best predicts avocado prices.
The data for this project comes from the Hass Avocado Board and contains information about avocado prices and sales volume in different regions of the United States. The dataset includes both conventional and organic avocados with different Price Look-Up (PLU) codes.
- LinkedIn : Henrique Baptista
- GitHub : henriquebap
- Email : henriquebaptista2003@gmail.com
- Conventional Avocados: Generally cheaper with higher sales volume. Consumers likely prefer conventional avocados due to their cost-effectiveness.
- Organic Avocados: Pricier with lower sales volume. The significant price difference suggests cost concerns might drive consumer preferences.
- Seasonal Trends: Sales peaks in February 2016, May 2016, and February 2017 coincide with the lowest prices, indicating seasonal effects on demand.
- Regional Sales Dynamics: Sales are highest in the West and California regions, followed by South Central and Northeast.
- PLU and Bag Types: Avocados with PLU codes 4046 and 4225 have the highest sales, and small bags are the best-selling bag type.
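The conventional versus organic comparison above can be sketched with a pandas group-by. This is a minimal illustration, not the notebook's actual code: the column names `type`, `AveragePrice`, and `Total Volume` are assumptions about the CSV layout, and the rows are toy values made up for the example.

```python
import pandas as pd

# Toy rows for illustration only. The column names are assumptions about
# the dataset layout and the numbers are not taken from the real data.
df = pd.DataFrame(
    {
        "type": ["conventional", "conventional", "organic", "organic"],
        "AveragePrice": [1.00, 1.20, 1.60, 1.80],
        "Total Volume": [1000.0, 1200.0, 100.0, 140.0],
    }
)

# Mean price and summed sales volume for each avocado type.
summary = df.groupby("type").agg(
    mean_price=("AveragePrice", "mean"),
    total_volume=("Total Volume", "sum"),
)
print(summary)
```

With the real dataset loaded into `df`, the same `groupby` call would give the per-type averages behind the first two bullets, provided the column names match.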
- Average Price Over Time
- Total Volume Over Time
- Total Bags Over Time
- Count of Observations per Region
- All date trending Avocado Price
- Average Price over months
- Average Price over time
We trained the following regression models:
- Linear Regression
- Decision Tree Regressor
- Random Forest Regressor
- XGBoost Regressor
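As a rough sketch of how such a comparison can be set up with scikit-learn, the example below fits three of the listed models on synthetic stand-in data and collects MAE and R² for each. The features, target, split ratio, and hyperparameters are all assumptions chosen for illustration, so the printed numbers do not reproduce this project's results. XGBoost is left out to keep the example dependency-light; `xgboost.XGBRegressor` exposes the same `fit`/`predict` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): three numeric features and a
# noisy, partly non-linear target. Not the avocado dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 1.0 + 0.5 * X[:, 0] + 0.3 * np.sin(6 * X[:, 1]) + rng.normal(0, 0.05, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (mean_absolute_error(y_test, pred), r2_score(y_test, pred))
    print(f"{name}: MAE={results[name][0]:.4f}, R2={results[name][1]:.4f}")
```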
The Random Forest Regressor achieved the lowest Mean Absolute Error and the highest R² score of the four models, narrowly ahead of XGBoost.
| Model | Mean Absolute Error (MAE) | R² Score |
|---|---|---|
| Linear Regression | 0.240946 | 0.392978 |
| Decision Tree | 0.144929 | 0.690714 |
| Random Forest | 0.110448 | 0.847877 |
| XGBoost | 0.116721 | 0.842305 |
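For readers unfamiliar with the two metrics in the table, here is a small plain-Python sketch of their standard definitions. MAE is the average absolute difference between actual and predicted values (lower is better), and R² is one minus the ratio of residual variance to total variance (closer to 1 is better). The function names and the sample values are illustrative only and are not taken from the project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration, not project data.
actual = [1.0, 1.5, 2.0, 1.5]
predicted = [1.1, 1.4, 1.8, 1.5]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(f"MAE={mae:.2f}, R2={r2:.2f}")
```

For these sample values the MAE comes out at roughly 0.10 and R² at roughly 0.88.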
- MAE and R² Score Comparison
- Random Forest & XGBoost: Actual vs Predicted Prices
- Python 3.x
- Required libraries: pandas, seaborn, matplotlib, scikit-learn, xgboost
pip install pandas seaborn matplotlib scikit-learn xgboost
- Clone the Repository
Run `git clone https://github.com/henriquebap/Avocado-Prices-EDA-Model-Traning.git`, then `cd Avocado-Prices-EDA-Model-Traning` and open `Notebook/Avocado_Price_Prediction_EDA.ipynb`.
- Load Dataset
Place the dataset in the project directory. Ensure it's named appropriately (e.g., `avocado.csv`).
The [notebook](https://github.com/henriquebap/Avocado-Prices-EDA-Model-Traning/blob/main/Notebook/Avocado_Price_Prediction_EDA.ipynb) performs EDA, trains the models, evaluates them, and generates prediction plots. Ensure your dataset is correctly formatted.
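One possible way to check the dataset's format before running the notebook is sketched below. The helper name `load_avocado_data` and the set of expected columns are hypothetical: they are assumptions about what the CSV contains, not something the notebook defines. The example reads from an in-memory sample so it runs without the real file.

```python
import io

import pandas as pd

# Assumed column names; adjust to match the actual CSV header.
EXPECTED_COLUMNS = {"Date", "AveragePrice", "Total Volume", "type", "region"}


def load_avocado_data(source):
    """Read the CSV, verify the expected columns exist, and sort by date."""
    df = pd.read_csv(source)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    df["Date"] = pd.to_datetime(df["Date"])
    return df.sort_values("Date").reset_index(drop=True)


# In-memory sample with made-up rows, standing in for the real file.
sample_csv = io.StringIO(
    "Date,AveragePrice,Total Volume,type,region\n"
    "2016-02-14,0.95,1500.0,conventional,West\n"
    "2016-02-07,1.55,120.0,organic,West\n"
)
df = load_avocado_data(sample_csv)
print(df.dtypes)
```

With the real file in place, the call would be `load_avocado_data("avocado.csv")`.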
- The Hass Avocado Board for providing the dataset.