Tomato Price Prediction

tl;dr

Scraped tomato prices in Karnataka (Jan-01-2015 to Feb-01-2021) from the Agricultural Marketing website of the Government of India
Trained a Random Forest Regression model
Developed a Flask API
Developed a web app using Flask

File System

tomato_price_prediction/ #Root directory
  |-images/
  |-static/
    |-style.css #CSS file for the web app
  |-templates/
    |-home.html #HTML template for the home page
    |-predict.html #HTML template for the prediction page
  |-Scrapper.ipynb #Jupyter notebook with the web scraping code
  |-api.py #Flask API
  |-app.py #Flask web app
  |-code.ipynb #Jupyter notebook with the EDA and model development code
  |-prediction_model.py #functions used in api.py

Note: Click here to access the pre-trained ML model.

Technologies Used

Python
Pandas
Plotly
Machine Learning (scikit-learn)
Flask
HTML, CSS
Selenium
Beautiful Soup

Data

Data used in this application was scraped from the Agricultural Marketing website of the Government of India using Selenium and Beautiful Soup.
The data consists of 35,544 entries of tomato prices in Karnataka from Jan-01-2015 to Feb-01-2021, covering multiple districts and the markets within them.
The first five entries in the dataset are:

   District Name  Market Name   Commodity  Variety  Grade  Min Price (Rs./Quintal)  Max Price (Rs./Quintal)  Modal Price (Rs./Quintal)  Price Date
0  Davangere      Davangere     Tomato     Tomato   FAQ    400                      600                      500                        2015-01-01
1  Davangere      Honnali       Tomato     Tomato   FAQ    800                      1000                     900                        2015-01-01
2  Kolar          Srinivasapur  Tomato     Tomato   FAQ    465                      1335                     935                        2015-01-01
3  Bangalore      Channapatana  Tomato     Tomato   FAQ    1000                     1400                     1200                       2015-01-01
4  Shimoga        Shimoga       Tomato     Tomato   FAQ    400                      600                      500                        2015-01-01
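The scraping code itself lives in Scrapper.ipynb and is not reproduced here. As a minimal sketch, assuming the results page renders a plain HTML table whose header row matches the columns above, Beautiful Soup can turn one page into a pandas DataFrame. In a Selenium-driven scraper the HTML would typically come from `driver.page_source`; here a small hypothetical `SAMPLE_HTML` string, built from two of the rows above, stands in for it, and `parse_price_table` is a hypothetical helper name.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for one page of search results. The real site's markup
# (table ids, extra header rows, pagination) is not documented in this README.
SAMPLE_HTML = """
<table>
  <tr><th>District Name</th><th>Market Name</th><th>Commodity</th><th>Variety</th><th>Grade</th>
      <th>Min Price (Rs./Quintal)</th><th>Max Price (Rs./Quintal)</th>
      <th>Modal Price (Rs./Quintal)</th><th>Price Date</th></tr>
  <tr><td>Davangere</td><td>Davangere</td><td>Tomato</td><td>Tomato</td><td>FAQ</td>
      <td>400</td><td>600</td><td>500</td><td>2015-01-01</td></tr>
  <tr><td>Kolar</td><td>Srinivasapur</td><td>Tomato</td><td>Tomato</td><td>FAQ</td>
      <td>465</td><td>1335</td><td>935</td><td>2015-01-01</td></tr>
</table>
"""


def parse_price_table(html: str) -> pd.DataFrame:
    """Hypothetical helper: convert the first <table> in `html` into a typed DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # First row holds the column names; the remaining rows hold the data cells.
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]
    df = pd.DataFrame(records, columns=header)
    # Cast the three price columns to integers and parse the date column.
    price_cols = [c for c in header if c.endswith("Price (Rs./Quintal)")]
    df[price_cols] = df[price_cols].astype(int)
    df["Price Date"] = pd.to_datetime(df["Price Date"])
    return df


df = parse_price_table(SAMPLE_HTML)
print(df)
```

Appending the DataFrame from each scraped page (for example with `pd.concat`) would then yield a single dataset like the one above.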

Data Analysis

Looking at the 30-day rolling average, we can observe that tomato prices in Karnataka follow a seasonal trend:

There are two major price spikes each year: a sharp rise around June-July, followed by a second, lower spike in December.
The lowest prices were observed in 2018.
The highest peaks were observed in 2016 and 2017.
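A minimal sketch of the 30-day rolling average with pandas, assuming prices are first averaged across all markets for each day and then smoothed with a time-based `'30D'` window. The data here is synthetic (two hypothetical markets with random prices), so it shows the mechanics only, not the seasonal pattern described above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the scraped data: 90 days of random modal prices
# for two hypothetical markets (not real Karnataka prices).
dates = pd.date_range("2015-01-01", periods=90, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Price Date": np.tile(dates, 2),
    "Market Name": ["Market A"] * 90 + ["Market B"] * 90,
    "Modal Price (Rs./Quintal)": rng.integers(400, 1500, size=180),
})

# 1. Average the modal price across markets for each day (index: sorted dates).
daily = df.groupby("Price Date")["Modal Price (Rs./Quintal)"].mean()

# 2. Smooth with a time-based 30-day window: each value averages the
#    30 calendar days ending on (and including) that date. The first
#    29 values use fewer days because min_periods defaults to 1.
rolling = daily.rolling("30D").mean()
print(rolling.tail())
```

Plotting `rolling` (for example with Plotly) is what would make yearly peaks and troughs visible on the real data.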

Another observable trend is that the average modal price of tomatoes per quintal in Bangalore is higher than in the rest of the state.
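This comparison can be sketched as a group-by on a Bangalore / rest-of-state label, assuming "Bangalore" refers to the `District Name` column. The sketch uses only the five sample rows from the Data section, so the numbers illustrate the calculation rather than the full-dataset result.

```python
import pandas as pd

# The five sample rows from the Data section (district and modal price only).
sample = pd.DataFrame({
    "District Name": ["Davangere", "Davangere", "Kolar", "Bangalore", "Shimoga"],
    "Modal Price (Rs./Quintal)": [500, 900, 935, 1200, 500],
})

# Label each row as Bangalore or the rest of the state, then average per label.
is_bangalore = sample["District Name"].eq("Bangalore")
group = is_bangalore.map({True: "Bangalore", False: "Rest of Karnataka"})
comparison = sample.groupby(group)["Modal Price (Rs./Quintal)"].mean()
print(comparison)  # Bangalore 1200.0, Rest of Karnataka 708.75
```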

Model

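A Random Forest Regression model was used as the prediction model. Its base estimators, decision trees, are well suited to the categorical variables in the data, and because Random Forest is a bagging algorithm (it averages many trees, each trained on a bootstrap sample of the data), it is less sensitive to variations in the data than a single tree.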

Pipeline(steps=[('column_transformer',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('cat',
                                                  OneHotEncoder(drop='first',
                                                                sparse=False),
                                                  ['District Name',
                                                   'Market Name', 'Variety',
                                                   'Grade']),
                                                 ('scale', MinMaxScaler(),
                                                  ['year', 'month',
                                                   'day of the month',
                                                   'day of the week'])])),
                ('rfr', RandomForestRegressor(n_estimators=300))])

Model Performance

The evaluation metric used to measure model performance was the Mean Absolute Error (MAE).
The model achieved an MAE of 175.86 on the test data, meaning its predictions were off by about 176 Rs./Quintal on average.
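MAE is the average absolute difference between actual and predicted values: MAE = (1/n) * sum(|y_i - y_pred_i|). The sketch below computes it both by hand and with scikit-learn's `mean_absolute_error`; the actual values are the sample modal prices from the Data section, while the predictions are hypothetical numbers chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Actual modal prices from the sample rows; predictions are hypothetical.
y_true = np.array([500, 900, 935, 1200, 500])
y_pred = np.array([600, 850, 1000, 1100, 450])

# MAE = (1/n) * sum(|y_true - y_pred|); absolute errors are 100, 50, 65, 100, 50.
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # both equal 73.0
```

Unlike squared-error metrics, MAE is expressed in the same unit as the target and does not give extra weight to a few large misses.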

chawla201/tomato_price_prediction