- Linghui Wu (linghuiwu)
- Mingtao Gao (mingtao)
- Shengwenxin Ni (nswxin)
- Yanwei Pan (panyw)
The project aims to meet the needs of consumers in Chicago who have food restrictions, allergies, or religious dietary requirements to consider. Compared with traditional food search engines, we enable customers to look for food products not only by dietary label ("dairy-free," "vegan," "kosher," etc.) but also by ingredients and nutritional information, such as trans fat per serving. First, we collected nearly 17,200 product records from three major grocery chains in Chicago - Whole Foods, Trader Joe's, and Jewel-Osco - through web scraping, using regular expressions to extract the keywords we need. After cleaning, the data from the three sources were merged and used to build a SQLite database with two tables, `product` and `store`. We then built the backend for the grocery search tool by connecting the SQLite database to the frontend, a user-friendly Django web interface. Through the interface, users can enter and select dietary requirements, and the matching product information, along with other useful details such as store locations, is displayed.
The `BackEnd` folder is composed of three modules: the web crawler, the data cleaner, and the database constructor.
- Web Crawler
In each of `JOCrawler.py`, `TradeJoesCrawler.py`, and `WholeFoodsCrawler.py`, we created a class and defined helper functions to collect and store product and store information from the three grocery stores mentioned above, as sketched below.
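The three crawler classes differ per store, but they share roughly the following shape. This is a minimal sketch under stated assumptions, not the project's actual code: the URL, CSS selectors, and field names are placeholders, and only `requests`, `bs4`, `re`, and `csv` are assumed as dependencies.

```python
import csv
import re

import requests
import bs4


class SampleCrawler:
    """Minimal crawler skeleton; the real per-store classes differ."""

    # Hypothetical listing URL -- each store crawler targets its own site.
    BASE_URL = "https://www.example-grocer.com/products?page={}"

    def __init__(self, n_pages=1):
        self.n_pages = n_pages
        self.products = []

    def crawl(self):
        """Visit each listing page and extract product records."""
        for page in range(1, self.n_pages + 1):
            resp = requests.get(self.BASE_URL.format(page), timeout=30)
            resp.raise_for_status()
            soup = bs4.BeautifulSoup(resp.text, "html.parser")
            # The tag and class names below are placeholders for the
            # store-specific markup each real crawler parses.
            for tag in soup.find_all("div", class_="product"):
                self.products.append(self._parse_product(tag))

    def _parse_product(self, tag):
        """Pull the fields we need; regexes grab dietary keywords."""
        text = tag.get_text(" ", strip=True)
        return {
            "name": tag.find("h2").get_text(strip=True),
            # e.g. match dietary keywords anywhere in the description
            "labels": re.findall(r"(?i)\b(vegan|kosher|organic|dairy[- ]free)\b", text),
        }

    def save(self, path):
        """Write the collected records to a CSV file."""
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["name", "labels"])
            writer.writeheader()
            writer.writerows(self.products)
```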
To facilitate replication of the web scraping, we provide `test.py` to run from the command line. You may specify the number of observations and the name of the store, and it will generate sample CSV files. Please see `Test_Intructions.md` in the `BackEnd\Crawler` folder for more details.
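The exact flags are documented in `Test_Intructions.md`; as a purely hypothetical illustration of such an interface (the store names and defaults here are assumptions), `test.py` might parse its arguments like this:

```python
import argparse

# Hypothetical reconstruction of the test.py command-line interface;
# see Test_Intructions.md for the actual flags and defaults.
parser = argparse.ArgumentParser(description="Generate sample crawl CSVs.")
parser.add_argument("store", choices=["wholefoods", "traderjoes", "jewelosco"],
                    help="which store crawler to run")
parser.add_argument("-n", "--num", type=int, default=10,
                    help="number of observations to scrape")
args = parser.parse_args()
print(f"Scraping {args.num} products from {args.store}...")
# Example invocation: python3 test.py wholefoods -n 50
```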
- Data Cleaner
The raw data gathered by the crawlers are stored in the corresponding `storename_prod.csv` and `storename_store.csv` files.
Using pandas, `DataCleaning.py` reads those CSV files and processes the data by dropping missing values and unifying the column names. It then merges the cleaned data from the three sources into two dataframes, one covering all products and one covering all stores, and saves them as `product.csv` and `store.csv`. These files are later used to construct the tables in the database.
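A minimal sketch of this cleaning pipeline, assuming hypothetical file prefixes and column renames (the real `DataCleaning.py` handles more columns):

```python
import pandas as pd

STORES = ["wholefoods", "traderjoes", "jewelosco"]  # assumed file prefixes
COLUMNS = {"item": "name", "addr": "address"}       # hypothetical renames

prod_frames, store_frames = [], []
for store in STORES:
    # Read the raw crawler output for this store and unify column names.
    prod = pd.read_csv(f"{store}_prod.csv").rename(columns=COLUMNS)
    shop = pd.read_csv(f"{store}_store.csv").rename(columns=COLUMNS)
    # Drop rows with missing values so the merged tables stay consistent.
    prod_frames.append(prod.dropna())
    store_frames.append(shop.dropna())

# Stack the per-store frames into the two project-wide tables.
pd.concat(prod_frames, ignore_index=True).to_csv("product.csv", index=False)
pd.concat(store_frames, ignore_index=True).to_csv("store.csv", index=False)
```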
- Database Constructor
`database.sql` includes the SQL code to construct the `product` and `store` tables, to import the data from the CSV files into the corresponding tables, and to set NULL values in the tables. By entering the command `.read database.sql` in the SQLite shell, the user can construct the project database, `foodsearch.sqlite3`.
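For readers who prefer Python to the SQLite shell, the same construction can be approximated with the `sqlite3` module; the schema below is illustrative only, not the actual one in `database.sql`:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("foodsearch.sqlite3")
# The real schema lives in database.sql; these columns are placeholders.
conn.executescript("""
CREATE TABLE IF NOT EXISTS product (
    name    TEXT,
    store   TEXT,
    labels  TEXT,
    sugars  REAL
);
CREATE TABLE IF NOT EXISTS store (
    store    TEXT,
    address  TEXT,
    zipcode  TEXT
);
""")
# Load the cleaned CSVs into the matching tables.
pd.read_csv("product.csv").to_sql("product", conn, if_exists="append", index=False)
pd.read_csv("store.csv").to_sql("store", conn, if_exists="append", index=False)
conn.commit()
conn.close()
```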
The front end is composed of two main parts: the grocery search tool `search_items.py` and the Django web interface, whose code we adapted from the second assignment.
`search_items.py` takes in `args_to_ui`, a dictionary built from the information users enter on the interface, and produces query statements to retrieve results from the `foodsearch.sqlite3` database. To display information neatly, we select only the product name, the store name, the store address, and any additional fields related to the dietary restrictions the user selected or entered. For example, if the user asks for less than 10g of sugar per serving, the interface will display the product and store information as well as the amount of sugar per serving.
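A simplified sketch of how such a query builder can work; the dictionary keys, join condition, and column names here are assumptions for illustration, not the actual contract of `args_to_ui`:

```python
import sqlite3

def search(args_to_ui, db="foodsearch.sqlite3"):
    """Sketch of the query construction in search_items.py.

    args_to_ui is a dict such as {"labels": ["vegan"], "max_sugars": 10};
    the real keys are defined by the Django form, so these are assumptions.
    """
    select = ["product.name", "store.store", "store.address"]
    where, params = [], []

    if "labels" in args_to_ui:
        for label in args_to_ui["labels"]:
            where.append("product.labels LIKE ?")
            params.append(f"%{label}%")
    if "max_sugars" in args_to_ui:
        # Show the matched nutrient alongside the product, as described above.
        select.append("product.sugars")
        where.append("product.sugars < ?")
        params.append(args_to_ui["max_sugars"])

    query = (f"SELECT {', '.join(select)} FROM product "
             "JOIN store ON product.store = store.store")
    if where:
        query += " WHERE " + " AND ".join(where)

    conn = sqlite3.connect(db)
    rows = conn.execute(query, params).fetchall()  # parameterized query
    conn.close()
    return rows
```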
To run the Django web interface for the grocery search tool, enter `python3 manage.py runserver` in the command line inside the `FrontEnd` folder. Once the server is running, you can access the search engine by pointing a browser to http://127.0.0.1:8000/.
Our project set out to develop a grocery search engine catering to individuals' dietary restrictions, and we have achieved its primary goal. We successfully built a database containing product and store information, and a user-interactive interface that supports keyword queries based on food ingredients, nutritional content, dietary labels, and store information.
However, here are some aspects we can further improve:
- Provide more detailed dietary labels
Due to the broad definition of dietary labels and the inconsistent documentation across the three websites, we only took "organic," "vegan," "dairy-free," and "kosher" into consideration. However, we may be able to identify more dietary labels through content analysis of the ingredient lists.
- Offer more precise store locations
For now, our project finds nearby stores by querying the user's zipcode and returning stores that share its first four digits (see the sketch after this list). We hope to convert store addresses to longitude and latitude and use the Google Maps API to calculate precise distances between each store and the user.
- Design a more attractive and user-friendly interface
Due to our limited experience with web development, we implemented the UI primarily based on a previous assignment. With more in-depth study, we hope to display the full details of each product neatly.
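For reference on the second point above, the current nearby-store lookup reduces to a prefix match on the zipcode. A minimal sketch, assuming the illustrative `store` table columns shown earlier:

```python
import sqlite3

def nearby_stores(user_zip, db="foodsearch.sqlite3"):
    """Return stores whose zipcode shares its first four digits with the
    user's -- the current approximation of 'nearby'. Column names assumed."""
    conn = sqlite3.connect(db)
    rows = conn.execute(
        "SELECT store, address FROM store WHERE zipcode LIKE ?",
        (user_zip[:4] + "%",),
    ).fetchall()
    conn.close()
    return rows
```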