Python + Next.js + Elasticsearch app for the Yelp dataset, part of NTU's MH6301 Information Retrieval course

MH6301 - Information Retrieval

Project

This repository contains the code for the MH6301 - Information Retrieval project at Nanyang Technological University (NTU). The project is a simple app for searching the businesses included in the Yelp Dataset.

Description

The project is divided into three main parts:

  • Elasticsearch: The Elasticsearch instance that stores and serves the index of the Yelp dataset (defined only in the Docker Compose file).
  • Indexer (hosted in the ./backend folder): The Python script that indexes the Yelp dataset into the Elasticsearch instance.
  • App (hosted in the ./frontend folder): The React app that lets users search the indexed data.
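As a rough illustration of the search side, the sketch below builds the kind of JSON body a search box could send to Elasticsearch's `_search` endpoint. It is written in Python to match the indexer, but the same structure could be built in the frontend. The function name `build_search_body`, the boosted fields and the optional `is_open` filter are assumptions based on the Yelp business schema, not the app's actual query.

```python
def build_search_body(text: str, size: int = 10, open_only: bool = False) -> dict:
    """Build a hypothetical Elasticsearch request body for a free-text business search.

    Field names ("name", "categories", "city", "is_open") are assumed to
    mirror the keys of the Yelp business.json records.
    """
    query: dict = {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": text,
                        # Boost matches on the business name over other fields.
                        "fields": ["name^3", "categories", "city"],
                        # Tolerate small typos in the user's input.
                        "fuzziness": "AUTO",
                    }
                }
            ]
        }
    }
    if open_only:
        # Filters do not affect scoring; they only restrict the result set.
        query["bool"]["filter"] = [{"term": {"is_open": 1}}]
    return {"query": query, "size": size}
```

For example, `build_search_body("sushi", open_only=True)` returns a body that could be POSTed as JSON to `http://localhost:9200/<index>/_search`, where `<index>` is whatever index the indexer created.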

In addition, the ./data folder contains only the business.json file from the Yelp dataset. The full dataset can be downloaded from the Yelp Dataset website.
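The Yelp `business.json` file stores one JSON object per line (JSON Lines), so an indexer can stream it instead of loading it all at once. A minimal sketch, assuming an index named `businesses` and using `business_id` as the document ID (both hypothetical choices, not necessarily what `index.py` does):

```python
import json
from typing import Iterable, Iterator


def iter_bulk_actions(lines: Iterable[str], index_name: str = "businesses") -> Iterator[dict]:
    """Yield one bulk-index action per non-empty line of a JSON Lines source."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield {
            "_index": index_name,
            # Reusing business_id as the document ID makes re-runs overwrite
            # existing documents instead of creating duplicates.
            "_id": record["business_id"],
            "_source": record,
        }
```

With the official `elasticsearch` Python client, the generator could be passed straight to `elasticsearch.helpers.bulk(client, iter_bulk_actions(f))`, where `f` is an open file handle on `./data/business.json`.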

Preview

Please view the preview video for a quick overview of the project.

Usage

Assuming you have cloned this repo to your local machine, there are two main ways to run the project:

With Docker (recommended)

  1. Ensure you have Docker Compose installed on your machine.
  2. Navigate to the root directory of the project in your terminal.
  3. Build and run the Docker containers by running docker-compose up -d.
  4. Open your web browser and navigate to http://localhost:3000 to access the application.

Without Docker

  1. Ensure you have Elasticsearch, Python, Node.js and npm installed on your machine.
  2. Start Elasticsearch locally.
  3. Navigate to the root directory of the project in your terminal.
  4. Install Python dependencies by running pip install -r requirements.txt (this assumes requirements.txt is in the root directory).
  5. Run the Python indexer with python index.py to load the Yelp data into Elasticsearch.
  6. Install Node.js dependencies by running npm install.
  7. Once dependencies are installed, you can start the application with npm run start.
  8. Open your web browser and navigate to http://localhost:3000 to access the application.
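Steps 2 and 5 are order-dependent: the indexer can only run once Elasticsearch accepts connections, which may take a while after it starts. The readiness check sketched below assumes Elasticsearch listens on its default port 9200 over plain HTTP without authentication; `es_is_up` and `wait_for` are hypothetical helpers, not part of this repository.

```python
import time
import urllib.request
from typing import Callable


def es_is_up(url: str = "http://localhost:9200") -> bool:
    """Return True if an HTTP GET to the Elasticsearch root URL returns 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 401 when security is on) and timeouts
        # are all OSError subclasses; treat any of them as "not ready".
        return False


def wait_for(probe: Callable[[], bool], retries: int = 30, delay: float = 2.0) -> bool:
    """Call `probe` until it returns True or `retries` attempts are used up."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # no need to sleep after the final attempt
    return False
```

For example, `wait_for(es_is_up)` could be called at the top of the indexer before any indexing starts, exiting with an error if it returns False.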

Progress Log

2023-04-22

Dark mode

2023-04-20

TODOs:

Backend & Search

  • Enable array indexing for array fields (e.g. business.categories, checkin.date)
    • Reference: EiA, 3.3.1 Arrays
  • Enable nested type indexing for nested fields (e.g. business.hours, business.attributes)
    • Reference: EiA, 8.3 Nested type
  • Enable geolocation indexing for latitude/longitude fields (e.g. business.latitude, business.longitude)
    • Reference: EiA, Appendix A Working with geospatial data
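The three backend TODO items map onto Elasticsearch field types: any field can hold an array, `nested` keeps the fields of each object in an array queryable together, and `geo_point` stores a latitude/longitude pair. The sketch below shows one possible mapping and the document reshaping it needs, assuming (as in recent Yelp dataset releases) that `categories` and `checkin.date` arrive as comma-separated strings and `hours` as a day-to-`"open-close"` object. `BUSINESS_MAPPING`, the `location` field and all helper names are hypothetical and do not necessarily follow the book's examples; `business.attributes` is left out because its values vary in shape.

```python
from typing import Optional

# Hypothetical mapping for the fields named in the TODO list.
BUSINESS_MAPPING: dict = {
    "properties": {
        "name": {"type": "text"},
        # No special "array" type exists: a keyword field accepts a list.
        "categories": {"type": "keyword"},
        # "nested" indexes each entry as its own hidden document, so a query
        # can require day and time to match within the same entry.
        "hours": {
            "type": "nested",
            "properties": {
                "day": {"type": "keyword"},
                # Times are kept as raw strings (e.g. "7:0") for simplicity.
                "open": {"type": "keyword"},
                "close": {"type": "keyword"},
            },
        },
        # geo_point enables distance filters and sorting by distance.
        "location": {"type": "geo_point"},
    }
}


def split_csv(value: Optional[str]) -> list:
    """Turn "Pizza, Bars" into ["Pizza", "Bars"]; None or "" becomes []."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def hours_to_entries(hours: Optional[dict]) -> list:
    """Turn {"Monday": "7:0-20:0"} into [{"day": "Monday", "open": "7:0", "close": "20:0"}]."""
    entries = []
    for day, span in (hours or {}).items():
        open_time, _, close_time = span.partition("-")
        entries.append({"day": day, "open": open_time, "close": close_time})
    return entries


def transform_business(record: dict) -> dict:
    """Reshape one raw business.json record to fit BUSINESS_MAPPING."""
    doc = dict(record)  # shallow copy; the input record is left unchanged
    doc["categories"] = split_csv(record.get("categories"))
    doc["hours"] = hours_to_entries(record.get("hours"))
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is not None and lon is not None:
        doc["location"] = {"lat": lat, "lon": lon}
    return doc


def near(lat: float, lon: float, distance: str = "2km") -> dict:
    """Query body matching businesses within `distance` of (lat, lon)."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }
```

`split_csv` would apply equally to the `date` field of `checkin.json`, which turns each check-in timestamp into its own array element.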

Front-end

  • Enable dark mode

DevOps

  • Dockerize everything

Note: EiA = Elasticsearch in Action (1st Edition)