This repository contains the setup instructions and queries for phase 2 of the SOEN 363 (Data Systems) class project.
The purpose of this project is to investigate NoSQL database systems using a dataset.
Our NoSQL system was MongoDB and the dataset of choice was the Yelp Open Dataset.
The installation was conducted on a Virtual Machine running Ubuntu 18.04.
From the terminal, issue the following command to import the MongoDB public GPG key:
$ wget -qO - https://www.mongodb.org/static/pgp/server-4.2.asc | sudo apt-key add -
Create the list file /etc/apt/sources.list.d/mongodb-org-4.2.list for your version of Ubuntu:
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.2.list
Reload the local package database:
$ sudo apt-get update
Install the MongoDB packages:
$ sudo apt-get install -y mongodb-org
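The steps above install the packages but do not start the database server, and the mongo shell needs a running server to connect to. The following is a minimal sketch, assuming a systemd-based machine such as a default Ubuntu 18.04 install. `start_mongod` is a hypothetical helper name introduced for this example.

```shell
# Hypothetical helper: start the mongod service through systemd,
# then report whether the service is active.
start_mongod() {
  sudo systemctl start mongod || return 1
  systemctl is-active mongod
}

# Only attempt this on a machine where MongoDB has been installed.
if command -v mongod >/dev/null 2>&1; then
  start_mongod
fi
```

If the service should also start on every boot, `sudo systemctl enable mongod` can be run once afterwards.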
To begin using the mongo shell:
$ mongo
Provide your information, agree to the Dataset Licence, and download the 9.71 GB dataset from https://www.yelp.com/dataset
Once downloaded, unzip the downloaded archive into the same directory as the DataSystemsYelp repository.
From the mongo shell, create the YelpData database:
>use YelpData
Then create the collections for YelpData:
>db.createCollection('business')
>db.createCollection('checkin')
>db.createCollection('review')
>db.createCollection('user')
>db.createCollection('tip')
Once the database and collections are created, exit the mongo shell and open a new terminal to import the data. Run the commands from the directory that contains the extracted JSON files:
$ mongoimport --db YelpData --collection business --file yelp_academic_dataset_business.json
$ mongoimport --db YelpData --collection checkin --file yelp_academic_dataset_checkin.json
$ mongoimport --db YelpData --collection review --file yelp_academic_dataset_review.json
$ mongoimport --db YelpData --collection user --file yelp_academic_dataset_user.json
$ mongoimport --db YelpData --collection tip --file yelp_academic_dataset_tip.json
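The five imports can also be written as a loop. This is a sketch that assumes each file is meant to load into the collection of the same name created earlier. `DATA_DIR`, `DRY_RUN` and `import_collection` are hypothetical names introduced for this example. By default the script only prints the commands it would run.

```shell
# Directory holding the extracted JSON files (assumed: current directory).
DATA_DIR="${DATA_DIR:-.}"
# 1 = only print each command; set to 0 to actually run mongoimport.
DRY_RUN="${DRY_RUN:-1}"

# Import (or print the import command for) the collection named in $1.
import_collection() {
  name="$1"
  file="$DATA_DIR/yelp_academic_dataset_${name}.json"
  if [ "$DRY_RUN" = "1" ]; then
    echo "mongoimport --db YelpData --collection $name --file $file"
  else
    mongoimport --db YelpData --collection "$name" --file "$file"
  fi
}

for name in business checkin review user tip; do
  import_collection "$name"
done
```

Running it once with the default `DRY_RUN=1` gives a chance to check the file paths before starting the real import with `DRY_RUN=0`.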
All the queries are stored in queries.sh. They must be issued one at a time in the mongo shell, with YelpData selected as the current database (use YelpData).
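Before issuing the queries, it can help to confirm that the shell is pointed at YelpData and that each collection holds data. This is a minimal sketch, assuming the mongo shell installed above and a running server. `count_expr` is a hypothetical helper introduced here, not part of queries.sh.

```shell
# Build the JavaScript expression that counts the documents in one collection.
count_expr() {
  printf 'db.%s.countDocuments({})' "$1"
}

# Run the counts only when the mongo shell is available on this machine.
if command -v mongo >/dev/null 2>&1; then
  for name in business checkin review user tip; do
    printf '%s: ' "$name"
    # Passing the database name makes the shell start with YelpData selected.
    mongo YelpData --quiet --eval "$(count_expr "$name")"
  done
fi
```

A count of zero for any collection suggests its import did not run or went into a different collection.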