This repository contains instructions and queries for phase 2 of the SOEN 363 Data Systems class project.
The purpose of this project is to investigate NoSQL database systems on a dataset.
Our NoSQL system was MongoDB and the dataset of choice was the Yelp Open Dataset.
The installation was conducted on a Virtual Machine running Ubuntu 18.04.
From the terminal, issue the following command to import the MongoDB public GPG key:
$ wget -qO - | sudo apt-key add -
Create the list file /etc/apt/sources.list.d/mongodb-org-4.2.list for your version of Ubuntu:
$ echo "deb [ arch=amd64,arm64 ] bionic/mongodb-org/4.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.2.list
Reload the local package database:
$ sudo apt-get update
Install the MongoDB packages:
$ sudo apt-get install -y mongodb-org
To begin using the mongo shell:
$ mongo
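Before opening the shell, it can help to confirm that the tools installed by the mongodb-org package are on the PATH. The following is a minimal sketch, not part of the original instructions; the helper name missing_tools is hypothetical, and the systemctl hint assumes the systemd setup that Ubuntu 18.04 uses by default.

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Prints one missing tool name per line; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# mongod, mongo and mongoimport are the tools used in these instructions.
missing=$(missing_tools mongod mongo mongoimport)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All MongoDB tools found."
fi

# Assumption: if the mongo shell cannot connect, the server may need to be
# started first with:  sudo systemctl start mongod
```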
Provide your information, agree to the Dataset Licence, and download the 9.71 GB dataset from
Once downloaded, unzip the archive into the same directory as the DataSystemsYelp repository.
From the mongo shell, create the YelpData database:
>use YelpData
Then create the collections for YelpData:
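A minimal sketch of this step, assuming one collection per dataset file, named business, checkin, review, user and tip. These names are an assumption and are not confirmed by the original instructions. The script, run from a regular terminal, prints one db.createCollection statement per name; the printed statements can then be pasted into the mongo shell after use YelpData.

```shell
#!/bin/sh
# Assumed collection names, mirroring the yelp_academic_dataset_<name>.json files.
COLLECTIONS="business checkin review user tip"

# Print one createCollection statement per collection name.
create_statements() {
    for name in $COLLECTIONS; do
        printf 'db.createCollection("%s")\n' "$name"
    done
}

# Show the statements so they can be pasted into the mongo shell.
create_statements
```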
Once the database and collections are created, exit the mongo shell and open a new terminal to import the collections:
$ mongoimport --db YelpData --collection business --file yelp_academic_dataset_business.json
$ mongoimport --db YelpData --collection checkin --file yelp_academic_dataset_checkin.json
$ mongoimport --db YelpData --collection review --file yelp_academic_dataset_review.json
$ mongoimport --db YelpData --collection user --file yelp_academic_dataset_user.json
$ mongoimport --db YelpData --collection tip --file yelp_academic_dataset_tip.json
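The imports can also be scripted as a loop. The sketch below assumes each file is meant to land in a collection named after it (business, checkin, review, user, tip) and that it is run from the directory holding the unzipped JSON files. The DRY_RUN variable and the import_command helper are hypothetical names introduced for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumption: each collection is named after its yelp_academic_dataset_<name>.json file.
COLLECTIONS="business checkin review user tip"

# Print the mongoimport command for one collection name.
import_command() {
    printf 'mongoimport --db YelpData --collection %s --file yelp_academic_dataset_%s.json\n' "$1" "$1"
}

# With DRY_RUN=1 (the default here) the commands are only printed;
# set DRY_RUN=0 to actually run the imports.
DRY_RUN="${DRY_RUN:-1}"
for name in $COLLECTIONS; do
    if [ "$DRY_RUN" = "1" ]; then
        import_command "$name"
    else
        mongoimport --db YelpData --collection "$name" --file "yelp_academic_dataset_${name}.json"
    fi
done
```

Running it as DRY_RUN=0 sh import.sh (import.sh being whatever name the script is saved under) performs the five imports in sequence.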
All the queries are stored in this repository, but they must be issued individually in the mongo shell, and you must be using the YelpData database.
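As an illustration of issuing a single statement against YelpData, the sketch below builds a one-off mongo command line from a terminal. The query_command helper and the example count query are hypothetical and are not taken from the repository's queries; the business collection name is assumed.

```shell
#!/bin/sh
# Build a mongo command line that evaluates one statement against YelpData.
# The statement is passed as a single argument and must not contain single quotes.
query_command() {
    printf "mongo YelpData --quiet --eval '%s'\n" "$1"
}

# Hypothetical example: count the documents in the business collection.
query_command 'db.business.countDocuments({})'
```

The printed line can be copied into a terminal; inside an interactive mongo session the same statement is issued directly after use YelpData.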