/Adidas_Clone-System_Design_Scaling_Project

This System Design project is for me to practice scaling a backend with millions of records.

Primary Language: JavaScript

System Design

Note: I did not build the front-end. I only took an existing front-end and scaled the backend for practice.

Project Purpose

The purpose of this project is to learn how to design a backend architecture and stress test it under web load. This involves evaluating multiple database management systems and determining which one is a better fit for this type of e-commerce website. Then I build HTTP request methods and connect them to the databases. After benchmarking and finalizing the database choice, I host the website on multiple virtual machines and use Nginx as an HTTP load balancer. I use tools like New Relic, K6, and AWS CloudWatch to identify bottlenecks, and gzip compression and caching techniques to optimize performance.

Step 1 - Database Selection

I used this video to get an idea on which database to select: https://www.youtube.com/watch?v=v5e_PasMdXc&feature=emb_logo

  • In this project, we consider how to scale our backend because we expect many requests per second, with many web servers serving many people at the same time. Therefore, we consider a distributed NoSQL database as well as a monolithic relational database. We want to test two backend systems and choose one of the two with justification. We have to take the CAP (Consistency, Availability, Partition-Tolerance) theorem into consideration. Partition tolerance is about the system continuing to work when the data is partitioned across nodes, which is what lets you scale the data. Availability is about your data always being there. Consistency is about whether, after you write something, a user's subsequent reads see it right away or only after a few seconds; this is especially important if your application handles stock or financial transactions. If it is okay for your system to go down for a few seconds or minutes, availability is not your main consideration. Below is our analysis:

CAP

  • For this application

    • Availability: It's important because when a user goes to your webpage to look for reviews, you need to show them.
    • Consistency: It's okay if there is a few-second delay before a new review shows up and the user sees the old reviews in the meantime.
    • Partition-tolerance: The web page has to stay fast as the data grows and is spread across nodes.
    1. MongoDB

      • Strength:
        • There is professional paid support for setting up the security of the database.
        • Ability to outsource the administration of the system over time.
        • Simplicity of use - there are many supporting articles for this application and a large community user base.
        • Partition-tolerance - to scale fast
    2. PostgreSQL

      • Strength:
        • Well suited for structured data.
        • Online research indicates that PostgreSQL is faster than MySQL.

Step 2 - Develop Schemas for PostgreSQL and MongoDB

  • I created a folder named database to store all the schema and seeding files.

  • Generate fake images for testing purposes.

    1. run npm install faker to help generate fake images

    2. create a folder named image inside database folder

    3. Build a script named downloadFakeImage.js to download 1000 fake images into the image folder (a rough sketch follows this list)
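A minimal sketch of what such a script might look like, assuming the legacy faker package (with faker.image.imageUrl()) and plain-HTTP placeholder URLs; the exact logic in downloadFakeImage.js may differ:

```javascript
// downloadFakeImage.js -- hypothetical sketch: download 1000 placeholder images into database/image
const faker = require('faker'); // legacy faker package
const http = require('http');
const fs = require('fs');
const path = require('path');

const IMAGE_DIR = path.join(__dirname, 'image');
if (!fs.existsSync(IMAGE_DIR)) fs.mkdirSync(IMAGE_DIR, { recursive: true });

// Download one image URL into a local file; resolve when the write stream finishes
const download = (url, dest) =>
  new Promise((resolve, reject) => {
    http.get(url, (res) => {
      const file = fs.createWriteStream(dest);
      res.pipe(file);
      file.on('finish', () => file.close(resolve));
    }).on('error', reject);
  });

(async () => {
  for (let i = 1; i <= 1000; i += 1) {
    // faker.image.imageUrl() returns a random placeholder image URL
    await download(faker.image.imageUrl(), path.join(IMAGE_DIR, `${i}.jpg`));
  }
  console.log('Done downloading 1000 fake images');
})();
```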

MongoDB Schemas

MongoDB_database_diagram
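Based on the diagram above and the API payload shown later, here is a minimal sketch of the nested product/review schema (field types and options are assumptions; the real condensedMongoSchema.js may differ):

```javascript
// condensedMongoSchema.js -- hypothetical sketch of the nested product/review schema
const mongoose = require('mongoose');

const reviewPhotoSchema = new mongoose.Schema({
  review_photo_id: Number,
  review_photo_url: String,
});

const userSchema = new mongoose.Schema({
  user_id: Number,
  firstname: String,
  lastname: String,
  gender: String,
  nickname: String,
  email: String,
  password: String,
});

const reviewSchema = new mongoose.Schema({
  review_id: Number,
  user: userSchema,
  opinion: String,
  text: String,
  rating_overall: Number,
  doesRecommended: Boolean,
  rating_size: String,
  rating_width: String,
  rating_comfort: String,
  rating_quality: String,
  isHelpful: Number,
  isNotHelpful: Number,
  created_at: Date,
  review_photo_path: [reviewPhotoSchema],
});

// One document per product, with its reviews embedded as an array
const productSchema = new mongoose.Schema({
  product_id: Number,
  product_name: String,
  review: [reviewSchema],
});

module.exports = mongoose.model('Product', productSchema);
```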

MongoDB Seeding

  • I plan to seed over 10 million records into the schema that I just created.

    1. Run npm install mongodb mongoose so that the package.json file has the dependencies for connecting to MongoDB (make sure you install MongoDB on your computer first).
  • cd into the mongoDB folder and run node condensedMongoSeed.js to seed the database (a rough sketch of such a script follows).
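A rough sketch of how a batched seeding script like condensedMongoSeed.js might work; the batch size, totals, and fake values below are assumptions:

```javascript
// condensedMongoSeed.js -- hypothetical sketch: insert ~10M product documents in batches to keep memory flat
const mongoose = require('mongoose');
const faker = require('faker');
const Product = require('./condensedMongoSchema.js');

const TOTAL_PRODUCTS = 10000000; // ~10M documents
const BATCH_SIZE = 1000;

const makeProduct = (id) => ({
  product_id: id,
  product_name: faker.lorem.word(),
  review: [{
    review_id: id,
    user: {
      user_id: id,
      firstname: faker.name.firstName(),
      lastname: faker.name.lastName(),
      gender: 'Male',
      nickname: faker.internet.userName(),
      email: faker.internet.email(),
      password: faker.internet.password(),
    },
    opinion: faker.lorem.sentence(),
    text: faker.lorem.paragraph(),
    rating_overall: Math.ceil(Math.random() * 5),
    doesRecommended: Math.random() > 0.5,
    rating_size: 'True to size',
    rating_width: 'True to width',
    rating_comfort: 'Comfortable',
    rating_quality: 'What I expected',
    isHelpful: Math.floor(Math.random() * 100),
    isNotHelpful: Math.floor(Math.random() * 100),
    created_at: faker.date.past(),
    review_photo_path: [],
  }],
});

(async () => {
  await mongoose.connect('mongodb://localhost/reviews', { useNewUrlParser: true, useUnifiedTopology: true });
  for (let start = 1; start <= TOTAL_PRODUCTS; start += BATCH_SIZE) {
    const batch = [];
    for (let i = start; i < start + BATCH_SIZE && i <= TOTAL_PRODUCTS; i += 1) batch.push(makeProduct(i));
    await Product.insertMany(batch); // bulk insert one batch at a time
    if (start % 100000 === 1) console.log(`Seeded up to ${start + BATCH_SIZE - 1}`);
  }
  await mongoose.disconnect();
})();
```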

mongoSeedingResult

mongoDB_database_size

PostgreSQL Seeding

  1. Build the seeding script: I plan to seed over 10 million records into CSV files and then load the CSV into PostgreSQL.

  2. Seed csv file to PostgreSQL database

    • run node newPostgreSQLReviewSeed.js to create the CSV seeding file (a rough sketch follows).
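A minimal sketch of how newPostgreSQLReviewSeed.js might stream rows into a CSV; the column order follows the review table used later, while the fake values and file name are assumptions:

```javascript
// newPostgreSQLReviewSeed.js -- hypothetical sketch: stream ~10M review rows into a CSV file
const fs = require('fs');
const faker = require('faker');

const TOTAL_REVIEWS = 10000000;
const out = fs.createWriteStream('review.csv');

// Header row matching the review table columns
out.write('review_id,product_id,userID,opinion,text,rating_overall,doesRecommended,rating_size,rating_width,rating_comfort,rating_quality,isHelpful,isNotHelpful,created_at\n');

let i = 1;
function writeRows() {
  let ok = true;
  while (i <= TOTAL_REVIEWS && ok) {
    const row = [
      i, Math.ceil(Math.random() * 1000000), Math.ceil(Math.random() * 1000000),
      faker.lorem.word(), faker.lorem.sentence().replace(/,/g, ''), // strip commas so the CSV stays valid
      Math.ceil(Math.random() * 5), Math.random() > 0.5,
      'True to size', 'True to width', 'Comfortable', 'What I expected',
      Math.floor(Math.random() * 100), Math.floor(Math.random() * 100),
      faker.date.past().toISOString(),
    ].join(',');
    ok = out.write(row + '\n'); // returns false when the internal buffer is full
    i += 1;
  }
  if (i <= TOTAL_REVIEWS) out.once('drain', writeRows); // resume once the buffer drains
  else out.end(() => console.log('CSV seeding complete'));
}
writeRows();
```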

PostgreSQL Schema

  1. After installing PostgreSQL, run psql -U postgres and enter the password in Git Bash to open a PostgreSQL session in the terminal and check its status

  2. Ensure the previous step to generate the CSV files is completed.

  3. Upload schema.sql

    • cd to the directory where the schema.sql file is stored and run psql -f schema.sql -p 5432 -U postgres to upload the schema to the PostgreSQL database.
  4. Check database

SQL_database_diagram

You must log in to postgres through "psql -U postgres" first

  • To show databases: \l

  • To drop database:

    1. `REVOKE CONNECT ON DATABASE adidas FROM public;`
    2. `SELECT pid, pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();`
    3. `drop database if exists adidas;`

Create API to support CRUD operations (PostgreSQL)

Web Tutorial for PostgreSQL Setup

  1. Run npm i express pg to install dependencies.

  2. Build the "queries.js" and "Index.js" files to connect the API.

  3. To ensure the 'POST' request is working, use Postman to test it with the following JSON (raw body) against the endpoint http://localhost:3000/review

{
	"review_id":30000000,
	"product_id":21011,
	"userID":2,
	"opinion":"It's bad",
	"text":"It's cool",
	"rating_overall":"2",
	"doesRecommended":true,
	"rating_size":"1/2 a size too big",
	"rating_width":"Too wide",
	"rating_comfort":"Comfortable",
	"rating_quality":"What I expected",
	"isHelpful":2,
	"isNotHelpful":2,
	"created_at":"May 28, 2020"
}
  4. To ensure the 'PUT' request is working, use Postman to test it with the following JSON (raw body) against the endpoint http://localhost:3000/review/30000004
{
	"rating_overall": "5",
	"text": "It's almost finish."
}
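For reference, here is a hedged sketch of how queries.js and Index.js could wire these routes to PostgreSQL with the pg pool; the connection settings, file layout, and error handling are assumptions:

```javascript
// queries.js -- hypothetical sketch of the PostgreSQL handlers used by Index.js
const { Pool } = require('pg');
const pool = new Pool({ user: 'postgres', host: 'localhost', database: 'adidas', port: 5432 });

// POST /review -- insert a new review from the JSON body
const createReview = (req, res) => {
  const r = req.body;
  pool.query(
    `INSERT INTO review (review_id, product_id, userID, opinion, text, rating_overall, doesRecommended,
       rating_size, rating_width, rating_comfort, rating_quality, isHelpful, isNotHelpful, created_at)
     VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14)`,
    [r.review_id, r.product_id, r.userID, r.opinion, r.text, r.rating_overall, r.doesRecommended,
     r.rating_size, r.rating_width, r.rating_comfort, r.rating_quality, r.isHelpful, r.isNotHelpful, r.created_at],
    (err) => (err ? res.status(500).send(err) : res.status(201).send('Review added')),
  );
};

// PUT /review/:id -- update selected fields of an existing review
const updateReview = (req, res) => {
  const { rating_overall, text } = req.body;
  pool.query(
    'UPDATE review SET rating_overall = $1, text = $2 WHERE review_id = $3',
    [rating_overall, text, req.params.id],
    (err) => (err ? res.status(500).send(err) : res.send('Review updated')),
  );
};

module.exports = { createReview, updateReview };

// Index.js (excerpt) -- hypothetical wiring
// const express = require('express');
// const db = require('./database/postgreSQL/queries.js');
// const app = express();
// app.use(express.json());
// app.post('/review', db.createReview);
// app.put('/review/:id', db.updateReview);
// app.listen(3000);
```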

Create API to support CRUD operations (MongoDB)

  1. Create the "queries.js" file inside "database/mongoDB" and connect with the existing "Index.js" file.

  2. Please note that the front-end webpage might encounter issues because the current front-end follows the PostgreSQL data structure, and the MongoDB database is structured differently than PostgreSQL.

  3. To ensure the 'POST' request is working, use Postman to test it with the following JSON (raw body) against the endpoint http://localhost:3000/review

{
	"product_id": 10000002,
	"product_name": "tissue",
	"review": [{
		"review_id": 30000001,
		"user": {
			"user_id": 30000001,
			"firstname": "Peter",
			"lastname": "Chen",
			"gender": "Male",
			"nickname": "Superman",
			"email": "hongkongbboy@gmail.com",
			"password": "123"
		},
		"opinion": "It's good",
		"text": "It's bad",
		"rating_overall": 3,
		"doesRecommended": true,
		"rating_size": "a size too big",
		"rating_width": "Slightly wide",
		"rating_comfort": "Uncomfortable",
		"rating_quality": "What I expected",
		"isHelpful": 23,
		"isNotHelpful": 17,
		"created_at": "2007-10-19T09:03:29.967Z",
		"review_photo_path": [{
			"review_photo_id": 60000001,
			"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"
		}, {
			"review_photo_id": 60000002,
			"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"
		}]
	}, {
		"review_id": 30000002,
		"user": {
			"user_id": 30000002,
			"firstname": "Peter",
			"lastname": "Chen",
			"gender": "Male",
			"nickname": "Superman",
			"email": "hongkongbboy@gmail.com",
			"password": "123"
		},
		"opinion": "It's good",
		"text": "It's bad",
		"rating_overall": 3,
		"doesRecommended": true,
		"rating_size": "a size too big",
		"rating_width": "Slightly wide",
		"rating_comfort": "Uncomfortable",
		"rating_quality": "What I expected",
		"isHelpful": 23,
		"isNotHelpful": 17,
		"created_at": "2007-10-19T09:03:29.967Z",
		"review_photo_path": [{
			"review_photo_id": 60000003,
			"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"
		}]
	}]
}
  4. For a PUT request reference, see here
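For reference, a hedged sketch of the MongoDB version of queries.js; the positional $ update mirrors the benchmarking command used below, while the route shapes and connection handling are assumptions:

```javascript
// database/mongoDB/queries.js -- hypothetical sketch of the MongoDB handlers
const mongoose = require('mongoose');
const Product = require('./condensedMongoSchema.js');

mongoose.connect(process.env.MONGO_HOST || 'mongodb://localhost/reviews', {
  useNewUrlParser: true,
  useUnifiedTopology: true,
});

// GET /review/:product_id -- fetch one product document (with its embedded reviews)
const getReviews = (req, res) => {
  Product.find({ product_id: Number(req.params.product_id) })
    .then((docs) => res.json(docs))
    .catch((err) => res.status(500).send(err));
};

// POST /review -- insert a whole product document like the payload above
const createProduct = (req, res) => {
  Product.create(req.body)
    .then(() => res.status(201).send('Product added'))
    .catch((err) => res.status(500).send(err));
};

// PUT /review/:review_id -- update one embedded review via the positional $ operator
const updateReview = (req, res) => {
  Product.updateOne(
    { 'review.review_id': Number(req.params.review_id) },
    { $set: { 'review.$.text': req.body.text } },
  )
    .then(() => res.send('Review updated'))
    .catch((err) => res.status(500).send(err));
};

module.exports = { getReviews, createProduct, updateReview };
```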

DBMS Benchmarking

MongoDB

  • To estimate the query time of CRUD methods for MongoDB, we log in to MongoDB through the terminal with the command mongo and run show dbs to list the available databases. Then run use reviews to switch into the database, and show tables (an alias for show collections) to list its collections. To obtain the execution time of a MongoDB query, append .explain("executionStats") to the end of each query command.
CREATE
  • You can create a new product using the following script in the MongoDB terminal
db.products.insert({"product_id": 10000002,"product_name": "tissue","review": [{"review_id": 30000001,"user": {"user_id": 30000001,"firstname": "Peter","lastname": "Chen","gender": "Male","nickname": "Superman","email": "hongkongbboy@gmail.com","password": "123"},"opinion": "It's good","text": "It's bad","rating_overall": 3,"doesRecommended": true,"rating_size": "a size too big","rating_width": "Slightly wide","rating_comfort": "Uncomfortable","rating_quality": "What I expected","isHelpful": 23,"isNotHelpful": 17,"created_at": "2007-10-19T09:03:29.967Z","review_photo_path": [{"review_photo_id": 60000001,"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"}, {"review_photo_id": 60000002,"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"}]}, {"review_id": 30000002,"user": {"user_id": 30000002,"firstname": "Peter","lastname": "Chen","gender": "Male","nickname": "Superman","email": "hongkongbboy@gmail.com","password": "123"},"opinion": "It's good","text": "It's bad","rating_overall": 3,"doesRecommended": true,"rating_size": "a size too big","rating_width": "Slightly wide","rating_comfort": "Uncomfortable","rating_quality": "What I expected","isHelpful": 23,"isNotHelpful": 17,"created_at": "2007-10-19T09:03:29.967Z","review_photo_path": [{"review_photo_id": 60000003,"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"}]}]});
READ (Without optimization)
  • In my example, my query db.products.find({product_name:"quas"}).explain("executionStats"); shows "totalDocsExamined" of 10M documents with an execution time of 28334 ms, and it returns 40222 documents that match the criteria.

mongoDB_read_without-optimization

READ (With optimization)
  • To optimize my query for the read method, use the following tutorial video for help. I run the command db.products.ensureIndex({product_name: 1}); (ensureIndex is a deprecated alias of createIndex) to create an index on the "product_name" field. Then I run db.products.getIndexes() to make sure the index is created.

  • After optimization, I run the following query db.products.find({product_name:"nobis"}).explain("executionStats"); and the execution time is 3301ms.

mongoDB_read_with-optimization

UPDATE (Without optimization)
  • I run the following command for my UPDATE request db.products.update({"review.review_id": 30000002}, {$set: {"review.$.text": "it’s an updated."}});.

mongoDB_update_without-optimization

UPDATE (With optimization)
  • To optimize my query for the update method, I run the following command db.products.ensureIndex({"review.review_id": 1});. My execution time goes from 40655ms to 0ms.

mongoDB_update_with-optimization

DELETE (Without optimization)
  • To perform the DELETE request, I can run the command db.products.remove({"review.review_id":30000002}); in the MongoDB terminal. This removes the matching product documents from the products collection. If you want to remove only the individual review, see here instead and use $pull, such as db.products.update({},{$pull:{review:{review_id: 30000002}}},{multi:true});

mongoDB_delete_without-optimization

DELETE (With optimization)

mongoDB_delete_with-optimization

  • After optimization through indexing, the execution time goes from 40570ms to 0ms.

PostgreSQL

  • To improve query time, we can perform indexing. See here for tutorial.

  • In addition, run the command \timing and psql will show the execution time of each command.

CREATE
  • You can create a new review using the following command line in PostgreSQL terminal insert into review (review_id, product_id, userID, opinion, text, rating_overall, doesRecommended, rating_size, rating_width, rating_comfort, rating_quality, isHelpful, isNotHelpful, created_at) VALUES (31000000, 2, 2, 'hello', 'this is cool', '2', true, 'a size too small', 'Too narrow', 'Uncomfortable', 'Poor', 2, 2, 'Mon Oct 06 2014 22:38:57 GMT-0700 (Pacific Daylight Time)');
READ (Without optimization)
  • You can read detail using the following command line in PostgreSQL terminal SELECT review.review_id, product_id, opinion, text, rating_overall, doesRecommended, rating_size, rating_width, rating_comfort, rating_quality, isHelpful, isNotHelpful, created_at, firstname, lastname, gender, nickname, email FROM review INNER JOIN users ON (users.userID = review.userID) WHERE product_id =9999500;

postgreSQL_read_without-optimization

READ (With optimization)
  • To optimize the READ request in PostgreSQL, I use indexing: I create an index on the queried column (product_id here), e.g. CREATE INDEX ON review (product_id);, and then run the command Analyze.

The execution time reduces from 3612.243ms to 2.371ms.

postgreSQL_read_with-optimization

UPDATE (Without optimization)
  • You can update values of a key using the following command line in PostgreSQL terminal UPDATE review SET rating_overall = '3', text = 'super cute' WHERE review_id = 31000000;

postgreSQL_update_without-optimization

UPDATE (With optimization)
  • To optimize the UPDATE request in PostgreSQL, I perform indexing by running CREATE INDEX ON review (review_id); followed by Analyze;

  • My execution time reduced from 0.850ms to 0.064ms.

postgreSQL_update_with-optimization

DELETE (Without optimization)
  • You can delete values using the following command line Explain Analyze DELETE FROM review WHERE review_id = 31000000;

postgreSQL_delete_without-optimization

DELETE (With optimization)
  • Using the indexing technique from the previous steps, my execution time reduces from 1856.151ms to 1772.118ms.

postgreSQL_delete_with-optimization

Step 3 - Measure Initial Performance

  • Before starting this step, I need to have a goal. My current goal is for my website to serve 10K users simultaneously.

  • I will performance test (load test and stress test) the GET and POST requests at 1, 10, 100, and 1K requests per second using K6, and will use New Relic to obtain performance data.

  1. Sign up New Relic account on its website. Then select New Relic APM.

    • Then there is an installation video on the right hand side of the website. Follow the instructions to complete the installation.
  2. Increase Node's memory by running the following script in terminal set NODE_OPTIONS=--max_old_space_size=8000. See reference

  3. Install caching with Redis.

    • Make sure webpack is in "development" mode.

    • Follow this guide

    • For Windows, you can install it through Ubuntu using the instructions here

    • To start, in one Ubuntu terminal, run redis-server and in another Ubuntu terminal, run redis-cli

    • Create redis.js file inside database/mongoDB/utils folder.

  4. Create a .env file and store these data inside:

MONGO_HOST=localhost

REDIS_PORT=6379

SERVER_PORT=80
- run `npm install dotenv`

- add `.env` to .gitignore so when you push to GitHub, other people won't see it.

- put `require('dotenv').config();` in the `Index.js` file so the data can get picked up.

- update `condensedMongoSchema.js` for env variables.

- update `mongoDB/queries.js` file with `const redis = require('./utils/redis.js');` (a minimal sketch of this caching utility follows).
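A minimal sketch of what the redis.js utility could look like, assuming the callback-style node_redis (v3) API; the key naming and TTL are assumptions:

```javascript
// database/mongoDB/utils/redis.js -- hypothetical sketch of a small Redis cache helper
const redis = require('redis');

const client = redis.createClient(process.env.REDIS_PORT || 6379);
client.on('error', (err) => console.error('Redis error:', err));

// Return the cached value for key, or compute it with fetchFn, cache it, and return it
const getOrSet = (key, fetchFn, ttlSeconds = 60) =>
  new Promise((resolve, reject) => {
    client.get(key, async (err, cached) => {
      if (err) return reject(err);
      if (cached) return resolve(JSON.parse(cached)); // cache hit
      const fresh = await fetchFn();                  // cache miss: hit MongoDB
      client.setex(key, ttlSeconds, JSON.stringify(fresh));
      resolve(fresh);
    });
  });

module.exports = { client, getOrSet };

// Example use inside mongoDB/queries.js (hypothetical):
// const redisCache = require('./utils/redis.js');
// const docs = await redisCache.getOrSet(`product:${id}`, () => Product.find({ product_id: id }));
```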
  5. Install K6 (load-testing tool)

    • Create a folder for the K6 scripts inside the database folder and write a script to test the GET and POST methods (a hedged sketch appears after this step's results).

    • Our target is 1,000 VUs on the local machine and 10,000 on EC2.

    • The following result is what I get from K6 with GET request: k6-getReview

    • The following result is what I get from New Relic: newRelic-getReview

newRelic-getReview
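For reference, a hedged sketch of a K6 GET-request script (K6 scripts are plain JavaScript); the endpoint path, product-id range, and stage durations are assumptions:

```javascript
// k6 GET-review load test -- hypothetical sketch
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 100 },  // ramp up
    { duration: '1m', target: 1000 },  // hold at 1000 VUs (local-machine target)
    { duration: '30s', target: 0 },    // ramp down
  ],
};

export default function () {
  // Hit a review for a random product id in the seeded range
  const productId = Math.floor(Math.random() * 10000000) + 1;
  const res = http.get(`http://localhost:3000/review/${productId}`);
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```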

Step 4 - Deploy the Service and Proxy

  1. Launch 1st EC2 instance to deploy Database

    1. Launch EC2

      • Use Amazon Linux 2 and select t2.medium.
      • Hit Next until you hit security configuration
      • Add rule for "SSH", "HTTP", "HTTPS", "Custom TCP" types. For "HTTP", "HTTPS", and "Custom TCP", select "Anywhere" for source. For "Custom TCP" only., put your port number for Port Range. AWS Security Setting Database
      • Create a pem file and put it inside the root folder and make sure you include it in .gitignore
      • run "chmod 700 xxx.pem" in terminal provided by pressing "Connect" on AWS website
      • On the AWS website, click Elastic IPs in the left drop-down menu, then press Allocate Elastic IP address. The reason we do this is to keep the IP address the same even after we shut down the instance, because otherwise the IP address changes on every restart. A new public IPv4 address will be created; click on it, click Associate Elastic IP address, select your current instance in the Instance section, and click Associate
      • run ssh ec2-user@13.56.236.35 -i adidas_micro.pem - you can get the IP address (13.56.236.35 here) from "IPv4 Public IP" in the AWS EC2 console.
    2. Install Node.js in EC2 Instance

      • Follow this instruction
    3. Install MongoDB on EC2 Linux

      • Follow this instruction for installation.
      • When asked to create a /etc/yum.repos.d/mongodb-org-4.2.repo file, run sudo vi /etc/yum.repos.d/mongodb-org-4.2.repo. You can save the file by pressing esc and then typing :wq
    4. Install Git

      • run sudo yum install git
    5. Start Seeding

      • Git clone repo from Github
      • run npm install for dependencies
      • Go to the seed file location and run node condensedMongoSeed.js
    6. Change MongoDB config

      • Run sudo vi /etc/mongod.conf
      • Press i to edit
      • comment out bindIp: 127.0.0.1 by adding # to the left
      • Add bindIpAll: true on the next line. Make sure the indentation is correct (mongod.conf is YAML).
      • Run sudo service mongod restart to trigger the change.
    7. Testing server

      • To test whether the database is set up correctly in EC2, go to the .env file on your local machine and change MONGO_HOST=mongodb://localhost to MONGO_HOST=mongodb://13.56.236.35:27017. Then run the webpage to see if the data displays correctly. Note that "27017" is the default port for MongoDB.
  2. Launch 2nd EC2 for service

    1. Comment out the Redis code in the mongoDB/queries.js file (Redis is not installed on this instance).

    2. Delete the bundle.js file and re-run the build to regenerate bundle.js without Redis.

    3. Go to AWS website and create another instance for the service with Linux. Can use t2.micro.

    AWS Security Setting Service

    1. Save the "pem" file inside the root folder.

    2. Press "connect" in AWS website and follow the instruction in "Connect to your instance". Go into the root folder where the pem file at and run ssh ec2-user@13.57.191.130 -i adidas_service.pem. "13.57.191.130" is the IPv4 Public IP.

    6. Check to make sure you have MONGO_HOST=mongodb://13.56.236.35:27017 in the ".env" file and then git push everything to GitHub.

    7. In the EC2 terminal, run these installations:

      • Install Node.js in EC2 Instance
        • Follow this instruction
      • Install Git
        • run sudo yum install git
      • Run npm install concurrently and update the package.json scripts accordingly.
        • change nodemon to node because nodemon does many things and it should only be used during development.
      • Git clone the previously pushed repo.
      • Run npm run ec2-dev
      • In the web browser, enter 13.57.191.130:3000. The website should show up. If it doesn't, there is probably an error with MongoDB; run sudo service mongod start to keep Mongo running.
      • On the EC2 server, you can keep the app running even after you close the terminal by running nohup npm run ec2-dev &. Then confirm it's running by entering ps -ax | grep node. (Optional: you can run killall node to stop the nohup process.)
    8. Set up loader.io verification in the same folder where you put the bundle.js file

      • Sign up for account with loader.io
      • Inside loader.io, enter http://13.57.191.130:3000/ for New target host.
      • Git push the verification file to GitHub, git pull on the server instance, and re-run the application with nohup.
      • Go back to loader.io to confirm target verification.
      • Click Add a new test. Reference loader io test setup

      loader io test result

      • In the diagrams, you can tell there are zero 400 and 500 error responses. There are 3 timeouts, which indicates that 3 requests likely took over 10 seconds (the default) to generate a response. The response count is 9969; dividing that by 60 seconds gives about 166.15 successful responses per second while 800 clients were visiting the website per second for that 1 minute, which is far below our throughput configuration of 800. The response time is 8399 ms, which means it takes about ~9 seconds to get a response from the GET request when 800 clients visit the website at the same time, but I need to target under 2000 ms. (Note that when you compare the loader.io result with New Relic, you might see a slight difference, with loader.io reporting a higher throughput; go with that.) You can go to the EC2 terminal root folder and run less nohup.out (then press shift + G) to see a log of which GET request items are being tested, and use it to change the testing path. Reference

      loader io test result 2

      • As you can tell from the diagram, around 225 clients per second seems to be my limit, as it averages around 1941 ms response time, which is slightly below my target of a 2000 ms average response time.
    9. New Relic vs Loader.io comparison

      • Next, we can switch to the New Relic website. newRelic Loader io comparsion 1
      • From the diagram, you can tell that the ~4720 ms figure differs from the 8399 ms one. You can use either the New Relic or the loader.io result in this part.
      • Click on one of the transactions and you will notice the following diagram. newRelic Loader io comparsion 2
      • From this diagram, you will notice that the majority of the response time is spent on MongoDB products toArray. Therefore, we need to figure out how to reduce that to reach our target goal of 2000 ms.
    10. AWS - Cloudwatch

      • In the AWS website, search Cloudwatch and find the diagram for CPU Utilization Average. aws cpu usage
      • In the diagram, it shows that the peak of one of the stress tests on the service instance is about 20% CPU usage, which is very low.
    11. RAM and CPU usage

      • On the EC2 service instance, run the command top and re-run one of the loader.io tests
      • You would notice the following RAM changes: Top Ram usage
      • You can figure out how to improve the website performance by utilizing the data.
    12. Gzip and webpack production mode update

      • Run npm install compression compression-webpack-plugin brotli-gzip-webpack-plugin
      • Update webpack.config.js with production configuration.
      • Update Index.js with compression dependency.
      • Update script of package.json
      • Run npm run prod to test if it works in local machine.
      • Once it works, push to GitHub and pull it onto the EC2 server. Then run your instance there and use nohup to do a loader.io test. For me, I was able to go up to 250 clients per second with an average response time well below 2000 ms (293 ms in my case) and an error rate of 0%. However, if I increase the clients per second further, my error rate increases. A minimal sketch of the compression wiring follows this step. loader io test result 3
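A minimal sketch of the compression wiring, using the compression Express middleware and CompressionPlugin from the compression-webpack-plugin package installed above; the entry/output paths are assumptions:

```javascript
// webpack.config.js (excerpt) -- hypothetical production config that emits gzip'd bundles
const path = require('path');
const CompressionPlugin = require('compression-webpack-plugin');

module.exports = {
  mode: 'production',
  entry: './client/src/index.js',                                  // assumed entry path
  output: { path: path.resolve(__dirname, 'public'), filename: 'bundle.js' },
  plugins: [
    // Emits bundle.js.gz next to bundle.js so the server can serve pre-compressed assets
    new CompressionPlugin({ test: /\.js$|\.css$|\.html$/ }),
  ],
};

// Index.js (excerpt) -- compress dynamic responses as well
// const compression = require('compression');
// app.use(compression());
```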
  3. Launch 3rd EC2 for NGINX

    1. For setup, see the reference here. Only install Nginx, not Nginx Plus. Note that the default security setting puts the Nginx load balancer on port 80.

    2. Run systemctl status nginx.service to see if it is actively running.

    3. Run cd /etc/nginx/ and then sudo vi nginx.conf. See reference to update "Nginx.conf". In my example, I updated with the following:

    Nginx conf Setup Please note that proper indentation does matter. "13.57.191.130" is the IPv4 of the EC2 service instance. You can use any name in place of "nodejs", but the "nodejs" part of "http://nodejs" has to match the "nodejs" in the upstream block.

    4. Run sudo systemctl stop nginx, then sudo systemctl start nginx, then systemctl status nginx.service. You should expect to see a green "active (running)". If you get a red "failed", look into the error and redo this step. Nginx Check

    5. Go to the loader.io website and replace the target host. My previous target host is 13.57.191.130:3000, which refers to my EC2 service. I will replace it with 18.222.19.182:80. Copy the verification token into the current token file in the "client" folder. Then push to GitHub, go to the EC2 service, and re-run the service with the nohup setting activated. You may re-run the loader.io test; the result should be similar to what we previously tested.

  4. Horizontal Scaling by create 4th and 5th EC2 Instances

    1. Go to EC2 and select the 1st service instance. Select Action -> Image -> Create Image. For Image name, you can put Service 2, and 2nd Instance for Service for the Image description. Then click Create Image at the bottom. Then in the AWS left drop-down menu select Images, then AMIs, and click Launch. In the security settings, follow the previous security settings for the service: AWS Security Setting Service Generate a unique pem file and put it in the root folder. If you receive an error with Please login as the user "ec2-user" rather than the user "root"., change the ssh script as follows: Image-error Then go to the root folder and run nohup npm run prod &
    2. Repeat the previous process to get another image.
    3. Go to the instance of the Nginx load balancer and modify the file nginx.conf to capture the servers. Nginx conf Setup2 Then run sudo systemctl stop nginx, then sudo systemctl start nginx, then systemctl status nginx.service. You should expect to see a green sign of "active (running)". If you get a red sign of "failed", look into the error and redo this step.
  5. After integrating the 3 services with Nginx, I began to look at loader.io, New Relic, and AWS CloudWatch. Here's my analysis: loader io 280 test setup loader io 280 test result loader io 300 test result When I look at the loader.io results, they show that when I set 280 requests per second, the average response time is 186 ms, which is much faster than what I tested previously without the Nginx setup. The error rate and timeouts are also 0. However, when I set 300 requests per second, the error rate starts to build and there are about 269 responses with 500 error codes, which is not good. aws cpu usage 2 When I look at the AWS CloudWatch result, I notice that the CPU usage is pretty low, so I don't have a CPU issue. New Relic Result 1 New Relic Result 2 When I look at the New Relic result, I realize that the majority of the response time comes from "MongoDB products toArray". Therefore, I need to figure out how to improve the database.

TO BE CONTINUED