Note: I did not build the front-end. I took an existing front-end and scaled the backend for practice.
The purpose of this project is to learn how to design a backend architecture and stress test it under web load. This involves evaluating multiple database management systems and determining which one is a better fit for this type of e-commerce website. Then I build HTTP request methods and connect them to the databases. After benchmarking and finalizing the database choice, I host the website on multiple virtual machines and use Nginx as an HTTP load balancer. I use tools like New Relic, K6, and AWS CloudWatch to identify bottlenecks, and gzip and caching techniques to optimize performance.
I used this video to get an idea of which database to select: https://www.youtube.com/watch?v=v5e_PasMdXc&feature=emb_logo
- In this project, we consider how to scale the backend because we expect many requests per second. There will be many web servers serving many people at the same time, so we consider a distributed NoSQL database rather than a monolithic relational database. We want to test two backend systems and choose one of them based on justification. We have to take the CAP theorem (Consistency, Availability, Partition-tolerance) into consideration. Partition-tolerance is about the ability to scale the data. Availability is about your data always being there. Consistency is about whether a read that follows a write returns the new value right away or whether users may see stale data for a few seconds; this is especially important if your application handles stock or financial transactions. If it is okay for your system to go down for a few seconds or minutes, availability is not your main consideration. Below is our analysis:
- For this application:
  - Availability: It's important because when a user goes to your webpage to look for reviews, you need to show them.
  - Consistency: It's okay to have a delay of a few seconds before a new review shows up, with the user seeing the older reviews in the meantime.
  - Partition-tolerance: The data has to scale so the web page runs fast.
- MongoDB
  - Strengths:
    - There is professional paid support for setting up the security of the database.
    - The administration of the system can be outsourced over time.
    - Simplicity of use - there are many supporting articles and a large community user base.
    - Partition-tolerance - it scales quickly.
- PostgreSQL
  - Strengths:
    - Well suited to structured data.
    - Online research indicates that PostgreSQL is faster than MySQL.
- I created a folder named `database` to store all the schema and seeding files.
- Generate fake images for testing purposes (a sketch of the script follows this list):
  - Run `npm install faker` to help generate fake images.
  - Create a folder named `image` inside the `database` folder.
  - Build a script named `downloadFakeImage.js` to generate 1000 fake images in the `image` folder.
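For reference, here is a minimal sketch of what `downloadFakeImage.js` could look like, assuming the old `faker` package's `image.imageUrl()` helper and plain HTTP downloads (the actual script in this repo may differ):

```js
// downloadFakeImage.js - sketch: download 1000 placeholder images into database/image.
const fs = require('fs');
const path = require('path');
const http = require('http');
const faker = require('faker');

const IMAGE_DIR = path.join(__dirname, 'image');
if (!fs.existsSync(IMAGE_DIR)) fs.mkdirSync(IMAGE_DIR);

// Download a single URL to a destination file and resolve when finished.
const download = (url, dest) =>
  new Promise((resolve, reject) => {
    const file = fs.createWriteStream(dest);
    http.get(url, (res) => {
      res.pipe(file);
      file.on('finish', () => file.close(resolve));
    }).on('error', reject);
  });

(async () => {
  for (let i = 1; i <= 1000; i += 1) {
    // faker.image.imageUrl(width, height, category) returns a random placeholder image URL.
    const url = faker.image.imageUrl(640, 480, 'fashion');
    await download(url, path.join(IMAGE_DIR, `${i}.jpg`));
  }
  console.log('Done downloading 1000 fake images.');
})();
```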
- I used the following reference as a tutorial: https://medium.com/@brandon.lau86/one-to-many-relationships-with-mongodb-and-mongoose-in-node-express-d5c9d23d93c2
- I plan to seed over 10 million records into the schema that I just created.
  - Run `npm install mongodb mongoose` so that the package.json file has the dependencies for connecting to MongoDB (make sure you install MongoDB on your computer first).
  - cd into the mongoDB folder and run `node condensedMongoSeed.js` to seed the database (a sketch of the seeding script follows this list).
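As a rough illustration of the batched seeding approach, here is a minimal sketch of what `condensedMongoSeed.js` might do; the field names follow the sample product document shown later in this README, but the exact schema, counts, and batch size in the real script may differ:

```js
// condensedMongoSeed.js - sketch: batch-insert product documents with embedded reviews.
const mongoose = require('mongoose');
const faker = require('faker');

const productSchema = new mongoose.Schema({
  product_id: Number,
  product_name: String,
  review: [mongoose.Schema.Types.Mixed], // embedded review sub-documents
});
const Product = mongoose.model('Product', productSchema); // collection: "products"

// Build one fake product with a single embedded review.
const makeProduct = (id) => ({
  product_id: id,
  product_name: faker.commerce.productName(),
  review: [{
    review_id: id,
    user: { user_id: id, firstname: faker.name.firstName(), lastname: faker.name.lastName() },
    opinion: faker.lorem.sentence(),
    text: faker.lorem.paragraph(),
    rating_overall: Math.ceil(Math.random() * 5),
    created_at: new Date(),
  }],
});

(async () => {
  await mongoose.connect('mongodb://localhost/reviews', {
    useNewUrlParser: true,
    useUnifiedTopology: true,
  });
  const TOTAL = 10000000;
  const BATCH = 1000;
  for (let start = 1; start <= TOTAL; start += BATCH) {
    const docs = [];
    for (let i = start; i < start + BATCH && i <= TOTAL; i += 1) docs.push(makeProduct(i));
    await Product.insertMany(docs, { ordered: false }); // unordered inserts are faster
  }
  await mongoose.disconnect();
})();
```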
- Build the seeding script: I plan to seed over 10 million records into CSV files and then load the CSV into PostgreSQL.
  - I used the following reference as a tutorial for building the seeding script: https://medium.com/@danielburnsart/writing-a-large-amount-of-data-to-a-csv-file-using-nodes-drain-event-99dcaded99b5
- Seed the CSV file into the PostgreSQL database
  - Run `node newPostgreSQLReviewSeed.js` to create the seeding file (a sketch of the drain-based CSV writer follows this list).
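The referenced tutorial's core idea is to respect stream backpressure: keep writing rows until `write()` returns false, then wait for the `'drain'` event before continuing. A minimal sketch (the column list and fake values are assumptions based on the review table used later):

```js
// newPostgreSQLReviewSeed.js - sketch: stream ~10M CSV rows without exhausting memory.
const fs = require('fs');
const faker = require('faker');

const out = fs.createWriteStream('review.csv');
out.write('review_id,product_id,userID,opinion,text,rating_overall,created_at\n');

const TOTAL = 10000000;
let i = 0;

function writeRows() {
  let ok = true;
  while (i < TOTAL && ok) {
    i += 1;
    const row = [
      i,
      (i % 1000000) + 1,
      (i % 500000) + 1,
      `"${faker.lorem.words(3)}"`,
      `"${faker.lorem.sentence()}"`,
      Math.ceil(Math.random() * 5),
      new Date().toISOString(),
    ].join(',') + '\n';
    ok = out.write(row); // returns false once the internal buffer is full
  }
  if (i < TOTAL) {
    out.once('drain', writeRows); // resume only after the buffer has flushed
  } else {
    out.end();
  }
}

writeRows();
```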
- After installing PostgreSQL, run `psql -U postgres` and then enter the password in Git Bash to open PostgreSQL in the terminal and check its status.
- Ensure the previous step that generates the CSV files is completed.
- Upload schema.sql
  - cd to the directory where the schema.sql file is stored and run `psql -f schema.sql -p 5432 -U postgres` to upload the schema to the PostgreSQL database (a sketch of the schema follows).
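For orientation, the two tables implied by the INSERT/SELECT statements later in this README look roughly like the sketch below; the exact column types and constraints in the real `schema.sql` may differ:

```sql
-- schema.sql - sketch of the users and review tables (types are assumptions).
CREATE TABLE users (
  userID INTEGER PRIMARY KEY,
  firstname VARCHAR(50),
  lastname VARCHAR(50),
  gender VARCHAR(20),
  nickname VARCHAR(50),
  email VARCHAR(100),
  password VARCHAR(100)
);

CREATE TABLE review (
  review_id INTEGER PRIMARY KEY,
  product_id INTEGER,
  userID INTEGER REFERENCES users(userID),
  opinion TEXT,
  text TEXT,
  rating_overall VARCHAR(5),
  doesRecommended BOOLEAN,
  rating_size VARCHAR(50),
  rating_width VARCHAR(50),
  rating_comfort VARCHAR(50),
  rating_quality VARCHAR(50),
  isHelpful INTEGER,
  isNotHelpful INTEGER,
  created_at VARCHAR(100)
);
```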
- Check database (you must log in to postgres through `psql -U postgres` first)
  - To show databases: `\l`
  - To drop the database:
    1. `REVOKE CONNECT ON DATABASE adidas FROM public;`
    2. `SELECT pid, pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid();`
    3. `drop database if exists adidas;`
Web Tutorial for PostgreSQL Setup
- Run `npm i express pg` to install dependencies.
- Build the "queries.js" and "Index.js" files to connect the API (a minimal sketch follows).
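A minimal sketch of how `queries.js` can talk to PostgreSQL with the `pg` pool and how `Index.js` wires up the routes; the connection settings and handler names are my assumptions, and the column list follows the test payloads below:

```js
// queries.js - sketch: pg connection pool plus POST/PUT handlers.
const { Pool } = require('pg');

const pool = new Pool({ user: 'postgres', host: 'localhost', database: 'adidas', port: 5432 });

const createReview = (req, res) => {
  const r = req.body;
  const sql = `INSERT INTO review (review_id, product_id, userID, opinion, text, rating_overall,
                 doesRecommended, rating_size, rating_width, rating_comfort, rating_quality,
                 isHelpful, isNotHelpful, created_at)
               VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14)`;
  const values = [r.review_id, r.product_id, r.userID, r.opinion, r.text, r.rating_overall,
    r.doesRecommended, r.rating_size, r.rating_width, r.rating_comfort, r.rating_quality,
    r.isHelpful, r.isNotHelpful, r.created_at];
  pool.query(sql, values)
    .then(() => res.status(201).send('Review added'))
    .catch((err) => res.status(500).send(err.message));
};

const updateReview = (req, res) => {
  const { rating_overall, text } = req.body;
  pool.query('UPDATE review SET rating_overall = $1, text = $2 WHERE review_id = $3',
    [rating_overall, text, req.params.id])
    .then(() => res.status(200).send('Review updated'))
    .catch((err) => res.status(500).send(err.message));
};

module.exports = { createReview, updateReview };
```

In `Index.js`, the routes would then be registered with something like `app.post('/review', queries.createReview);` and `app.put('/review/:id', queries.updateReview);` after `app.use(express.json());`.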
- To ensure the 'POST' request is working, use Postman and test it with the following JSON (raw body) against the API endpoint `http://localhost:3000/review`:
{
"review_id":30000000,
"product_id":21011,
"userID":2,
"opinion":"It's bad",
"text":"It's cool",
"rating_overall":"2",
"doesRecommended":true,
"rating_size":"1/2 a size too big",
"rating_width":"Too wide",
"rating_comfort":"Comfortable",
"rating_quality":"What I expected",
"isHelpful":2,
"isNotHelpful":2,
"created_at":"May 28, 2020"
}
- To ensure the 'PUT' request is working, use Postman and test it with the following JSON (raw body) against the API endpoint `http://localhost:3000/review/30000004`:
{
"rating_overall": "5",
"text": "It's almost finish."
}
- Create the "queries.js" file inside "database/mongoDB" and connect it with the existing "Index.js" file.
- Please note that the front-end webpage might encounter issues because the current front-end design follows the PostgreSQL data structure, and the MongoDB database is structured differently than PostgreSQL.
- To ensure the 'POST' request is working, use Postman and test it with the following JSON (raw body) against the API endpoint `http://localhost:3000/review`:
{
"product_id": 10000002,
"product_name": "tissue",
"review": [{
"review_id": 30000001,
"user": {
"user_id": 30000001,
"firstname": "Peter",
"lastname": "Chen",
"gender": "Male",
"nickname": "Superman",
"email": "hongkongbboy@gmail.com",
"password": "123"
},
"opinion": "It's good",
"text": "It's bad",
"rating_overall": 3,
"doesRecommended": true,
"rating_size": "a size too big",
"rating_width": "Slightly wide",
"rating_comfort": "Uncomfortable",
"rating_quality": "What I expected",
"isHelpful": 23,
"isNotHelpful": 17,
"created_at": "2007-10-19T09:03:29.967Z",
"review_photo_path": [{
"review_photo_id": 60000001,
"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"
}, {
"review_photo_id": 60000002,
"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"
}]
}, {
"review_id": 30000002,
"user": {
"user_id": 30000002,
"firstname": "Peter",
"lastname": "Chen",
"gender": "Male",
"nickname": "Superman",
"email": "hongkongbboy@gmail.com",
"password": "123"
},
"opinion": "It's good",
"text": "It's bad",
"rating_overall": 3,
"doesRecommended": true,
"rating_size": "a size too big",
"rating_width": "Slightly wide",
"rating_comfort": "Uncomfortable",
"rating_quality": "What I expected",
"isHelpful": 23,
"isNotHelpful": 17,
"created_at": "2007-10-19T09:03:29.967Z",
"review_photo_path": [{
"review_photo_id": 60000003,
"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"
}]
}]
}
- For PUT request reference, see here
MongoDB
- To estimate the query time of CRUD methods for MongoDB, we log in to MongoDB through the terminal with the command `mongo` and run `show dbs;` to look at the available databases. Then run `use reviews` to get inside the database, and `show tables` to list its collections. To obtain the execution time of a MongoDB query, append `.explain("executionStats")` to the end of the query command.
- You can create a new product using the following script in the MongoDB terminal:
db.products.insert({"product_id": 10000002,"product_name": "tissue","review": [{"review_id": 30000001,"user": {"user_id": 30000001,"firstname": "Peter","lastname": "Chen","gender": "Male","nickname": "Superman","email": "hongkongbboy@gmail.com","password": "123"},"opinion": "It's good","text": "It's bad","rating_overall": 3,"doesRecommended": true,"rating_size": "a size too big","rating_width": "Slightly wide","rating_comfort": "Uncomfortable","rating_quality": "What I expected","isHelpful": 23,"isNotHelpful": 17,"created_at": "2007-10-19T09:03:29.967Z","review_photo_path": [{"review_photo_id": 60000001,"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"}, {"review_photo_id": 60000002,"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"}]}, {"review_id": 30000002,"user": {"user_id": 30000002,"firstname": "Peter","lastname": "Chen","gender": "Male","nickname": "Superman","email": "hongkongbboy@gmail.com","password": "123"},"opinion": "It's good","text": "It's bad","rating_overall": 3,"doesRecommended": true,"rating_size": "a size too big","rating_width": "Slightly wide","rating_comfort": "Uncomfortable","rating_quality": "What I expected","isHelpful": 23,"isNotHelpful": 17,"created_at": "2007-10-19T09:03:29.967Z","review_photo_path": [{"review_photo_id": 60000003,"review_photo_url": "https://sdcuserphotos.s3.us-west-1.amazonaws.com/741.jpg"}]}]});
- In my example, the query `db.products.find({product_name:"quas"}).explain("executionStats");` shows a "totalDocsExamined" of 10M records with an execution time of 28334 ms, and it returns 40222 documents that match the criteria.
- To optimize my query for the read method, I used the following tutorial video for help. I ran the command `db.products.ensureIndex({product_name: 1});` to create an index on the "product_name" field, then ran `db.products.getIndexes()` to make sure the index was created.
- After optimization, I ran the query `db.products.find({product_name:"nobis"}).explain("executionStats");` and the execution time is 3301 ms.
- I ran the following command for my UPDATE request: `db.products.update({"review.review_id": 30000002}, {$set: {"review.$.text": "it’s an updated."}});`
- To optimize my query for the update method, I ran `db.products.ensureIndex({"review.review_id": 1});`. My execution time went from 40655 ms to 0 ms.
- To perform the DELETE request, I can run the command `db.products.remove({"review.review_id":30000002});` in the MongoDB terminal. Note that this removes the whole matching product document from the products collection. If you want to remove only the individual review, see here instead and use something like `db.products.update({},{$pull:{review:{review_id: 30000002}}},{multi:true});`
- After optimization through indexing, the execution time went from 40570 ms to 0 ms.
PostgreSQL
- To improve query time, we can perform indexing. See here for a tutorial.
- In addition, run the command `\timing` and psql will show the execution time of each command.
- You can create a new review using the following command in the PostgreSQL terminal:
insert into review (review_id, product_id, userID, opinion, text, rating_overall, doesRecommended, rating_size, rating_width, rating_comfort, rating_quality, isHelpful, isNotHelpful, created_at) VALUES (31000000, 2, 2, 'hello', 'this is cool', '2', true, 'a size too small', 'Too narrow', 'Uncomfortable', 'Poor', 2, 2, 'Mon Oct 06 2014 22:38:57 GMT-0700 (Pacific Daylight Time)');
- You can read review details using the following command in the PostgreSQL terminal:
SELECT review.review_id, product_id, opinion, text, rating_overall, doesRecommended, rating_size, rating_width, rating_comfort, rating_quality, isHelpful, isNotHelpful, created_at, firstname, lastname, gender, nickname, email FROM review INNER JOIN users ON (users.userID = review.userID) WHERE product_id =9999500;
- To optimize the READ request in PostgreSQL, I used indexing. I ran the following command `Explain Analyze DELETE FROM review WHERE review_id = 31000000;` and then followed it with `Analyze`. The execution time drops from 3612.243 ms to 2.371 ms.
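The `EXPLAIN ANALYZE` line above reports timing rather than creating the index itself. Since the read query filters on `product_id`, the index behind this speed-up was presumably something along these lines (my assumption; the exact command is not shown in these notes):

```sql
-- Assumed read-path index: the SELECT above joins users and filters review by product_id.
CREATE INDEX ON review (product_id);
ANALYZE review;
```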
- You can update values of a key using the following command in the PostgreSQL terminal: `UPDATE review SET rating_overall = '3', text = 'super cute' WHERE review_id = 31000000;`
- To optimize the UPDATE request in PostgreSQL, I performed indexing by running `CREATE INDEX ON review (review_id);` followed by `Analyze;`
- My execution time reduced from 0.850 ms to 0.064 ms.
- You can delete values using the following command: `Explain Analyze DELETE FROM review WHERE review_id = 31000000;`
- Using the indexing technique applied previously, my execution time reduces from 1856.151 ms to 1772.118 ms.
- Before starting this session, I need to have a goal. My current goal is for my website to serve 10K users simultaneously.
- I will be performance testing (load testing and stress testing) GET and POST requests at 1, 10, 100, and 1K requests per second using K6, and will use New Relic to obtain performance data. A sketch of a K6 script follows.
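A minimal K6 sketch for one of these scenarios; the target URL, virtual-user count, and duration are placeholders to adjust per test:

```js
// load-test.js - sketch: run with `k6 run load-test.js`.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,        // number of concurrent virtual users
  duration: '1m',  // how long to keep the load going
};

export default function () {
  // Placeholder GET endpoint; swap in the real review/product route being tested.
  const res = http.get('http://localhost:3000/review/30000004');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // each VU pauses 1s between iterations
}
```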
- Sign up for a New Relic account on its website, then select `New Relic APM`.
  - There is an installation video on the right-hand side of the website. Follow the instructions to complete the installation.
- Increase Node's memory by running the following script in the terminal: `set NODE_OPTIONS=--max_old_space_size=8000`. See reference.
- Install caching with Redis.
  - Create a `.env` file and store these values inside:
    - `MONGO_HOST=localhost`
    - `REDIS_PORT=6379`
    - `SERVER_PORT=80`
  - Run `npm install npm dotenv`.
  - Add `.env` to .gitignore so that when you push to GitHub, other people won't see it.
  - Put `require('dotenv').config();` in the `Index.js` file so the values get picked up.
  - Update `condensedMongoSchema.js` to use the env variables.
  - Update the `mongoDB/queries.js` file with `const redis = require('./utils/redis.js');` (a sketch of `utils/redis.js` follows this list).
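A minimal sketch of what `utils/redis.js` can look like, assuming the `redis` v3 callback-style client and the `REDIS_PORT` value from the `.env` file above; the cache-aside helper illustrates how `mongoDB/queries.js` can check Redis before hitting MongoDB:

```js
// utils/redis.js - sketch: Redis client plus a cache-aside helper.
const redis = require('redis');

const client = redis.createClient(process.env.REDIS_PORT || 6379);
client.on('error', (err) => console.error('Redis error:', err));

// getOrSet: return the cached value when present; otherwise run fetchFn
// (e.g. a MongoDB query), cache its result for ttl seconds, and return it.
const getOrSet = (key, ttl, fetchFn) =>
  new Promise((resolve, reject) => {
    client.get(key, (err, cached) => {
      if (err) return reject(err);
      if (cached) return resolve(JSON.parse(cached));
      fetchFn()
        .then((value) => {
          if (value != null) client.setex(key, ttl, JSON.stringify(value));
          resolve(value);
        })
        .catch(reject);
    });
  });

module.exports = { client, getOrSet };
```

A read handler in `mongoDB/queries.js` could then wrap its Mongoose query, e.g. `getOrSet('product:' + id, 3600, () => Product.findOne({ product_id: id }).lean())`.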
- Install K6 (load testing tool).
- Launch 1st EC2 instance to deploy the database
  - Launch EC2
    - Use `Amazon Linux 2` and select `t2.medium`.
    - Hit `Next` until you reach the security configuration. Add rules for the "SSH", "HTTP", "HTTPS", and "Custom TCP" types. For "HTTP", "HTTPS", and "Custom TCP", select "Anywhere" for the source. For "Custom TCP" only, put your port number in Port Range.
    - Create a pem file, put it inside the root folder, and make sure you include it in .gitignore.
    - Run `chmod 700 xxx.pem` in the terminal, as instructed when pressing "Connect" on the AWS website.
    - On the AWS website, click `Elastic IPs` in the left drop-down menu, then press `Allocate Elastic IP address`. The reason we do this is to keep the IP address the same even if we shut down the instance, because otherwise the IP address changes when the instance is shut down. A new Public IPv4 address is created; click on it, click `Associate Elastic IP address`, select your current instance in the `Instance` section, and click `Associate`.
    - Run `ssh ec2-user@13.56.236.35 -i adidas_micro.pem`.
    - The IP number (54.153.0.155) can be found under "IPv4 Public IP" in AWS EC2.
- Install Node.js on the EC2 instance
  - Follow this instruction.
- Install MongoDB on EC2 Linux
  - Follow this instruction for installation.
  - When asked to create an /etc/yum.repos.d/mongodb-org-4.2.repo file, run `sudo vi /etc/yum.repos.d/mongodb-org-4.2.repo`. You can save the file by pressing `esc` and then running `:wq`.
- Install Git
  - Run `sudo yum install git`.
- Start seeding
  - Git clone the repo from GitHub.
  - Run `npm install` for dependencies.
  - Go to the seed file location and run `node condensedMongoSeed.js`.
- Change the MongoDB config
  - Run `sudo vi /etc/mongod.conf`.
  - Press `i` to edit.
  - Comment out `bindIp: 127.0.0.1` by adding `#` to the left.
  - Add `bindIpAll: true` on the next line. Make sure you indent it correctly (see the snippet below).
  - Run `sudo service mongod restart` to apply the change.
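For clarity, the `net` section of `/etc/mongod.conf` ends up looking roughly like this (the file is YAML, so the two-space indentation under `net:` matters):

```yaml
# /etc/mongod.conf (net section) after the change
net:
  port: 27017
#  bindIp: 127.0.0.1
  bindIpAll: true
```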
- Testing server
  - To test whether the database is set up correctly in EC2, go to the `.env` file on your local machine and change `MONGO_HOST=mongodb://localhost` to `MONGO_HOST=mongodb://13.56.236.35:27017`. Then load the webpage to see if the data displays correctly. Note that "27017" is the default port for the Mongo database.
- Launch 2nd EC2 for the service
  - Comment out the Redis code in the `mongoDB/queries.js` file.
  - Delete the `bundle.js` file and rebuild it to get a JS bundle without Redis.
  - Go to the AWS website and create another Linux instance for the service. You can use `t2.micro`.
  - Save the "pem" file inside the root folder.
  - Press "Connect" on the AWS website and follow the instructions in "Connect to your instance". Go into the root folder where the pem file is and run `ssh ec2-user@13.57.191.130 -i adidas_service.pem`. "13.57.191.130" is the IPv4 Public IP.
  - Check to make sure you have `MONGO_HOST=mongodb://13.56.236.35:27017` in the ".env" file, and then git push everything to GitHub.
  - In the EC2 terminal, run these installations:
    - Install Node.js on the EC2 instance
      - Follow this instruction.
    - Install Git
      - Run `sudo yum install git`.
  - Update the `package.json` script by running `npm install concurrently`.
    - Change `nodemon` to `node`, because nodemon does many things and should only be used during development.
  - Git clone the previously pushed repo.
  - Run `npm run ec2-dev`.
  - In the web browser, enter `13.57.191.130:3000`. The website should show up. If it doesn't, there is probably an error with MongoDB; run `sudo service mongod start` to keep Mongo going.
  - On the EC2 server, you can keep the app running even after closing the terminal by running `nohup npm run ec2-dev &`. Then check that it's running by entering `ps -ax | grep node`. (Optional: you can run `killall node` to stop the nohup process.)
- Install loader.io (its verification token file) into the same folder that holds the `bundle.js` file.
  - Sign up for an account with loader.io.
  - Inside loader.io, enter `http://13.57.191.130:3000/` for `New target host`.
  - Git push the file to GitHub, git pull on the instance with the server, and use nohup to run the application.
  - Go back to loader.io to confirm target verification.
  - Click `Add a new test`. Reference
- In the diagrams, you can see there are zero 400 and 500 error responses. There are 3 timeouts, which indicates there are likely 3 requests that took over 10 seconds (the default) to generate a response. The response count is 9969; dividing that by 60 seconds gives 166.15. This means there are about 166.15 successful responses per second while 800 clients visit the website each second for that 1 minute, which is far below our throughput configuration of 800. The response time is 8399 ms, which means it takes about ~9 seconds to get the response from the GET request when 800 clients are visiting the website at the same time, but I need to get it under 2000 ms. (Note that when you compare the loader.io result with New Relic, you might see a slight difference, with loader.io giving a higher throughput; go with that.) You can go to the EC2 terminal root folder and run `less nohup.out` (then press shift + G) to see a log of which GET request items are being tested, and use that to change the testing path. Reference
- As you can tell from the diagram, around 225 clients per second seems to be my limit, as it averages around 1941 ms response time, which is slightly below my target of an average 2000 ms response time.
- New Relic vs loader.io comparison
  - Next, we can switch to the New Relic website.
  - From the diagram, you can tell that ~4720 ms differs from the 8399 ms. You can choose either the New Relic or loader.io result in this part.
  - Click on one of the transactions and you will see the following diagram.
  - From this diagram, you will notice that the majority of the response time is spent on `MongoDB products toArray`. Therefore, we need to figure out how to reduce that to reach our target goal of 2000 ms.
- AWS CloudWatch
  - RAM and CPU usage
- Gzip and webpack production mode update
  - Run `npm install compression compression-webpack-plugin brotli-gzip-webpack-plugin`.
  - Update `webpack.config.js` with a production configuration (a sketch follows this list).
  - Update `Index.js` with the compression dependency.
  - Update the package.json scripts.
  - Run `npm run prod` to test that it works on the local machine.
  - Once it works, push to GitHub and pull to the EC2 server. Then run the instance there with nohup and do a loader.io test. I was able to go up to 250 clients per second with an average response time well below 2000 ms (293 ms in my case) and an error rate of 0%. However, if I increase the clients per second further, the error rate increases.
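A minimal sketch of the production pieces; the entry/output paths are assumptions about this repo's layout, and the plugin options shown are the commonly used ones for `compression-webpack-plugin` and `brotli-gzip-webpack-plugin`:

```js
// webpack.config.js - sketch: production build that also emits .gz and .br assets.
const path = require('path');
const CompressionPlugin = require('compression-webpack-plugin');
const BrotliGzipPlugin = require('brotli-gzip-webpack-plugin');

module.exports = {
  mode: 'production',
  entry: path.join(__dirname, 'client/src/index.js'), // assumed entry point
  output: { path: path.join(__dirname, 'client/dist'), filename: 'bundle.js' },
  plugins: [
    new CompressionPlugin({ algorithm: 'gzip', test: /\.(js|css|html)$/ }),
    new BrotliGzipPlugin({ asset: '[path].br[query]', algorithm: 'brotli', test: /\.(js|css|html)$/ }),
  ],
};
```

On the server side, the `Index.js` change is just `const compression = require('compression'); app.use(compression());` so responses are gzip-compressed on the fly.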
- Launch 3rd EC2 for NGINX
  - For setup, see the reference here. Only install Nginx, not Nginx Plus. Note that the default security setting puts the Nginx load balancer on port 80.
  - Run `systemctl status nginx.service` to see whether it is actively running.
  - Run `cd /etc/nginx/` and then `sudo vi nginx.conf`. See the reference to update `nginx.conf`. In my example, I updated it along the lines of the sketch below. Please note that proper indentation does matter. "13.57.191.130" is the IPv4 of the EC2 service. You can use any name in place of "nodejs"; the "nodejs" part of "http://nodejs" has to match the "nodejs" of the upstream.
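Since the original snippet isn't reproduced here, this is a sketch of the relevant part of `nginx.conf` based on the description above (one upstream named `nodejs` pointing at the service instance, proxied from port 80):

```nginx
# /etc/nginx/nginx.conf - load balancer sketch (inside the http { } block)
upstream nodejs {
    server 13.57.191.130:3000;   # EC2 service instance
}

server {
    listen 80;

    location / {
        proxy_pass http://nodejs;   # must match the upstream name above
    }
}
```

When the 4th and 5th instances are added later, their `server <ip>:3000;` lines go into the same `upstream` block.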
  - Run `sudo systemctl stop nginx`, then `sudo systemctl start nginx`, then `systemctl status nginx.service`. You should expect to see a green "active (running)" status. If you get a red "failed" status, look into the error and redo this step.
  - Go to the loader.io website and replace the target host. My previous target host is `13.57.191.130:3000`, which refers to my EC2 service; I will replace it with `18.222.19.182:80`. Copy the new verification token over the current token in the "client" folder. Then git push to GitHub, go to the EC2 service, and re-run the service with the nohup setting activated. You may re-test on loader.io; the result should be similar to what we previously tested.
- Horizontal scaling by creating 4th and 5th EC2 instances
  - Go to EC2 and select the 1st service. Select `Action` -> `Image` -> `Create Image`. For `Image name` you can put `Service 2`, and `2nd Instance for Service` for `Image description`. Then click `Create Image` at the bottom. Then, in the AWS left drop-down menu, select `Images`, then `AMIs`, then click `Launch`. In the security settings, follow the previous security settings of the service. Generate a unique pem file and put it in the root folder. If you receive an error with `Please login as the user "ec2-user" rather than the user "root".`, change the ssh command accordingly. Then go to the root folder and run `nohup npm run prod &`.
  - Repeat the previous process to get another image.
  - Go to the instance of the Nginx load balancer and modify the `nginx.conf` file to include the new servers. Then run `sudo systemctl stop nginx`, then `sudo systemctl start nginx`, then `systemctl status nginx.service`. You should expect to see a green "active (running)" status. If you get a red "failed" status, look into the error and redo this step.
- After integrating the 3 services with Nginx, I began to look at loader.io, New Relic, and AWS CloudWatch. Here's my analysis: when I look at the loader.io results, they show that when I set 280 requests per second, the average response time is 186 ms, which is much faster than what I tested previously without the Nginx setup. The error rate and timeouts are also 0. However, when I set it to 300 requests per second, the error rate starts to build and there are about 269 responses with 500 error codes. This is not good. When I look at the AWS CloudWatch result, I notice that the CPU usage is pretty low, so I don't have a CPU issue. When I look at the New Relic result, I realize that the majority of the response time comes from "MongoDB products toArray". Therefore, I need to figure out how to improve the database.