Learning about Node.js
What are threads?
Whenever you run a program on your computer, something called a process is started up.
A process is an instance of a computer program that is being executed.
Within a single process, we can have multiple things called threads.
You can think of a thread as a kind of to-do list that holds some number of instructions that need to be executed by the CPU of your computer. The thread is given to the CPU, and the CPU will attempt to run every instruction on it, one by one.
A single process can have multiple threads inside of it.
An important aspect of threads is scheduling, which refers to your operating system's ability to decide which thread to process at any given instant in time.
You need to remember that your computer has a limited amount of resources available to it and your cpu can only process so many instructions per second.
This starts to become very relevant when we start to get many active processes and threads on our computer.
The operating system's scheduler has to look at the different threads that are asking to be processed and figure out how to do some amount of work on each of them, while making sure that none of them has to wait too long to be processed.
We want to make sure that urgent threads don't have to wait too long to be executed.
There are a couple of different strategies used to improve the rate at which these threads can be processed:

The first is adding more CPU cores to our machine. If we have more than one core inside of our CPU, then we can process multiple threads at the same time.

The second is closely examining the work being done by each thread and allowing our operating system scheduler to detect big pauses in processing time caused by expensive input and output operations.
What is the event loop?
When we start up a node program on our computer, node automatically creates one thread and then executes all of our code inside of that one single thread.
Inside that single thread is something called the event loop. You can think of the event loop as a control structure that decides what our one thread should be doing at any given point in time.
This event loop is the absolute core of every node program that you and I run, and every node program has exactly one event loop.
Understanding how the event loop works is extremely important because a lot of performance concerns about node eventually boil down to how the event loop behaves.
Pseudocode example to illustrate the event loop process:

```javascript
// node myFile.js

const pendingTimers = [];
const pendingOSTasks = [];
const pendingOperations = [];

// New timers, tasks, and operations are recorded while myFile runs
myFile.runContents();

function shouldContinue() {
  // Check one: Any pending setTimeout, setInterval, setImmediate?
  // Check two: Any pending OS tasks? (Like a server listening on a port)
  // Check three: Any pending long running operations? (Like fs module calls)
  return pendingTimers.length || pendingOSTasks.length || pendingOperations.length;
}

// Entire body executes in one 'tick'
while (shouldContinue()) {
  // 1) Node looks at pendingTimers and sees if any functions
  //    are ready to be called (setTimeout, setInterval)
  // 2) Node looks at pendingOSTasks and pendingOperations
  //    and calls relevant callbacks
  // 3) Pause execution. Continue when...
  //    - a new pendingOSTask is done
  //    - a new pendingOperation is done
  //    - a timer is about to complete
  // 4) Look at pendingTimers. Call any setImmediate
  // 5) Handle any 'close' events
}

// exit back to terminal
```
Is the event loop single threaded?
The node event loop is single threaded, but some of the functions included in the node standard library run outside of the event loop, and outside of that single thread (they are not single threaded).
Basically, the event loop uses a single thread, but a lot of the code that you and I write does not actually execute entirely inside that thread.
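As a small sketch (not from the notes themselves) of what "one thread" means in practice: any synchronous work we write occupies the event loop's thread, so even a 0 ms timer has to wait for it.

```javascript
// Minimal sketch: a synchronous busy-loop occupies the event loop's single
// thread, so a 0 ms timer cannot fire until the loop is free again.
const start = Date.now();

setTimeout(() => {
  // Scheduled for "now", but it only runs once the thread is unblocked.
  console.log(`Timer fired after ${Date.now() - start} ms`);
}, 0);

// Hold the one thread for roughly 100 ms.
while (Date.now() - start < 100) {}
console.log('Busy-loop finished, thread is free');
```

Running this prints the busy-loop line first, and the timer reports a delay of at least 100 ms even though it was scheduled for 0 ms.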
Questions about threadpools?
Q: Can we use the threadpool for JavaScript code or can only nodeJS functions use it?
A: We can write custom JS that uses the thread pool.

Q: What functions in the node std library use the threadpool?
A: All `fs` module functions. Some `crypto` functions. It depends on the OS (windows vs unix based).

Q: How does this threadpool stuff fit into the event loop?
A: Tasks running in the threadpool are the `pendingOperations` in our code example.
Questions about OS Async features?
Q: What functions in the node std library use the OS's async features?
A: Almost everything around networking, for all OS's. Some other stuff is OS specific.

Q: How does this OS async stuff fit into the event loop?
A: Tasks using the underlying OS are reflected in our `pendingOSTasks` array.
Interesting Threadpool Example
Example Code:
```javascript
const https = require('https');
const crypto = require('crypto');
const fs = require('fs');

const start = Date.now();

function doRequest() {
  https
    .request('https://www.google.com', res => {
      res.on('data', () => {});
      res.on('end', () => {
        console.log(Date.now() - start);
      });
    })
    .end();
}

function doHash() {
  crypto.pbkdf2('a', 'b', 100000, 512, 'sha512', () => {
    console.log('Hash:', Date.now() - start);
  });
}

doRequest();

fs.readFile('multitask.js', 'utf8', () => {
  console.log('FS:', Date.now() - start);
});

doHash();
doHash();
doHash();
doHash();
```
Example Output:
```
$ node multitask.js
311
Hash: 1877
FS: 1878
Hash: 1888
Hash: 1891
Hash: 1901
```
Output Explanation:
First we see the benchmark from the `https` request. Then we see one console log from the hashing function. After that we see the file system module call, and then the 3 remaining hashing function calls. There is no way that reading a file off of the hard drive can possibly take 2 seconds. To see what is going on, comment out all of the hashing function calls and run the file again.

Example Code:
```javascript
const https = require('https');
const crypto = require('crypto');
const fs = require('fs');

const start = Date.now();

function doRequest() {
  https
    .request('https://www.google.com', res => {
      res.on('data', () => {});
      res.on('end', () => {
        console.log(Date.now() - start);
      });
    })
    .end();
}

function doHash() {
  crypto.pbkdf2('a', 'b', 100000, 512, 'sha512', () => {
    console.log('Hash:', Date.now() - start);
  });
}

doRequest();

fs.readFile('multitask.js', 'utf8', () => {
  console.log('FS:', Date.now() - start);
});
```
Example Output:
```
$ node multitask.js
FS: 19
185
```
It takes only 19 milliseconds to read the file off the hard drive. This means we were seeing some very interesting behavior in the first run: with the hashing function calls present, the same file system read took ~2 seconds to complete.

Why do we always see exactly one hash console log before the result of the file system?
- Once all the function calls (`fs.readFile` and the 4 `doHash` calls) are made, they are allocated to the 4 threads in the thread pool.
- When the `fs` module call is loaded into thread 1, thread 1 starts going through the `fs.readFile` process: node goes out to the hard drive to get some statistics about the file, and once those return, node accesses the hard drive again to stream the file contents back to the application, which then returns the file contents to us.
- During that first phase, while thread 1 is waiting on the hard drive for the file statistics, the thread moves on to the next task in line (`pbkdf2` call number 4). So thread 1 temporarily forgets about the file system call and starts calculating a hash.
- When thread 2 completes its hash and is ready to accept more work, it sees that there is still a pending file system call. Thread 2 checks whether any information has come back from the hard drive; the file statistics come back into thread 2, which then continues working on the file system call and makes the follow-up request to the hard drive for the actual file contents. Thread 2 processes them, and we then see that console log appear.
- This is why we always see one hash get completed before the file system module call.
Why do we always see the HTTP request complete first?
- Note that both the `https` request and the file system call are asynchronous; it takes some amount of time for both of them to complete.
- Node makes use of a thread pool for some very specific function calls. In particular, almost everything inside of the `fs` module makes use of this thread pool.
- The `crypto` module function `pbkdf2` makes use of the thread pool as well.
- However, the `https` module does not use the thread pool; instead it reaches out directly to the operating system and leverages the operating system to do all of that networking work for us.
- If we look at the times it took to complete the different operations, we see that the `https` call resolved right away, but we had to wait much longer for all the other function calls because they were competing for the thread pool.
Improving Node Performance
- Use Node in `Cluster Mode`.
- Use `Worker Threads`.

Note:

- It is recommended to use Node in cluster mode to improve the performance of your application.
- Using worker threads is considered experimental.
Clustering
The cluster manager is responsible for monitoring the health of the individual instances of our application that we launch at the same time on one computer. The cluster manager itself doesn't actually execute any application code. It isn't responsible for handling incoming requests, fetching data from the database, or anything like that; instead, it is responsible for monitoring the health of each of the individual instances.

The Cluster Manager can:
- Start instances.
- Stop an instance.
- Restart an instance.
- Send an instance data.
- Do other kinds of administrative tasks.
It is up to the individual instances of the server to actually process incoming requests and do things such as access the database, handle authentication, or serve up static files.
Worker Instances
Worker Instances are actually responsible for processing incoming requests.
To create worker instances, the cluster manager requires in the `cluster` module from the node standard library. There is one particular function on the cluster module called `fork()`, and whenever we call that `fork` function from within the cluster manager, node internally goes back to our index.js file and executes it a second time, but in a slightly different mode. Basically, the index.js file is executed multiple times by node: the very first execution produces the cluster manager, and every execution after that produces a worker instance.

Use-case where Clustering in Node can be helpful?
If you have some routes inside of your app that usually take a while to process, but other routes that are very quick, then by using clustering you can start up multiple instances of your server that more evenly address all the incoming requests and give more predictable response times.
Where Clustering does NOT work out so well
Using `ab` (Apache Benchmark) to measure performance. The following command will test a single request to `localhost:3000/`:

```
$ ab -c 1 -n 1 localhost:3000/
```
ab output:
```
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname:        localhost
Server Port:            3000

Document Path:          /
Document Length:        8 bytes

Concurrency Level:      1
Time taken for tests:   0.975 seconds
Complete requests:      1
Failed requests:        0
Total transferred:      206 bytes
HTML transferred:       8 bytes
Requests per second:    1.03 [#/sec] (mean)
Time per request:       974.912 [ms] (mean)
Time per request:       974.912 [ms] (mean, across all concurrent requests)
Transfer rate:          0.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   975  975   0.0    975     975
Waiting:      974  974   0.0    974     974
Total:        975  975   0.0    975     975
```
The following command will make 2 requests at the exact same time to the 1 child inside of our cluster, where that 1 child only has 1 thread available:

```
$ ab -c 2 -n 2 localhost:3000/
```
ab output:
```
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname:        localhost
Server Port:            3000

Document Path:          /
Document Length:        8 bytes

Concurrency Level:      2
Time taken for tests:   1.918 seconds
Complete requests:      2
Failed requests:        0
Total transferred:      412 bytes
HTML transferred:       16 bytes
Requests per second:    1.04 [#/sec] (mean)
Time per request:       1917.762 [ms] (mean)
Time per request:       958.881 [ms] (mean, across all concurrent requests)
Transfer rate:          0.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   974 1446 667.4   1918    1918
Waiting:      974 1445 667.5   1917    1917
Total:        974 1446 667.4   1918    1918

Percentage of the requests served within a certain time (ms)
  50%   1918
  66%   1918
  75%   1918
  80%   1918
  90%   1918
  95%   1918
  98%   1918
  99%   1918
 100%   1918 (longest request)
```
It takes ~2 seconds. Both requests are accepted at the same time, but the server can only process 1 at a time. Now, we can try the same `ab` command, but with 2 children running:

```
$ ab -c 2 -n 2 localhost:3000/
```
ab output:
```
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname:        localhost
Server Port:            3000

Document Path:          /
Document Length:        8 bytes

Concurrency Level:      2
Time taken for tests:   0.928 seconds
Complete requests:      2
Failed requests:        0
Total transferred:      412 bytes
HTML transferred:       16 bytes
Requests per second:    2.16 [#/sec] (mean)
Time per request:       928.056 [ms] (mean)
Time per request:       464.028 [ms] (mean, across all concurrent requests)
Transfer rate:          0.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   926  927   1.1    928     928
Waiting:      925  926   1.3    927     927
Total:        926  927   1.1    928     928

Percentage of the requests served within a certain time (ms)
  50%    928
  66%    928
  75%    928
  80%    928
  90%    928
  95%    928
  98%    928
  99%    928
 100%    928 (longest request)
```
From the output we can see that the two requests were processed nearly in parallel, with 1 request processed slightly faster than the other, each being handled by one of the 2 children in the cluster. In this use-case we definitely see a performance benefit.

We can now try increasing the number of children in the cluster to 6 and making an `ab` request of 6 concurrent requests to `localhost:3000/`:

```
$ ab -c 6 -n 6 localhost:3000/
```
ab output:
```
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname:        localhost
Server Port:            3000

Document Path:          /
Document Length:        8 bytes

Concurrency Level:      6
Time taken for tests:   2.819 seconds
Complete requests:      6
Failed requests:        0
Total transferred:      1236 bytes
HTML transferred:       48 bytes
Requests per second:    2.13 [#/sec] (mean)
Time per request:       2819.378 [ms] (mean)
Time per request:       469.896 [ms] (mean, across all concurrent requests)
Transfer rate:          0.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing: 2784 2804  14.9   2811    2819
Waiting:    2782 2802  14.6   2810    2817
Total:      2784 2804  14.9   2811    2819

Percentage of the requests served within a certain time (ms)
  50%   2811
  66%   2811
  75%   2819
  80%   2819
  90%   2819
  95%   2819
  98%   2819
  99%   2819
 100%   2819 (longest request)
```
When we increase the number of children in the cluster to 6, we see that each request took ~3 seconds to complete.

Depending on the laptop/desktop being used, there is an absolute upper limit to the computer's ability to process incoming requests and do some amount of work. So when we handle 6 requests at the same time, the 6 separate threads running the 6 separate children are bouncing between every hash function call at the exact same time, and the CPU is trying to do a little bit of work on all the different requests at once.

The result is that the code was not executed six times faster; instead, it took significantly longer to eventually get a response. The overall performance suffered because the CPU was trying to bounce around and process all the incoming requests at exactly the same time.
Let's now try bringing the number of children down to 2 and executing the `ab` benchmark with 6 concurrent requests:

Example Command:

```
$ ab -c 6 -n 6 localhost:3000/
```
Example Output:
```
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname:        localhost
Server Port:            3000

Document Path:          /
Document Length:        8 bytes

Concurrency Level:      6
Time taken for tests:   2.910 seconds
Complete requests:      6
Failed requests:        0
Total transferred:      1236 bytes
HTML transferred:       48 bytes
Requests per second:    1.99 [#/sec] (mean)
Time per request:       2900.743 [ms] (mean)
Time per request:       501.624 [ms] (mean, across all concurrent requests)
Transfer rate:          0.40 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:   999 1999 891.1   2004    2900
Waiting:      997 1999 891.8   2004    2900
Total:       1000 2000 891.0   2005    2900

Percentage of the requests served within a certain time (ms)
  50%   2005
  66%   2005
  75%   2900
  80%   2900
  90%   2900
  95%   2900
  98%   2900
  99%   2900
 100%   2900 (longest request)
```
You will notice that the longest request took ~3 seconds, which is basically the same as when we had 6 children in the cluster, but the shortest request was almost a whole second shorter.

Generally you will want to match the number of children in your cluster to the number of physical or logical cores your machine has.