About this presentation
- Started looking at nodejs in anger about 1.5 months ago.
- Compiled a list of topics outside the usual (dev-focused) tutorials
- Not all of this is battle-tested in production
- Beginners: there should be enough here for you
- Advanced users, gurus: please chime in, I'd love your feedback!
Things you should avoid (period)
avoid typos, missing commas, etc.
- use jshint - http://www.jshint.com/docs/ in combination with grunt
avoid using the global scope (declare variables with var where you can):
- use var a = 'bla'; instead of a = 'bla';
avoid eval, with, and switch statements without a default case:
- put 'use strict'; at the top of your files
- or run: $ node --use_strict
don't run npm as root:
- a preinstall script in a package.json can be evil (it runs arbitrary code during install)
don't run as root:
- use process.setuid and process.setgid - http://blog.liftsecurity.io/post/37388272578/writing-secure-express-js-apps
- or sudo in your start script
- or use authbind or similar for non-privileged users
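var http = require('http');
http.createServer(app).listen(app.get('port'), function () {
  console.log('Express server listening on port ' + app.get('port'));
  // drop privileges once the port is bound; setgid must come first,
  // since after setuid the process may no longer change its group
  process.setgid(config.gid);
  process.setuid(config.uid);
});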
Testing
- mocha, vows, expect, ...
- mocha --watch (re-runs the tests when files change)
- sinon for spies, mocks, stubs
- nock for HTTP testing
- zombie for headless browser testing
- phantomjs
- passport-stub
- socket.io in async tests: set the 'force new connection' option so each test gets its own connection
Versioning & Packaging
- use semver versioning in package.json
Dependencies in package.json
- dependencies, devDependencies
- optionalDependencies
- npm shrinkwrap (freeze your dependency versions)
- peerDependencies: use for plugins
npm pack
- bashpack (turns a nodejs process into a bash script)
Error handling
Try/catch, but...
- try/catch catches synchronous errors, but not errors that happen later in asynchronous callbacks
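To make the "but..." concrete, a minimal sketch (the readJson helper and the file names are made up for illustration): the read error arrives later through the callback, so only the synchronous JSON.parse step can usefully be wrapped in try/catch.

```javascript
var fs = require('fs');

// Reads and parses a JSON file, reporting every failure through the callback.
function readJson(file, callback) {
  fs.readFile(file, 'utf8', function (err, data) {
    // This callback runs on a later tick. A try/catch wrapped around the
    // fs.readFile() call would already have exited by now, so failures must
    // be handed to the callback instead of thrown.
    if (err) { return callback(err); }
    var parsed;
    try {
      // JSON.parse runs synchronously, so a local try/catch does work here.
      parsed = JSON.parse(data);
    } catch (parseErr) {
      return callback(parseErr);
    }
    callback(null, parsed);
  });
}
```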
Return an Error object and not a string
- The fundamental benefit of Error objects is that they automatically keep track of where they were built and originated
callback(new Error('my own error'))
- http://www.devthought.com/2011/12/22/a-string-is-not-an-error/
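A sketch of the same advice with a custom error type, assuming you want handlers to inspect extra fields; NotFoundError and findUser are hypothetical names. It relies on util.inherits and V8's Error.captureStackTrace, so the object is still an Error and keeps track of where it was created.

```javascript
var util = require('util');

// Hypothetical custom error type: a real Error (so it carries a stack trace)
// with an extra statusCode field that handlers can inspect.
function NotFoundError(message) {
  this.message = message;
  this.statusCode = 404;
  Error.captureStackTrace(this, NotFoundError); // record where it was created
}
util.inherits(NotFoundError, Error);
NotFoundError.prototype.name = 'NotFoundError';

// Asynchronous lookup that always fails, to show the callback side.
function findUser(id, callback) {
  process.nextTick(function () {
    callback(new NotFoundError('user ' + id + ' not found')); // not a string
  });
}
```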
Better stacktraces
- better-stack-traces
- LongJohn (eventemitters)
- Longer stacks in chai
- https://github.com/felixge/node-stack-trace
- https://github.com/scaryzet/node-stack-parser
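Independent of the modules above, V8 itself keeps only 10 stack frames by default; the Error.stackTraceLimit property raises that cap (at some CPU/memory cost). A small sketch, with the limit of 50 chosen arbitrarily:

```javascript
// Raise V8's default limit of 10 frames per stack trace.
Error.stackTraceLimit = 50;

// Builds a deep call chain so the difference is visible in err.stack.
function recurse(depth) {
  if (depth === 0) { return new Error('deep'); }
  return recurse(depth - 1);
}
```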
Note that I recommend using function Foo() { ... }
for constructors instead of var Foo = function() { ... }
// Constructor
function Foo(bar) {
  // always initialize all instance properties
  this.bar = bar;
  this.baz = 'baz'; // default value
}

// class methods
Foo.prototype.fooBar = function() {
  // ...
};

// export the class
module.exports = Foo;
Listen for ALL errors
- an 'error' event emitted with no listener attached throws, which makes the process exit
- listen for errors on http, redis connections, express, socket.io, log handlers, metrics handlers
- return when you call back with an error (otherwise the function keeps running)
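What "makes the process exit" means in practice: an EventEmitter that emits 'error' with no listener throws the error itself. A minimal sketch, with plain EventEmitters standing in for the http/redis/socket.io objects:

```javascript
var EventEmitter = require('events').EventEmitter;

// No 'error' listener: emit('error', ...) throws, and an uncaught throw
// ends the whole process.
var unguarded = new EventEmitter();
function emitUnguarded() {
  unguarded.emit('error', new Error('no listener'));
}

// With a listener, the same event becomes an ordinary callback.
var guarded = new EventEmitter();
var lastError = null;
guarded.on('error', function (err) {
  lastError = err; // log it, count it in metrics, reconnect, ...
});
guarded.emit('error', new Error('connection lost')); // does not throw
```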
Use domains to contain errors
- But imagine: one URL throws an error and it takes down the complete server! Lots of CPU power wasted on a single error
- That is why the concept of domains was introduced
- http://nodejs.org/api/domain.html
- with them you group code into a 'domain'; exceptions thrown inside it are emitted as an 'error' event on the domain, where you can handle them
- Domains & express: https://github.com/mathrawka/express-domain-errors
- But it seems connect catches express exceptions with a try/catch in its own code
- Stack overflow explanation: http://stackoverflow.com/questions/16174664/nodejs-error-handling-with-domains-and-socket-io
- Domains & Connect: https://github.com/baryshev/connect-domain
- Example of how to use connect-domain
- it's very common in the nodejs world to just exit on an uncaught exception
Not sure if this still makes sense with domains
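A minimal domain sketch, assuming a plain setTimeout stands in for a failing request handler (runGuarded is a made-up helper name). Note that the domain module has since been marked deprecated in Node.js, and its docs still advise shutting the process down after such an error rather than carrying on.

```javascript
var domain = require('domain');

// Runs fn inside a fresh domain. Errors thrown later from async callbacks
// created inside it (timers, I/O) go to onError instead of crashing the process.
function runGuarded(fn, onError) {
  var d = domain.create();
  d.on('error', onError);
  d.run(fn);
  return d;
}

runGuarded(function () {
  setTimeout(function () {
    throw new Error('thrown on a later tick');
  }, 10);
}, function (err) {
  // Log it and answer with a 500 here; process state may now be inconsistent,
  // so a real server would also stop taking work and restart this worker.
  console.error('domain caught:', err.message);
});
```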
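return on a callback
function handle(err, callback) {
  // without the return, execution would continue past the if
  if (err) { return callback(err); }
  console.log('only printed when there is no error');
}
but: in loops with callbacks (e.g. forEach) or other nested functions,
return only exits the inner function, so it means something else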
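The loop caveat, sketched with two hypothetical functions: in a plain for loop, return leaves the outer function, but inside a forEach callback it only ends that one iteration, so the final callback still fires (and the callback ends up being called several times).

```javascript
// Calls back exactly once: return inside a for loop leaves processAll itself.
function processAll(items, callback) {
  for (var i = 0; i < items.length; i++) {
    if (items[i] < 0) {
      return callback(new Error('negative item at index ' + i));
    }
  }
  callback(null, items.length);
}

// Buggy variant: return inside forEach only leaves the per-item function,
// so the loop keeps going and callback can fire more than once.
function processAllBuggy(items, callback) {
  items.forEach(function (item, i) {
    if (item < 0) { return callback(new Error('negative item at index ' + i)); }
  });
  callback(null, items.length);
}
```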
callback(err) vs emit('error')
Not sure yet here: both? Only if there is a callback? Only if there is an error listener?
- https://groups.google.com/forum/m/#!topic/nodejs/QZa6bookqL0
- https://groups.google.com/forum/m/#!topic/nodejs/hqO0w6XgOlA
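One possible answer to the question above, sketched with a hypothetical Resource class: call back with the error when the caller passed a callback, and emit 'error' only when it did not. This is an assumed convention for illustration, not a settled rule.

```javascript
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical resource: errors go to the callback if there is one,
// otherwise they are emitted as 'error' events.
function Resource() {
  EventEmitter.call(this);
}
util.inherits(Resource, EventEmitter);

Resource.prototype.load = function (name, callback) {
  var self = this;
  process.nextTick(function () {          // keep the API consistently async
    var err = name ? null : new Error('name is required');
    if (typeof callback === 'function') {
      return err ? callback(err) : callback(null, 'loaded ' + name);
    }
    if (err) {
      return self.emit('error', err);     // throws if nobody listens
    }
    self.emit('load', 'loaded ' + name);
  });
};
```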
Logging
I like https://github.com/flatiron/winston/
It has logger.log, logger.info, logger.debug, etc.
It can log metadata or JSON objects
It can use multiple loggers: https://github.com/jedi4ever/socialapp/blob/master/lib/utils/logger.js
Create it as a singleton, then require and reuse it in other modules (the module cache returns the same object)
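var sharedLogger;
var logger = function (options) {
  // If we have already been initialized, reuse the same instance
  if (sharedLogger) { return sharedLogger; }
  // ... otherwise create the winston loggers and store them in sharedLogger ...
};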
var logger = require('../utils/logger')().loggers.get('express');
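A dependency-free sketch of the singleton idea. createNamedLogger is a hypothetical stand-in for the real winston setup (swap your winston loggers in there); the point is that the module-level sharedLogger makes every call return the same object, and the module cache shares it across files.

```javascript
// Hypothetical stand-in for a winston logger: prefixes messages with its name.
function createNamedLogger(name) {
  return {
    info: function (msg) { console.log('[' + name + '] info: ' + msg); }
  };
}

var sharedLogger = null; // module-level, so it outlives each call

function logger(options) {
  if (sharedLogger) { return sharedLogger; } // already initialized: reuse
  var names = (options && options.names) || ['express', 'socketio'];
  var loggers = new Map();
  names.forEach(function (name) { loggers.set(name, createNamedLogger(name)); });
  sharedLogger = { loggers: loggers };
  return sharedLogger;
}

module.exports = logger;
// elsewhere: var log = require('../utils/logger')().loggers.get('express');
```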
Multiple Outputs
Console, File logging, rotation
Also has a logstash output:
Use in express
// enable web server logging; pipe those log messages through winston
// http://stackoverflow.com/questions/9141358/how-do-i-output-connect-expresss-logger-output-to-winston
var winstonStream = {
  write: function (message, encoding) {
    logger.info(message.slice(0, -1)); // remove the trailing newline
  }
};
expressApp.use(express.logger({stream: winstonStream}));
Use in socketio
var ioServer = socketIO.listen(webServer, { logger: logger , log:true});
Keep it up
Check if up - http://pingdom.com
Use upstart/forever https://github.com/nodejitsu/forever - just kidding
The basics
Have multiple nodejs processes listen on the same socket.
The trick: pass the socket/file descriptor from a parent process and have server.listen reuse that descriptor. Each process then runs in its own memory space (though usually with a shared environment).
It does not balance the load itself; it leaves that to the kernel.
Since nodejs 0.8 there is a cluster module (functional, although marked experimental)
- http://nodejs.org/api/cluster.html
- Simple cluster example: https://gist.github.com/dsibilly/2992412
- Simple cluster example + domains: http://shapeshed.com/uncaught-exceptions-in-node/
- The isaacs gist that was used as inspiration for the cluster docs - https://gist.github.com/isaacs/5264418
Note: I haven't yet found a compelling reason to favor multi-process/cluster nodejs over an nginx/haproxy setup:
- http://blog.argteam.com/coding/hardening-node-js-for-production-part-3-zero-downtime-deployments-with-nginx/
- Also see this actionhero-related blog post on handling downtime gracefully with sockets, websockets, etc.
- The rest of this post covers doing clustering etc. yourself; otherwise you might want to check out actionhero.js
Current Cluster/Domain Enhancements:
The core module is basic: it leaves it to you to handle everything you need in zero-downtime environments. Most of the tools below allow you to:
- wait for workers to close correctly, or SIGKILL them once a timeout has passed
- send SIGUSR2 to reload workers one by one
- some provide a CLI that does the work over a socket/network connection
- kill a worker that has become unresponsive (detected by waiting for a heartbeat)
- take a worker offline so it stops accepting new requests
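The building block most of these tools share is closing a server gracefully with a timeout. A minimal single-process sketch; gracefulClose and the timeout value are assumptions for illustration, not the API of any of the modules below.

```javascript
var http = require('http');

// Hypothetical helper: stop accepting new connections, let in-flight requests
// finish, and give up after timeoutMs. done(forced) gets forced === true when
// the timeout won; a cluster tool would then SIGKILL the worker.
function gracefulClose(server, timeoutMs, done) {
  var finished = false;
  var timer = setTimeout(function () { finish(true); }, timeoutMs);

  function finish(forced) {
    if (finished) { return; }
    finished = true;
    clearTimeout(timer);
    done(forced);
  }

  // close() stops accepting connections; its callback fires once every
  // existing connection has ended.
  server.close(function () { finish(false); });
}
```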
Code at: https://github.com/doxout/recluster/blob/master/index.js
Good: Simple and in use
Bad: No CLI , No domains, Cannot pass args
Framework in use at ebay:
Good: complete with monitoring, control URL
Bad: Seems to be massive ...
Naught: (note: naught2 was a temporary fork, but it got merged back into master)
talks about zero-downtime crashes by intelligently handling express errors with domains
Good: simple, uses domains
Bad: seems to do its own logging
cluster-master: built by nodejs god @isaacs
- To be investigated
Up: seems to be the new LearnBoost way to do zero-downtime reloads
- http://thechangelog.com/up-node-powered-zero-downtime-reloads-and-load-balancing/
Others to check:
- All Cluster npm modules: https://npmjs.org/browse/keyword/cluster
- Bowl: https://github.com/waka/node-bowl
- Herd: https://github.com/segmentio/herd
- Jumpstarter: https://npmjs.org/package/jumpstarter
- Multi Cluster: https://npmjs.org/package/multi-cluster
- Pluribus: https://github.com/twistdigital/pluribus
- Simple node cluster: https://github.com/audreyt/node-cluster-server
Older solutions:
(not updated in 2+ years and probably not compatible with nodejs >= 0.8), so you can safely ignore these, but they can give inspiration
- InfoQ blogpost on multi-core nodejs (is from 2010) - http://www.infoq.com/articles/multi-core-node-js
- The inspiration: has some cool options but alas only works with nodejs < 0.8 http://learnboost.github.io/cluster/
- https://github.com/pgte/fugue/wiki/How-Fugue-Works
- https://github.com/kriszyp/multi-node + good writeup
- https://github.com/dvv/stereo
Worth mentioning:
to be investigated
- http://naholyr.fr/2012/09/profiler-son-application-nodejs/
- http://mindon.github.io/blog/2012/04/26/profiling-nodejs-application/
Profiling tools
Callgrind - http://valgrind.org/docs/manual/cl-manual.html
- Connect-profiler - http://qzaidi.github.io/2012/07/15/node-profiling/
- https://github.com/chrisa/node-dtrace-provider#readme
- http://blog.nodejs.org/2012/04/25/profiling-node-js/
- https://github.com/bahamas10/node-dtrace-examples/tree/master/function-trace
- http://www.slideshare.net/bcantrill/instrumenting-the-realtime-web-nodejs-in-production
- http://s.urge.omniti.net/i/content/slides/Surge2012-DavidP_Nodejs.pdf
- https://npmjs.org/package/nodetrace
Node-memwatch https://hacks.mozilla.org/2012/11/tracking-down-memory-leaks-in-node-js-a-node-js-holiday-season/
'Known' Limits/Tuning
ulimit: max open file descriptors
Nagle algorithm (socket.setNoDelay)
Set the listen backlog, max agent sockets and max open files limit - http://qzaidi.github.io/2013/05/14/node-in-production/
Limit POST request sizes
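A sketch of a few of these knobs in plain Node.js; the numbers are illustrative, not recommendations, and createTunedServer is a hypothetical helper. Recent Node.js releases already disable Nagle by default for HTTP servers, so the explicit setNoDelay mostly matters on older versions.

```javascript
var http = require('http');

// Illustrative value only: cap concurrent outgoing sockets per host for
// requests made through the default agent.
http.globalAgent.maxSockets = 100;

// Hypothetical helper: an HTTP server whose sockets skip Nagle's algorithm,
// so small writes go out immediately instead of being coalesced.
function createTunedServer(handler) {
  var server = http.createServer(handler);
  server.on('connection', function (socket) {
    socket.setNoDelay(true);
  });
  return server;
}

// Usage: listen(port, host, backlog). The backlog bounds the queue of not yet
// accepted connections; the OS may cap it (e.g. net.core.somaxconn on Linux),
// and every open connection also uses a file descriptor (see ulimit -n).
// createTunedServer(app).listen(8000, '0.0.0.0', 1024);
```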
Use a statsd backend to send counters, timers, etc.
- sivy/node-statsd is feature complete and seems to be the most popular
- dscape/lynx has streams + (some weird random/sampling stuff)
- msiebuhr/node-statsd-client has express helpers & multi child options
- fasterize/node-statsd-profiler has transformation function
- godmodelabs/statistik has a CLI interface
reuse UDP connection: yes
prefix: yes
suffix: yes
dnscache: yes
mock: yes
samplerate: yes
errors: yes (eventemitter & bubbleup)
timing: yes
count: yes
increment: yes
decrement: yes
gauge: yes
set: yes
batch: yes
callback: yes
reuse UDP connection: yes (+ provide your own)
prefix: yes (called scope)
error: provide an error function
network: uses ephemeral sockets!
samplerate: yes (uses a special random)
batching: yes
increment: yes
decrement: yes
timing: yes
gauge: yes
set: yes
- usable as stream in/out: yes (uses parser)
reuse UDP connections: yes (ephemeral socket)
prefix: yes
count: yes
gauges: yes
increment: yes
decrement: yes
sets: yes
timings: yes (delays)
- socketTimeout: yes
- children (multi prefix)
- express helper: yes (or per URL)
reuse UDP: no
callback: yes (but default prints to console)
samplerate: yes
prefix: no
suffix: yes
batch: yes
count: yes
gauges: yes
increment: yes
decrement: yes
sets: yes (called modify)
timings: (delays)
(fork from node-statsd)
samplerate: yes
timing: yes
count: yes
increment: yes
decrement: yes
gauge: yes
set: ??
timingstart: yes
timingend: yes
- introduces: key aliases
- transformKey function: YES
Note: you can safely ignore this lib
- reuse UDP connection: no
- timing: yes
- count: yes
- increment: yes
- decrement: yes
- gauge: yes
- errors: totally ignored
- prefix: no
- samplerate: no
- callback: no
Note: use node-statsd instead unless the CLI is something special for you. The feature set is smaller than node-statsd's and it has no special features, so we'll ignore this one
- udp connection reuse: no
- timing: yes
- counter: yes
- increment: yes
- decrement: yes
- gauge: yes
- rawSend: yes
- samplerate: yes
- batch: no
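All of these clients wrap the same plain-text statsd protocol: one "name:value|type" line per metric (c for counters, ms for timers, g for gauges, s for sets, plus an optional "|@rate" sample rate), sent over UDP. A minimal sketch of that, not a replacement for the libraries above; StatsdSketch is a made-up name and 8125 is just the usual statsd port.

```javascript
var dgram = require('dgram');

// Hypothetical minimal client: formats statsd lines and sends them over one
// reused UDP socket.
function StatsdSketch(host, port, prefix) {
  this.host = host;
  this.port = port;
  this.prefix = prefix || '';
  this.socket = dgram.createSocket('udp4');
}

StatsdSketch.prototype.format = function (name, value, type, sampleRate) {
  var line = this.prefix + name + ':' + value + '|' + type;
  if (sampleRate !== undefined && sampleRate < 1) { line += '|@' + sampleRate; }
  return line;
};

StatsdSketch.prototype.send = function (name, value, type, sampleRate) {
  // With sampling, only a fraction of events is sent; statsd scales them back up.
  if (sampleRate !== undefined && sampleRate < 1 && Math.random() >= sampleRate) { return; }
  var buf = Buffer.from(this.format(name, value, type, sampleRate));
  this.socket.send(buf, 0, buf.length, this.port, this.host);
};

StatsdSketch.prototype.increment = function (name, sampleRate) { this.send(name, 1, 'c', sampleRate); };
StatsdSketch.prototype.timing = function (name, ms) { this.send(name, ms, 'ms'); };
StatsdSketch.prototype.gauge = function (name, value) { this.send(name, value, 'g'); };
StatsdSketch.prototype.set = function (name, value) { this.send(name, value, 's'); };
StatsdSketch.prototype.close = function () { this.socket.close(); };

// Usage: var stats = new StatsdSketch('127.0.0.1', 8125, 'myapp.');
//        stats.increment('requests'); stats.timing('response_time', 42);
```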
Mixing in express & connect:
https://github.com/fetep/connect-logger-statsd/blob/master/lib/connect-logger-statsd.js Puts your response times & status codes in statsd
The fork by @sansmischevia seems to be more advanced https://github.com/fetep/connect-logger-statsd/network - it has an ignore list and sends the full path if needed
https://github.com/dokipen/connect-statsd/blob/master/index.js This focuses on writeHead: it calculates the time elapsed before the response is sent back to the client
Note: I've not included any specific backends here, we're focusing on generic statsd usage
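The writeHead approach described above, sketched as a connect/express-style middleware; statsdTiming and the report callback are hypothetical, and in practice report would call the timing method of whichever statsd client you picked.

```javascript
// Hypothetical middleware: wraps res.writeHead so the elapsed time is measured
// right before the headers go to the client. report(ms, statusCode) is
// whatever forwards the numbers to statsd.
function statsdTiming(report) {
  return function (req, res, next) {
    var start = Date.now();
    var writeHead = res.writeHead;
    res.writeHead = function () {
      res.writeHead = writeHead;                      // restore: run once only
      var result = writeHead.apply(res, arguments);  // sets res.statusCode
      report(Date.now() - start, res.statusCode);
      return result;
    };
    next();
  };
}

// Usage with express: app.use(statsdTiming(function (ms, status) { ... }));
```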
https://github.com/dscape/winston-statsd (logger -> statsd)
https://github.com/dscape/statsd-parser (streaming parser)
https://github.com/benjaminwootton/StatsdDashboard (dashboard)
What is a set in statsd?
Sets act like simple counters, with the additional property that they ignore duplicate values.
Technically, all values are stored in a set, and the number of elements in the set is sent to graphite during flushes. Sets are also emptied during flushes (in the same way that counters are reset to 0).
We have been using it in production for a while now, and it is working as expected. The use case that drove the development of the feature was the following (it has since been extended to other cases as well):
We want to graph the number of active logged in users on the website.
Maintaining that state across application servers to manually update gauges is non-trivial.
We send a message to statsd containing the id of the user making a request.
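To illustrate those semantics, a small simulation of what statsd does with one set metric during a flush interval (SetMetric is a made-up name; this is not statsd's actual code): duplicates are dropped, the flush reports the number of distinct values, and the set then starts empty again.

```javascript
// Simulated server-side set metric for one flush interval.
function SetMetric() {
  this.values = new Set();
}

// Values arrive as text over the wire, so 42 and '42' count as the same id.
SetMetric.prototype.add = function (value) {
  this.values.add(String(value));
};

// Report the number of distinct values, then empty the set
// (the same way counters are reset to 0 on flush).
SetMetric.prototype.flush = function () {
  var count = this.values.size;
  this.values.clear();
  return count;
};
```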
Continuous Integration
- Jenkins, Circle CI, TravisCI
- Chat bot in Campfire
Continuous delivery
Configuration Mgmt
- redis
- chef nodejs
Fleet: Extending the easy Git -> deploy
Initial blogpost on fleet: http://blog.nodejs.org/2012/05/02/multi-server-continuous-deployment-with-fleet/
Fleet - uses drones & propagit: https://github.com/substack/fleet
Propagit: A cascading git deployment: https://github.com/substack/propagit
blogpost on fleet usage: http://opsite.wordpress.com/2013/05/04/automated-drone-management-system-for-node-js-fleet/
Flotilla: https://npmjs.org/package/flotilla
All nodejs fleet modules: https://nodejsmodules.org/new/tags/fleet
Not related, but also cool: EC2-fleet https://github.com/ashtuchkin/ec2-fleet
- passport.js
- csrf in express http://www.senchalabs.org/connect/middleware-csrf.html
- helmet in express - https://github.com/evilpacket/helmet
- use bcrypt for passwords - http://codahale.com/how-to-safely-store-a-password/
- link sessions socketio/express - https://github.com/camarao/session.socket.io
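A sketch of the "store passwords safely" idea. The bcrypt module needs a native build, so this swaps in Node's built-in crypto.scryptSync, a different but also deliberately slow password hashing function; the "salt:hash" storage format is an assumption.

```javascript
var crypto = require('crypto');

// Hash with a random per-password salt; store the result as "salt:hash" (hex).
function hashPassword(password) {
  var salt = crypto.randomBytes(16);
  var hash = crypto.scryptSync(password, salt, 64);
  return salt.toString('hex') + ':' + hash.toString('hex');
}

function verifyPassword(password, stored) {
  var parts = stored.split(':');
  var salt = Buffer.from(parts[0], 'hex');
  var expected = Buffer.from(parts[1], 'hex');
  var actual = crypto.scryptSync(password, salt, expected.length);
  // Constant-time comparison, so timing does not leak how many bytes matched.
  return crypto.timingSafeEqual(actual, expected);
}
```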
Loadbalancer/SSL Termination
proxy in express
in socketIO proxy
SSL offload via HAProxy 1.5dev (also websockets) brew install haproxy --devel
remove the express version header (X-Powered-By: app.disable('x-powered-by'))
correct SSL settings:
- the ca param is an array (add your provider's certs)
- strict ciphers, against SSL attacks
- perfect forward secrecy
- SSL insecure!
offload your SSL
express proxy setting (X-...
in socket.io (authentication Secure..) , Proxy
// with 'trust proxy' enabled, req.ip, req.ips and req.protocol (http/https)
// are taken from the X-Forwarded-* headers set by the proxy
if (settings.terminated) { expressApp.enable('trust proxy'); }
Security stuff
- input/output checker
http://www.slideshare.net/BishanSingh/node-security-the-good-bad-ugly console.log(with beep char); JSON.parse
Preinstall scripts in npm: http://www.slideshare.net/ASF-WS/asfws-2012-nodejs-security-old-vulnerabilities-in-new-dresses-par-sven-vetsch http://lab.cs.ttu.ee/dl93
Node.js also does not limit the request size, which means a large POST request can be sent to fill the whole memory.
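A minimal sketch of enforcing such a limit yourself; readLimitedBody and the 1 MB limit are assumptions for illustration. Body-parsing middleware usually offers a limit option as well, which is the simpler route in express.

```javascript
var http = require('http');

var MAX_BODY_BYTES = 1024 * 1024; // assumption: 1 MB, pick what fits your API

// Buffers a request body, but gives up with an error once it exceeds maxBytes,
// so a huge POST cannot fill the whole memory. The caller sends the 413.
function readLimitedBody(req, maxBytes, callback) {
  var chunks = [];
  var received = 0;
  var done = false;

  function finish(err, body) {
    if (done) { return; }
    done = true;
    callback(err, body);
  }

  req.on('data', function (chunk) {
    if (done) { return; }            // already rejected: drop further data
    received += chunk.length;
    if (received > maxBytes) {
      chunks = [];                   // release what was buffered so far
      var err = new Error('request body larger than ' + maxBytes + ' bytes');
      err.statusCode = 413;          // Payload Too Large
      return finish(err);
    }
    chunks.push(chunk);
  });
  req.on('end', function () { finish(null, Buffer.concat(chunks)); });
  req.on('error', finish);
}

// Usage sketch (call server.listen(port) to start it):
var server = http.createServer(function (req, res) {
  readLimitedBody(req, MAX_BODY_BYTES, function (err, body) {
    if (err) {
      res.writeHead(err.statusCode || 500, { 'Connection': 'close' });
      return res.end(err.message + '\n');
    }
    res.end('received ' + body.length + ' bytes\n');
  });
});
```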
npm is mostly run with root privileges.
Use object properties that cannot be changed (e.g. Object.freeze, or Object.defineProperty with writable: false)
String sanitizer - https://github.com/chriso/node-validator
Fusker - fight back - https://github.com/wearefractal/fusker
Brute force / rate limiting: https://github.com/revington/connect-bruteforce https://github.com/dharmafly/connect-ratelimit
- secure, http-only, signed cookies
- CSRF protection with connect.csrf
- helmet for other security headers
- RedisStore-backed sessions
- connect-sessionIO
- redisstore with hiredis (reuse connection)
CLI stuff
- commander
- shell.js
- ssh2