/nodejs-ops

my notes on operational things in nodejs

About this presentation

  • Started looking at nodejs in anger about 1.5 months ago.
  • Compiled a list of topics outside the usual (dev focused) tutorials
  • Not all of this is battletested in production
  • Beginners: there should be enough here for you
  • Advanced users and gurus: please chime in, I'd love your feedback!

Things you should avoid (period)

avoid typos, missing commas, etc.

avoid polluting the global scope (use var where you can):

  • use var a = 'bla' ;

avoid using eval, with, and switch without a default:

  • use 'use strict'; or: $ node --use_strict

http://bishankochher.blogspot.fr/2012/02/nodejs-with-is-evil.html
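
A minimal sketch of what strict mode buys you for the "forgotten var" case (the function name is made up for illustration). Without strict mode the assignment below would silently create a global; with it you get a ReferenceError. In strict mode `with` is rejected outright as a syntax error.

```javascript
// Function-level strict mode: a forgotten `var` becomes a ReferenceError
// instead of silently creating a global variable.
function assignWithoutVar() {
  'use strict';
  try {
    undeclaredCounter = 1; // no `var` in front of it
    return 'created a global';
  } catch (err) {
    return err.name;
  }
}

console.log(assignWithoutVar()); // ReferenceError
```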

don't run npm as root:

  • npm preinstall scripts in package.json can be evil

don't run as root:
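
One way to do this, sketched under assumptions: start as root only to do the privileged setup (for example binding a low port), then drop to an unprivileged user. The helper name, the `proc` parameter and the user/group names are hypothetical; `process.getuid`, `process.setgid` and `process.setuid` are the real calls (not available on Windows).

```javascript
// Drop root privileges once the privileged setup is done.
// `proc` defaults to the real process object; it is only a parameter so the
// sketch can be exercised without actually being root.
function dropPrivileges(user, group, proc) {
  proc = proc || process;
  if (typeof proc.getuid !== 'function' || proc.getuid() !== 0) {
    return false; // not running as root, nothing to drop
  }
  proc.setgid(group); // group first: after setuid we may not be allowed to
  proc.setuid(user);
  return true;
}

// Hypothetical usage, after the server is bound to its port:
// server.listen(80, function () { dropPrivileges('nobody', 'nogroup'); });
```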

Testing

use a grunt setup - example

Versioning & Packaging

  • use semver versioning in package.json

Dependencies in package.json

Packaging

  • npm pack
  • bashpack (turns a nodejs app into a bash script)

Error handling

http://snmaynard.com/2012/12/21/node-error-handling/

Try/catch, but ...

  • Use try/catch for synchronous errors; it will not catch errors raised later in async callbacks

http://www.javascriptkit.com/javatutors/trycatch2.shtml
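
A small sketch of the difference (function names are made up). The sync case is caught by try/catch; in the async case the try block has already finished when the timer fires, so the error has to travel through the callback.

```javascript
// Synchronous failure: try/catch sees it.
function parseJson(text) {
  try {
    return { error: null, value: JSON.parse(text) };
  } catch (err) {
    return { error: err, value: null };
  }
}

// Asynchronous failure: by the time the timer fires the try block is long
// gone. A `throw` inside the timer callback would bypass the catch below,
// so the error is handed to the callback instead.
function failLater(callback) {
  var caughtByTry = false;
  try {
    setTimeout(function () {
      callback(new Error('async boom'), caughtByTry);
    }, 0);
  } catch (err) {
    caughtByTry = true; // never reached: setTimeout itself does not throw
  }
}

console.log(parseJson('{bad').error.name); // SyntaxError
failLater(function (err, caughtByTry) {
  console.log(err.message, caughtByTry); // async boom false
});
```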

Return an Error object and not a string
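
A quick illustration of why (names are hypothetical): a string carries no stack trace and cannot be told apart from ordinary data, an Error object can.

```javascript
function failWithString(callback) {
  callback('disk full');
}

function failWithError(callback) {
  callback(new Error('disk full'));
}

failWithString(function (err) {
  console.log(typeof err, err.stack); // string undefined
});

failWithError(function (err) {
  console.log(err instanceof Error, typeof err.stack); // true string
});
```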

Better stacktraces

Use function Foo() { ... } for constructors instead of var Foo = function() { ... }: the named function shows up by name in stack traces. See http://book.mixu.net/ch6.html

// Constructor
function Foo(bar) { 
  // always initialize all instance properties
  this.bar = bar;
  this.baz = 'baz'; // default value
}
// class methods
Foo.prototype.fooBar = function() {

};
// export the class
module.exports = Foo;

Listen for ALL errors

  • emit('error') throws, and the process exits, if there is no listener

  • listen for errors on http, redis connections, express, socketIO, log handler, metrics handler

  • return when you call the callback on error (otherwise execution continues)
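
The first point can be seen with a bare EventEmitter (the helper names are made up). Without a listener the emitted Error is thrown right at the emit call; with a listener it is just delivered.

```javascript
var EventEmitter = require('events').EventEmitter;

// No 'error' listener: emit throws the error. Uncaught, this kills the process.
function emitWithoutListener() {
  var emitter = new EventEmitter();
  try {
    emitter.emit('error', new Error('nobody listening'));
    return 'no throw';
  } catch (err) {
    return 'thrown: ' + err.message;
  }
}

// With an 'error' listener the error is handed to the listener instead.
function emitWithListener() {
  var emitter = new EventEmitter();
  var seen = null;
  emitter.on('error', function (err) { seen = err.message; });
  emitter.emit('error', new Error('handled'));
  return seen;
}

console.log(emitWithoutListener()); // thrown: nobody listening
console.log(emitWithListener());    // handled
```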

Use domains to contain errors
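
A minimal sketch using the core domain module (the helper name is made up). Note that later Node releases mark the domain module as deprecated, so treat this as an illustration of the idea rather than a recommendation.

```javascript
var domain = require('domain');

// Run `work` inside a domain: errors thrown by async callbacks scheduled
// inside it are routed to `onError` instead of crashing the process.
function runContained(work, onError) {
  var d = domain.create();
  d.on('error', onError);
  d.run(work);
  return d;
}

runContained(function () {
  setTimeout(function () {
    throw new Error('contained');
  }, 0);
}, function (err) {
  console.log('domain caught:', err.message);
});
```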

process.on('uncaughtException')

Not sure if this still makes sense with domains

return on a callback

function handle(err, callback) {
  // return here: without it, execution continues past the callback
  if (err) { return callback(err); }
  console.log('only printed when there is no error');
  callback(null);
}

but: inside loops or nested functions (e.g. a forEach callback) return means something else, it only leaves the innermost function
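
A sketch of that trap (all names are made up): in the forEach version `return` only leaves the inner function, so the callback fires several times.

```javascript
// Broken: `return` only leaves the forEach callback, so the outer function
// keeps going and the callback fires more than once.
function findNegativeBroken(numbers, callback) {
  numbers.forEach(function (n) {
    if (n < 0) { return callback(null, n); }
  });
  callback(null, null);
}

// Fixed: a plain for loop, so `return` leaves the outer function.
function findNegative(numbers, callback) {
  for (var i = 0; i < numbers.length; i++) {
    if (numbers[i] < 0) { return callback(null, numbers[i]); }
  }
  callback(null, null);
}

// Helper: count how often a function invokes its callback.
function countCalls(fn, numbers) {
  var calls = 0;
  fn(numbers, function () { calls++; });
  return calls;
}

console.log(countCalls(findNegativeBroken, [1, -2, -3])); // 3
console.log(countCalls(findNegative, [1, -2, -3]));       // 1
```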

callback(err) vs emit('error')

Not sure yet here: both? Only the callback if one was passed, only emit if there is an error listener?

Logging

I like https://github.com/flatiron/winston/

  • Has the logger.log, logger.info, logger.debug etc.

  • Can log metadata, or json objects

  • Can use multiple tags : https://github.com/jedi4ever/socialapp/blob/master/lib/utils/logger.js

  • Create as a singleton, require and reuse in other modules (module cache will reuse same object)

    var logger = function (options) {
      // If we have already been initialized
      if (sharedLogger) { return sharedLogger; }

var logger = require('../utils/logger')().loggers.get('express');
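
The singleton idea on its own, as a self-contained sketch without winston (the `getLogger` name, the `prefix` option and the console-backed logger are assumptions for illustration, not the linked logger.js):

```javascript
// logger.js (hypothetical module): build the logger once and hand out the
// same instance on every later call.
var sharedLogger = null;

function getLogger(options) {
  if (sharedLogger) { return sharedLogger; } // already initialized
  var prefix = (options && options.prefix) || 'app';
  sharedLogger = {
    info: function (message) {
      console.log('[' + prefix + '] info: ' + message);
    }
  };
  return sharedLogger;
}

// In a real module you would end with: module.exports = getLogger;

var first = getLogger({ prefix: 'web' });
var second = getLogger({ prefix: 'ignored' }); // options only count the first time
console.log(first === second); // true
```

Because require caches modules, every file that requires such a module shares the same `sharedLogger` variable.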

Multiple Outputs

Use in express

// enable web server logging; pipe those log messages through winston
// http://stackoverflow.com/questions/9141358/how-do-i-output-connect-expresss-logger-output-to-winston
var winstonStream = {
  write: function(message, encoding){
    logger.info(message.slice(0,-1)); //remove newline
  }
};
expressApp.use(express.logger({stream: winstonStream}));

Use in socketio

var ioServer = socketIO.listen(webServer, { logger: logger , log:true});

Clustering

Keep it up

The basics

Have multiple nodejs processes listen on the same socket.

The trick? Pass the socket/file descriptor from a parent process and have server.listen reuse that descriptor. So you get multiple processes, each in its own memory space (but usually sharing the same ENV).

It does not balance, it leaves it to the kernel.

In recent nodejs (> 0.8) there is a cluster module (functional although marked experimental)
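
A minimal sketch of the core cluster module (port and worker count are hypothetical). It uses `cluster.isMaster`, the name from this era; newer Node versions also offer `cluster.isPrimary`.

```javascript
var cluster = require('cluster');
var http = require('http');

// Master: fork the workers and replace any that die.
// Worker: listen on the shared port.
function startCluster(port, workerCount) {
  if (cluster.isMaster) {
    for (var i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    cluster.on('exit', function (worker) {
      console.log('worker ' + worker.process.pid + ' died, forking a new one');
      cluster.fork();
    });
    return 'master';
  }
  http.createServer(function (req, res) {
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(port);
  return 'worker';
}

// Hypothetical usage: startCluster(8000, require('os').cpus().length);
```

This is only the bare mechanism: no graceful reload, no heartbeat, no timeout handling.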

Note: I have not yet found a 100% convincing reason to favor multi-process/cluster nodejs over the nginx/haproxy approach.

Current Cluster/Domain Enhancements:

The cluster module included in core is basic: it leaves it to you to take care of everything you need in zero-downtime environments. Most of the tools below allow you to:

  • wait for workers to close correctly, or SIGKILL them once a timeout has passed
  • send SIGUSR2 to reload workers one by one
  • use a cli (some tools provide one) to do the work over a socket/network connection
  • kill a worker that has become unresponsive by waiting for a heartbeat
  • take a worker offline so it no longer accepts new requests
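
The first item, seen from inside a worker, could look roughly like this sketch. The function name and the `done(code)` callback (standing in for process.exit) are assumptions; none of the tools below is claimed to work exactly this way.

```javascript
// Stop accepting new connections, wait for open ones to finish, and give up
// after `timeoutMs`. `done(code)` stands in for process.exit(code).
function closeGracefully(server, timeoutMs, done) {
  var finished = false;
  var timer = setTimeout(function () { finish(1); }, timeoutMs);

  function finish(code) {
    if (finished) { return; }
    finished = true;
    clearTimeout(timer);
    done(code);
  }

  // server.close stops listening; its callback fires once all open
  // connections have ended.
  server.close(function () { finish(0); });
}

// Hypothetical usage in a worker:
// process.on('SIGTERM', function () {
//   closeGracefully(server, 5000, function (code) { process.exit(code); });
// });
```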

Recluster: https://github.com/doxout/recluster/

Cluster2: https://github.com/ql-io/cluster2

  • Framework in use at ebay:

  • Good: complete with monitoring, control URL

  • Bad: Seems to be massive ...

Naught: https://github.com/superjoe30/naught

Note: naught2 was a temporary fork, but it got merged into master again

Talks about zero-downtime crashes by intelligently handling express errors with domains

cluster-master : https://github.com/isaacs/cluster-master

  • Built by nodejs god @isaacs
  • To be investigated

Up: https://github.com/LearnBoost/up

Others to check:

Older solutions:

(not updated in 2+ years and probably not compatible with nodejs > 0.8) You can safely ignore these, but they can give inspiration

Worth mentioning:

Profiling

to be investigated

Profiling tools

Connect

Dtrace

Memory

'Known' Limits/Tuning

Metrics

Use a statsd backend to send counters, timers, etc.
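
What all the client libraries below do under the hood is small. A sketch of the statsd line format over UDP, using only core modules (function names are made up; 8125 is the usual statsd port, the host is an assumption):

```javascript
var dgram = require('dgram');

// statsd type suffixes: counter, timer, gauge, set
var TYPES = { count: 'c', timing: 'ms', gauge: 'g', set: 's' };

// Build one statsd line, e.g. "web.requests:1|c|@0.1"
function formatMetric(name, value, type, sampleRate) {
  var line = name + ':' + value + '|' + TYPES[type];
  if (sampleRate !== undefined && sampleRate < 1) {
    line += '|@' + sampleRate;
  }
  return line;
}

// Fire-and-forget send over an existing UDP socket (reuse it between calls).
function sendMetric(socket, host, port, line, callback) {
  var payload = Buffer.from(line);
  socket.send(payload, 0, payload.length, port, host, callback);
}

console.log(formatMetric('web.requests', 1, 'count', 0.1));    // web.requests:1|c|@0.1
console.log(formatMetric('web.response_time', 320, 'timing')); // web.response_time:320|ms

// Hypothetical usage:
// var socket = dgram.createSocket('udp4');
// sendMetric(socket, '127.0.0.1', 8125, formatMetric('web.requests', 1, 'count'));
```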

https://github.com/sivy/node-statsd/

  • reuse UDP connection: yes

  • prefix: yes

  • suffix: yes

  • dnscache: yes

  • mock: yes

  • samplerate: yes

  • errors: yes (eventemitter & bubbleup)

  • timing: yes

  • count: yes

  • increment: yes

  • decrement: yes

  • gauge: yes

  • set: yes

  • batch: yes

  • callback: yes

https://github.com/dscape/lynx

  • reuse UDP connection: yes (+ provide your own)

  • prefix: yes (called scope)

  • error: provide an error function

  • network : USE of ephemeral sockets!

  • samplerate: yes (use special random)

  • batching: yes

  • increment: yes

  • decrement: yes

  • timing: yes

  • gauge: yes

  • set: yes

special:

  • uses as stream in/out: yes (uses parser)

https://github.com/msiebuhr/node-statsd-client

  • reuse UDP connections: yes (ephemeral socket)

  • prefix: yes

  • count: yes

  • gauges: yes

  • increment: yes

  • decrement: yes

  • sets: yes

  • timings: yes (delays)

special

  • socketTimeout: yes
  • children (multi prefix)
  • express helper: yes (or per URL)

https://github.com/Singly/statsd-singly

  • reuse UDP: no

  • callback: yes (but default prints to console)

  • samplerate: yes

  • prefix: no

  • suffix: yes

  • batch: yes

  • count: yes

  • gauges: yes

  • increment: yes

  • decrement: yes

  • sets: yes (called modify)

  • timings: (delays)

https://github.com/fasterize/node-statsd-profiler

(fork from node-statsd)

  • samplerate: yes

  • timing: yes

  • count: yes

  • increment: yes

  • decrement: yes

  • gauge: yes

  • set: ??

  • timingstart: yes

  • timingend: yes

special:

  • introduces: key aliases
  • transformKey function: YES

https://github.com/spreaker/nodejs-statsd-client

Note: you can safely ignore this lib

  • reuse UDP connection: no
  • timing: yes
  • count: yes
  • increment: yes
  • decrement: yes
  • gauge: yes
  • errors: total ignore
  • prefix: no
  • samplerate: no
  • callback: no

https://github.com/godmodelabs/statistik

Note: use node-statsd instead unless the cli is something special for you. The feature set is smaller than node-statsd's and there are no special features, so we'll ignore this one

  • udp connection reuse: no
  • timing: yes
  • counter: yes
  • increment: yes
  • decrement: yes
  • gauge: yes
  • rawSend: yes
  • samplerate: yes
  • batch: no

Mixing in express & connect:

Others:

Note: I've not included any specific backends here, we're focusing on generic statsd usage

What is a set in statsd?

Sets act like simple counters, with the additional specificity that they ignore duplicate values.

Technically, all values are stored in a set, and the number of elements in the set is sent to graphite during flushes. Sets are also emptied during flushes (in the same way that counters are reset to 0).

We have been using it in production for a while now, and it is working as expected. The use case that was used during the development of that feature was the following (it has been since extended to other cases as well):

We want to graph the number of active logged in users on the website.
Maintaining that state across application servers to manually update gauges is non-trivial.
We send a message to statsd containing the id of the user making a request.
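
To make the behaviour concrete, here is a tiny in-memory imitation of that logic (this is not statsd code; the `SetAggregator` name and the metric name are made up):

```javascript
// Keeps one set of unique values per metric name until the next flush.
function SetAggregator() {
  this.sets = {};
}

SetAggregator.prototype.add = function (name, value) {
  if (!this.sets[name]) { this.sets[name] = Object.create(null); }
  this.sets[name][String(value)] = true; // duplicates collapse onto one key
};

// Report the number of unique values per set, then empty all sets.
SetAggregator.prototype.flush = function () {
  var counts = {};
  Object.keys(this.sets).forEach(function (name) {
    counts[name] = Object.keys(this.sets[name]).length;
  }, this);
  this.sets = {};
  return counts;
};

var aggregator = new SetAggregator();
aggregator.add('users.active', 42); // user 42 makes a request
aggregator.add('users.active', 42); // and another one: ignored
aggregator.add('users.active', 7);
console.log(aggregator.flush()['users.active']); // 2
```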

Continuous Integration

  • Jenkins, Circle CI, TravisCI
  • Chat bot in Campfire

Continuous Delivery

Configuration Mgmt

  • redis
  • chef nodejs

Fleet: Extending the easy Git -> deploy

Not related, but also cool: EC2-fleet https://github.com/ashtuchkin/ec2-fleet

Authz/Authn

Loadbalancer/SSL Termination

  • mitigating the BEAST TLS attack: http://www.ericmartindale.com/2012/07/19/mitigating-the-beast-tls-attack-in-nodejs/

  • honorCipherOrder, strict ciphers (SSL attacks)

  • perfect forward secrecy

  • offload your SSL, e.g. via HAProxy 1.5dev (also websockets): brew install haproxy --devel

  • the ca option is an array (add your provider's certs)

  • SSL correct settings

  • process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0'; (disables certificate checks: SSL insecure!)

  • behind a proxy: express proxy setting (X-...

  • behind a proxy: socket.io (authentication Secure..)

  • remove the X-Powered-By header (it advertises express)

  • Oauth

    // req.ip, req.ips, req.protocol (http, https)
    if (settings.terminated) { expressApp.enable('trust proxy'); }
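
If node terminates SSL itself, the TLS notes above map onto the options object passed to https.createServer roughly like this. A sketch only: the helper name, the shape of `pem` and the cipher selection are assumptions, while `key`, `cert`, `ca`, `ciphers` and `honorCipherOrder` are the real option names.

```javascript
// Build the options for https.createServer(options, app).
// `pem` is a hypothetical object holding the already-loaded PEM strings.
function buildTlsOptions(pem) {
  return {
    key: pem.key,
    cert: pem.cert,
    // ca is an array: one entry per intermediate cert from your provider
    ca: pem.intermediates,
    // prefer the server's cipher order over the client's
    honorCipherOrder: true,
    // ECDHE key exchange gives forward secrecy (example selection only)
    ciphers: 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384'
  };
}

// Hypothetical usage:
// var options = buildTlsOptions({ key: keyPem, cert: certPem, intermediates: [caPem] });
// require('https').createServer(options, expressApp).listen(443);
```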

Security stuff

Various

  • input/output checker

  • http://www.slideshare.net/BishanSingh/node-security-the-good-bad-ugly

  • console.log (with beep char)

  • JSON.parse

  • NPM is mostly run with root privileges

  • use object properties that cannot be changed

  • string sanitizer: https://github.com/chriso/node-validator

  • Fusker (fight back): https://github.com/wearefractal/fusker

  • https://github.com/revington/connect-bruteforce

  • https://github.com/dharmafly/connect-ratelimit

  • http://stackoverflow.com/questions/14991963/socket-io-server-throttling-a-fast-client
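
For the "properties that cannot be changed" note, one option is Object.freeze. A small sketch (the settings object and function names are made up): in strict mode a write to a frozen object throws, in sloppy mode it is silently ignored.

```javascript
// Hand out settings that other modules cannot modify afterwards.
function makeSettings(values) {
  return Object.freeze({ port: values.port, trustProxy: values.trustProxy });
}

// Strict mode turns the ignored write into a TypeError.
function tryToTamper(settings) {
  'use strict';
  try {
    settings.port = 1;
    return 'changed';
  } catch (err) {
    return err.name;
  }
}

var settings = makeSettings({ port: 8000, trustProxy: true });
console.log(tryToTamper(settings)); // TypeError
console.log(settings.port);         // 8000
```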

Cookie/Sessions

  • secure, http-only, signed cookies
  • csrf attack connect.csrf
  • helmet other security headers
  • RedisStore backed
  • connect-sessionIO
  • redisstore with hiredis (reuse connection)
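
What "secure, http-only, signed" boils down to, sketched with core modules only. This is not how connect implements it; the function names, the `value.mac` format and the secret are assumptions for illustration.

```javascript
var crypto = require('crypto');

// Append an HMAC so tampering with the value can be detected.
function sign(value, secret) {
  var mac = crypto.createHmac('sha256', secret).update(value).digest('base64');
  return value + '.' + mac;
}

// Return the original value if the signature matches, otherwise null.
function unsign(signed, secret) {
  var index = signed.lastIndexOf('.');
  if (index === -1) { return null; }
  var value = signed.slice(0, index);
  var expected = Buffer.from(sign(value, secret));
  var actual = Buffer.from(signed);
  if (expected.length !== actual.length) { return null; }
  return crypto.timingSafeEqual(expected, actual) ? value : null;
}

// HttpOnly hides the cookie from client-side scripts, Secure restricts it to https.
function buildSetCookie(name, signedValue) {
  return name + '=' + encodeURIComponent(signedValue) + '; Path=/; HttpOnly; Secure';
}

var signedCookie = sign('user42', 'keyboard cat');
console.log(unsign(signedCookie, 'keyboard cat'));                            // user42
console.log(unsign(signedCookie.replace('user42', 'user43'), 'keyboard cat')); // null
```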

CLI stuff

  • commander
  • shell.js
  • ssh2