Let it die

long running processes in nodejs

Created by Christian Ranz

common problems

  • memory consumption rises
  • db connections die
  • application walks into a dead end
  • servers shut down
  • ...

The solution

There is no single tool that solves this. We need to …

  • … design our application the right way.
  • … keep our data structures as simple as possible.
  • … set up good and robust monitoring.

Application design

Example: a simple daemon

structure of simple queue daemon

Recovery

There should be no data loss after an uncontrolled shutdown.

Recovery

structure of simple queue daemon with recovery
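
One way to get there is a journal on disk: every job is recorded before it is worked on and marked once it is finished, so a restart can replay the file and re-queue whatever never completed. The following is a minimal sketch under assumptions, not the daemon from the slide; the journal path, the record format, and the names `enqueue`, `markDone`, and `recover` are all hypothetical.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical journal location; a real daemon would use a durable path.
var journalPath = path.join(os.tmpdir(), 'queue-journal.log');

// Append one JSON record per line. The call is synchronous, so the record
// is handed to the operating system before the job is processed.
function record(type, id, payload) {
    var entry = { type: type, id: id, payload: payload };
    fs.appendFileSync(journalPath, JSON.stringify(entry) + '\n');
}

function enqueue(id, payload) { record('add', id, payload); }
function markDone(id) { record('done', id); }

// Replay the journal and return the jobs that were added but never finished.
function recover() {
    if (!fs.existsSync(journalPath)) { return []; }
    var pending = {};
    fs.readFileSync(journalPath, 'utf8').split('\n').forEach(function (line) {
        var entry;
        // Skip blank lines and a half-written last line left by a crash.
        try { entry = JSON.parse(line); } catch (err) { return; }
        if (entry.type === 'add') { pending[entry.id] = entry.payload; }
        if (entry.type === 'done') { delete pending[entry.id]; }
    });
    return Object.keys(pending).map(function (id) {
        return { id: id, payload: pending[id] };
    });
}
```

On startup the daemon would call `recover()` first and push the returned jobs back into its in-memory queue before accepting new ones.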

What about controlled kills & shutdowns?

  • SIGINT
  • SIGTERM
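
Both signals can be caught inside node, which gives the daemon a chance to stop cleanly instead of being cut off mid-job. A minimal sketch, assuming the daemon only needs to stop taking work and exit; `state`, `shutdown`, and the injectable `exit` parameter are illustrative names.

```javascript
var state = { shuttingDown: false };

// Stop taking new work, then exit. `exit` can be passed in so the handler
// can be exercised without ending the process.
function shutdown(signal, exit) {
    if (state.shuttingDown) { return; }   // ignore repeated signals
    state.shuttingDown = true;
    process.stdout.write('info: received ' + signal + ', shutting down\n');
    // A real daemon would finish or re-queue the current job here.
    if (exit) {
        exit(0);
    } else {
        process.exit(0);
    }
}

['SIGINT', 'SIGTERM'].forEach(function (signal) {
    process.on(signal, function () { shutdown(signal); });
});
```

SIGKILL cannot be caught this way, so the recovery path still has to cover the uncontrolled case.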

Logging

  • logging has to be robust
  • you don't want to lose logs in any error case

When you decide how to log, make sure there is a safe path for cases like uncaught exceptions.
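
A safe path here can be as simple as a synchronous write in an `uncaughtException` handler, followed by an exit. The sketch below assumes a plain log file; the file location and the `logFatal` helper are hypothetical.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical log location for this sketch.
var crashLog = path.join(os.tmpdir(), 'node-daemon-crash.log');

// Write the error synchronously, so the line is on its way to disk
// before the process exits.
function logFatal(err) {
    var detail = (err && err.stack) || String(err);
    var line = '[' + new Date().toISOString() + '] emerg: ' + detail + '\n';
    fs.appendFileSync(crashLog, line);
    return line;
}

process.on('uncaughtException', function (err) {
    logFatal(err);
    // The application state is unknown at this point, so let it die.
    process.exit(1);
});
```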

console.log

  • all information goes to stdout
  • no control over levels

Custom logger


// Export the logger object as this module's public interface.
var logger = module.exports;
logger.level = 'debug';
// Lower number = more severe; messages above the current level are dropped.
logger.levelMap = {'emerg': 0, 'error': 1, 'warning': 2,
    'info': 3, 'debug': 4};
logger.log = function(level, message) {
    if(logger.levelMap[level] > logger.levelMap[logger.level]) {
        return;
    }
    if(level === 'emerg') {
        // Emergencies go to stderr and end the process.
        process.stderr.write(level + ': ' + message + '\n');
        process.exit(1);
    } else {
        process.stdout.write(level + ': ' + message + '\n');
    }
};
                    

e.g. Winston

Winston is a multi-transport, asynchronous logging library for node.js


var winston = require('winston');
// Registers winston.transports.MongoDB (npm package: winston-mongodb).
require('winston-mongodb').MongoDB;
var logger = new (winston.Logger)({
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'path/to/logs.log' }),
    new winston.transports.MongoDB({ db: 'logging', level: 'info'}) // should be capped
  ],
  exceptionHandlers: [
    new winston.transports.File({ filename: 'path/to/exceptions.log' })
  ]
});
                    

Data structures

  • Keep data structures as simple as possible.
  • Think about transactions or similar.
  • Do rollbacks in the error case to maintain a proper state.
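
The points above can be sketched for in-memory state: apply a group of changes to a copy and keep the copy only if every step succeeds. This is an illustration under assumptions, not a replacement for database transactions; `transact` is a hypothetical helper, and it relies on the state being plain JSON-compatible data, which is one payoff of keeping structures simple.

```javascript
// Apply all steps to a copy of the state. The copy is returned only if
// every step succeeds; on error the original object comes back untouched.
function transact(state, steps) {
    var draft = JSON.parse(JSON.stringify(state));   // deep copy of plain data
    try {
        steps.forEach(function (step) { step(draft); });
        return { ok: true, state: draft };
    } catch (err) {
        return { ok: false, state: state, error: err };
    }
}

// Usage: the second step fails, so the first change is rolled back too.
var accounts = { a: 10, b: 0 };
var result = transact(accounts, [
    function (s) { s.a -= 5; },
    function (s) { throw new Error('db connection died'); }
]);
process.stdout.write('ok=' + result.ok + ' a=' + result.state.a + '\n');
```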

Monitoring and daemonization

Upstart

  • Runs nodejs as a daemon.
  • Provides a simple set of commands like start/stop.

Upstart conf


#!upstart
description "node.js daemon"
author      "chris"

start on startup
stop on shutdown

script
    export HOME="/root"

    echo $$ > /var/run/node-daemon.pid
    exec sudo -u vagrant node /vagrant/node/qeuedaemon-monit.js >> /var/log/node-daemon.log 2>&1
end script

pre-start script
    # Date format same as (new Date()).toISOString() for consistency
    echo "[`date -u +%Y-%m-%dT%T.%3NZ`] (sys) Starting" >> /var/log/node-daemon.log
end script

pre-stop script
    rm /var/run/node-daemon.pid
    echo "[`date -u +%Y-%m-%dT%T.%3NZ`] (sys) Stopping" >> /var/log/node-daemon.log
end script
                    

Monit

  • A set of tests evaluated in configurable intervals.
  • Monitor (almost) everything.

What do we monitor

  • The node process itself
  • CPU and memory usage
  • Process uptime
  • Process health

Node process


check process node
    with pidfile "/var/run/node-daemon.pid"
    start program = "/sbin/start node-daemon"
    stop program  = "/sbin/stop node-daemon"
    if changed pid
        then restart
                    

CPU and memory


check process node
    with pidfile "/var/run/node-daemon.pid"
    start program = "/sbin/start node-daemon"
    stop program  = "/sbin/stop node-daemon"
    if changed pid
        then restart
    if totalmemory > 2% for 5 cycles
        then alert
    if totalmemory > 10% for 5 cycles
        then restart
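
Monit only sees the total from the outside. To find out where the memory goes, the daemon can log its own figures at intervals using `process.memoryUsage()`. A small sketch; the `memorySnapshot` helper and the log format are assumptions.

```javascript
// Return the process's memory figures in megabytes, rounded to one decimal.
function memorySnapshot() {
    var usage = process.memoryUsage();
    function mb(bytes) { return Math.round(bytes / 1024 / 1024 * 10) / 10; }
    return {
        rss: mb(usage.rss),
        heapTotal: mb(usage.heapTotal),
        heapUsed: mb(usage.heapUsed)
    };
}

var snap = memorySnapshot();
process.stdout.write('info: rss=' + snap.rss + 'MB heapUsed=' + snap.heapUsed + 'MB\n');
```

A steadily growing `heapUsed` across many snapshots points at a leak in the application's own objects.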
                    

Process uptime


check file node_forcekill path "/var/run/node-daemon.pid" every 10 cycles
    start program = "/sbin/start node-daemon"
    stop program  = "/sbin/stop node-daemon"
    if timestamp > 1 days
        then restart
                    

Process health


check file node_alive path "/var/log/node-daemon-alive.log"
    start program = "/sbin/start node-daemon"
    stop program  = "/sbin/stop node-daemon"
    if timestamp > 30 seconds
        then exec "/bin/bash -c 'kill -s SIGKILL `cat /var/run/node-daemon.pid`'"
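
This check only works if the daemon keeps touching the file. A sketch of the node side, assuming a timer-based heartbeat: the 10-second interval and the function names are assumptions, and the sketch writes to a temp file rather than /var/log/node-daemon-alive.log so it can run without root.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Stand-in for /var/log/node-daemon-alive.log from the Monit check.
var aliveFile = path.join(os.tmpdir(), 'node-daemon-alive.log');

// Rewrite the file so its modification time moves forward.
function heartbeat() {
    fs.writeFileSync(aliveFile, new Date().toISOString() + '\n');
}

// Beat once now, then on every interval. If the event loop blocks or the
// process hangs, the beats stop and the file's timestamp goes stale.
function startHeartbeat(intervalMs) {
    heartbeat();
    var timer = setInterval(heartbeat, intervalMs || 10000);
    timer.unref();   // the heartbeat alone should not keep the process alive
    return timer;
}
```

Calling `heartbeat()` from the daemon's main work loop instead of a timer ties the signal to actual progress, which catches a daemon that is alive but stuck.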
                    
  • write good quality code
  • write and run tests
  • get to know your application

Initial state is the best state.

so let it die...

Resources

Thanks for listening

Questions?