Healthcheck plugin for nginx. It polls backends, and if they respond with
HTTP 200 + an optional expected response body, they are marked good.
Otherwise, they are marked bad. Similar to haproxy/varnish health checks.

For help on all the options, see the docblocks inside the .c file where each
option is defined.

Note this also gives you access to a health status page that lets you see
how well your healthchecks are doing.

==Important==

Nginx gives you full freedom over which server peer to pick when you write
an upstream. This means that the healthchecking plugin is only a tool that
other upstreams must know about to use. So your upstream code MUST SUPPORT
HEALTHCHECKS. It's actually pretty easy to modify the code to support them.
See the .h file for how, as well as the upstream_hash patch, which shows how
to modify upstream_hash to support healthchecks.

For an example plugin modified to support healthchecks, see my modifications
to the upstream_hash plugin here:
http://github.com/cep21/nginx_upstream_hash/tree/support_http_healthchecks

==Limitations==

The module only supports HTTP 1.0, not 1.1. What that really means is it
doesn't understand chunked encoding. You should ask for a 1.0 response with
your healthcheck, unless you're sure the upstream won't send back chunked
encoding. See the sample config for an example.

==INSTALL==

  # Similar to the upstream_hash module
  cd nginx-0.7.62 # or whatever
  patch -p1 < /path/to/this/directory/nginx.patch
  ./configure --add-module=/path/to/this/directory
  make
  make install

==How the module works==

My first attempt was to spawn a pthread inside the master process, but nginx
freaks out on all kinds of levels when you try to have multiple threads
running at the same time. Then I thought, fine, I'll just fork my own child.
But that caused lots of issues when I tried to HUP the master process,
because my own child wasn't getting signals. I was thinking to myself, these
just don't feel like the nginx way of doing things.

So, I figured I would just work directly with the worker process model.
When each worker process starts, it adds a repeating event to the event tree
asking for ownership of a server's healthcheck. When that ownership event
comes up, the worker locks the server's healthcheck and tries to claim it
with its pid. If the worker can't claim it, it retries the claim later,
because maybe the worker that does own it dies or something.

The worker that does own the healthcheck inserts a healthcheck event into
nginx's event tree. When that triggers, it starts a peer connection to the
server and goes to town sending and getting data. When the healthcheck
finishes, or times out, it updates the shared memory structure and signals
for another healthcheck later.

A few random issues I had were:
  1) When nginx tries to shut down, it waits for the event tree to empty
     out. To get around this, I check for ngx_quit and all kinds of other
     variables. This means that when you do HUP nginx, your worker needs to
     sit around doing nothing until *something* in the healthcheck event
     tree comes up, after which it can clear all the healthcheck events and
     move on. I could fix this if nginx added a per module callback on HUP.
     Maybe a 'cleanup' or something. The current exit_process callback is
     called after the event tree is empty, not after a request to shut down
     a worker.

==Extending==

It should be very easy to extend this module to work with fastcgi or even
generic TCP backends. You would need to just change, or abstract out,
ngx_http_healthcheck_process_recv. Patches that do that are welcome, and I'm
happy to help out with any questions. I'm also happy to help out with
extending your upstream picking modules to work with healthchecks as well.
Your code can even stay compatible with builds that don't include the
healthcheck module by surrounding the changes with
#if (NGX_HTTP_HEALTHCHECK)

==Config==

See sample_ngx_config.conf

Author: Jack Lindamood <jack@facebook.com>

==License==

Apache License, Version 2.0
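
==Appendix: ownership claim sketch==

To make the ownership step from "How the module works" a bit more concrete,
here is a minimal sketch of how a worker might claim a server's healthcheck
with its pid. This is not the module's actual code. The names hc_slot_t,
hc_try_claim and stale_after are hypothetical, and the rule that a claim
expires after stale_after seconds without a refresh is an assumption; the
text above only says that a worker retries later in case the owner has died.
The caller is assumed to already hold the lock on the slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-server record. The real module keeps its state in
 * nginx shared memory; this struct only illustrates the idea. */
typedef struct {
    pid_t owner;      /* pid of the owning worker, 0 if unclaimed */
    long  last_seen;  /* time (seconds) the owner last refreshed its claim */
} hc_slot_t;

/* Try to claim `slot` for worker `me` at time `now`.
 * Assumption: a claim not refreshed for more than `stale_after` seconds
 * is treated as abandoned (for example, the owning worker died).
 * Returns true if `me` owns the slot after the call. */
bool hc_try_claim(hc_slot_t *slot, pid_t me, long now, long stale_after)
{
    assert(slot != NULL);

    if (slot->owner == me) {
        /* Already ours: refresh the claim. */
        slot->last_seen = now;
        return true;
    }

    if (slot->owner == 0 || now - slot->last_seen > stale_after) {
        /* Unclaimed, or the previous owner went quiet: take over. */
        slot->owner = me;
        slot->last_seen = now;
        return true;
    }

    /* Someone else holds a fresh claim; the caller should retry later. */
    return false;
}
```

In this sketch, a worker that gets false back would just re-arm its
ownership event and try again later, while a worker that gets true would go
on to schedule the actual healthcheck event.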