Yelp/zygote

messages from dead workers


From master:

    elif msg_type is message.MessageHTTPBegin:
        # a worker started servicing an HTTP request
        worker = self.zygote_collection.get_worker(msg.pid)
        worker.start_request(msg.remote_ip, msg.http_line)
    elif msg_type is message.MessageHTTPEnd:
        # a worker finished servicing an HTTP request
        worker = self.zygote_collection.get_worker(msg.pid)
        worker.end_request()
        if self.max_requests is not None and worker.request_count >= self.max_requests:
            log.info('child %d reached max_requests %d, killing it', worker.pid, self.max_requests)
            os.kill(worker.pid, signal.SIGQUIT)

If a worker dies after sending one of these messages but before the master handles it, get_worker(msg.pid) returns None, and we end up calling methods on a NoneType object.
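A minimal sketch of one way to guard these handlers, assuming (as the tracebacks below suggest) that get_worker() returns None for a pid the collection no longer tracks. Worker, WorkerCollection, handle_http_begin and handle_http_end are hypothetical stand-ins, not zygote's actual classes and not necessarily the change that closed this issue.

```python
import logging

log = logging.getLogger("zygote_sketch")


class Worker(object):
    """Hypothetical stand-in for zygote's per-worker accounting object."""

    def __init__(self, pid):
        self.pid = pid
        self.request_count = 0
        self.current_request = None

    def start_request(self, remote_ip, http_line):
        self.current_request = (remote_ip, http_line)

    def end_request(self):
        self.current_request = None
        self.request_count += 1


class WorkerCollection(object):
    """Hypothetical collection; get_worker() returns None for unknown pids."""

    def __init__(self):
        self._workers = {}

    def add(self, worker):
        self._workers[worker.pid] = worker

    def remove(self, pid):
        self._workers.pop(pid, None)

    def get_worker(self, pid):
        return self._workers.get(pid)


def handle_http_begin(collection, pid, remote_ip, http_line):
    """Apply an HTTP-begin message; return False if it was dropped."""
    worker = collection.get_worker(pid)
    if worker is None:
        # The worker exited and was removed after sending this message,
        # so there is no accounting object left to update.
        log.debug("dropping HTTPBegin from unknown worker %d", pid)
        return False
    worker.start_request(remote_ip, http_line)
    return True


def handle_http_end(collection, pid):
    """Apply an HTTP-end message; return False if it was dropped."""
    worker = collection.get_worker(pid)
    if worker is None:
        log.debug("dropping HTTPEnd from unknown worker %d", pid)
        return False
    worker.end_request()
    return True
```

With this shape, a message that arrives after its worker has been removed is logged and dropped instead of raising AttributeError.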

Tracebacks I got while forcing this situation (the last two involve a zygote that was already gone):

    Traceback (most recent call last):
      File "zygote/util.py", line 278, in wrapped_handler
        handler(*args, **kwargs)
      File "zygote/master.py", line 249, in recv_protocol_msg
        worker.start_request(msg.remote_ip, msg.http_line)
    AttributeError: 'NoneType' object has no attribute 'start_request'

    Traceback (most recent call last):
      File "zygote/util.py", line 278, in wrapped_handler
        handler(*args, **kwargs)
      File "zygote/master.py", line 253, in recv_protocol_msg
        worker.end_request()
    AttributeError: 'NoneType' object has no attribute 'end_request'

    Traceback (most recent call last):
      File "/home/bmetin/repos/tornado_env/lib/python2.6/site-packages/tornado/ioloop.py", line 421, in _run_callback
        callback()
      File "zygote/master.py", line 295, in transition_idle_workers
        self.kill_zygote(z)
      File "zygote/master.py", line 303, in kill_zygote
        os.kill(zygote.pid, signal.SIGQUIT)
    OSError: [Errno 3] No such process
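In this traceback os.kill() fails with ESRCH ("No such process") because the zygote had already exited. A minimal sketch of a kill helper that treats ESRCH as "already done"; kill_if_alive is a hypothetical name, and swallowing the error this way is a design choice, not necessarily what the fix did.

```python
import errno
import os
import signal


def kill_if_alive(pid, sig=signal.SIGQUIT):
    """Send sig to pid; return False if the process no longer exists."""
    try:
        os.kill(pid, sig)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # "No such process": it already exited, which is the outcome
            # the caller wanted anyway.
            return False
        # Anything else (e.g. EPERM) is a real error; let it propagate.
        raise
    return True
```

Only ESRCH is swallowed, so permission problems or other failures still surface instead of being silently ignored.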

    Traceback (most recent call last):
      File "zygote/util.py", line 278, in wrapped_handler
        handler(*args, **kwargs)
      File "zygote/master.py", line 217, in recv_protocol_msg
        self.zygote_collection[msg.worker_ppid].add_worker(msg.pid, msg.time_created)
      File "zygote/accounting.py", line 215, in __getitem__
        return self.zygote_map[pid]
    KeyError: 29531
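Here the message announcing a new worker arrives after its parent zygote (pid 29531, the msg.worker_ppid key) has been dropped from the collection, so the strict subscript lookup raises KeyError. A minimal sketch using a tolerant lookup instead; Zygote, ZygoteCollection.get and handle_worker_spawn are hypothetical stand-ins for illustration only.

```python
import logging

log = logging.getLogger("zygote_sketch")


class Zygote(object):
    """Hypothetical stand-in: tracks the workers forked from one zygote."""

    def __init__(self, pid):
        self.pid = pid
        self.workers = {}

    def add_worker(self, pid, time_created):
        self.workers[pid] = time_created


class ZygoteCollection(object):
    """Hypothetical collection keyed by zygote pid."""

    def __init__(self):
        self.zygote_map = {}

    def __getitem__(self, pid):
        # Strict lookup: raises KeyError, as in the traceback above.
        return self.zygote_map[pid]

    def get(self, pid, default=None):
        # Tolerant lookup for messages that may outlive their sender.
        return self.zygote_map.get(pid, default)


def handle_worker_spawn(zygotes, worker_pid, worker_ppid, time_created):
    """Register a new worker under its parent zygote.

    Returns False when the parent zygote is no longer tracked, i.e. it
    went away after forking the worker but before this message arrived.
    """
    zygote = zygotes.get(worker_ppid)
    if zygote is None:
        log.warning("worker %d spawned by unknown zygote %d",
                    worker_pid, worker_ppid)
        return False
    zygote.add_worker(worker_pid, time_created)
    return True
```

Returning False leaves the orphaned worker untracked; a complete fix would probably also need to signal that worker so it does not keep running unmanaged.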

Fixed with #44.