Gearman server crashes if a WORK_EXCEPTION is delivered after a WORK_COMPLETE
passcod opened this issue · 8 comments
Gearmand 1.1.18
I can reproduce and provide logs but am away at the moment, filing from memory.
If you grab a job, then issue a WORK_COMPLETE
, and then issue a WORK_EXCEPTION
and WORK_FAIL
, the entire server crashes.
Really, this is then a worker bug, as that sequence of events shouldn't happen, but it seems unfortunate that one misbehaving worker can crash the server.
I agree that that shouldn't crash the server. A worker which does that would be helpful in fixing the problem.
Any particular language/binding is easiest for you? I discovered this using abraxas (node js) but I can do whatever.
Well, it would probably be best if it could ultimately end up in the test suite, so C or C++? Maybe there's a test in the test suite that you could adapt? If so, that would be amazing!
Okay, I'll have a go
Okay, correction, it's a WORK_EXCEPTION
after a WORK_COMPLETE
. Repro incoming
Okay, I tried to write C but gave up. Instead, I wrote bash. Hopefully it's relatively portable bash (so long as you have hexdump
and bash >= 5):
gearcrash.sh
#!/usr/bin/env bash
set -euo pipefail
exec 3<>/dev/tcp/127.0.0.1/4730
# Set client ID -- essentially to test if the binary connection is working
echo -ne '\0REQ\0\0\0\x16\0\0\0\x05Hello' >&3
# Worker: CANDO crashtest
echo -ne '\0REQ\0\0\0\x01\0\0\0\x09crashtest' >&3
# Worker: PRE_SLEEP
echo -ne '\0REQ\0\0\0\x04\0\0\0\0' >&3
sleep 1
echo Submitting job
# Client: SUBMIT_JOB crashtest unique workload
echo -ne '\0REQ\0\0\0\x07\0\0\0\x19crashtest\0unique\0workload' >&3
handle=$(timeout 0.2 cat <&3 || true)
if [[ -z "$handle" ]]; then
echo "gearman didn't give us a handle"
exit 1
fi
# Clean off the binary gunk
handle=$(grep -aoP '(H:[\w-]+:\d+)' <<< "$handle")
sleep 1
echo Grabbing job
# Worker: GRAB_JOB
echo -ne '\0REQ\0\0\0\x09\0\0\0\0' >&3
sleep 1
# We're going to ignore anything the server sends
# and just assume we got the job we created above
set +e
data=$(timeout 1 cat <&3 || true)
set -e
echo "Data from server:"
hexdump -C <<< "$data"
sleep 3
echo "Crash sequence now"
# handle + 2 bytes (nul and single-byte data)
donelen=$(echo -ne "$handle\0." | hexdump | tail -n1 | cut -c6-)
# Worker: WORK_COMPLETE
echo -ne "\0REQ\0\0\0\x0D\0\0\0\x${donelen}${handle}\0." >&3
# Worker: WORK_EXCEPTION
echo -ne "\0REQ\0\0\0\x19\0\0\0\x${donelen}${handle}\0." >&3
set +e
data=$(timeout 1 cat <&3 || true)
set -e
echo "Data from server:"
hexdump -C <<< "$data"
echo "Gearman should have crashed"
Output from running it
Submitting job
./gearcrash.sh: line 22: warning: command substitution: ignored null byte in input
Grabbing job
./gearcrash.sh: line 42: warning: command substitution: ignored null byte in input
Data from server:
00000000 52 45 53 0b 20 48 3a 6b 61 79 64 65 6c 2d 6b 6f |RES. H:kaydel-ko|
00000010 3a 31 63 72 61 73 68 74 65 73 74 77 6f 72 6b 6c |:1crashtestworkl|
00000020 6f 61 64 0a |oad.|
00000024
Crash sequence now
./gearcrash.sh: line 62: warning: command substitution: ignored null byte in input
Data from server:
00000000 52 45 53 0d 0f 48 3a 6b 61 79 64 65 6c 2d 6b 6f |RES..H:kaydel-ko|
00000010 3a 31 2e 0a |:1..|
00000014
Gearman should have crashed
Output from gearmand
~$> gearmand --verbose DEBUG
--syslog
--log-file
stderr
DEBUG 2020-12-09 21:13:59.392010 [ main ] THREADS: 4 -> libgearman-server/gearmand.cc:263
INFO 2020-12-09 21:13:59.392099 [ main ] Initializing Gear on port 4730 with SSL: false
INFO 2020-12-09 21:13:59.392180 [ main ] Starting up with pid 2136195, verbose is set to DEBUG
DEBUG 2020-12-09 21:13:59.392237 [ main ] Method for libevent: epoll -> libgearman-server/gearmand.cc:364
DEBUG 2020-12-09 21:13:59.392230 [ main ] staring up Epoch thread -> libgearman-server/timer.cc:61
DEBUG 2020-12-09 21:13:59.392394 [ main ] Trying to listen on 0.0.0.0:4730 -> libgearman-server/gearmand.cc:646
INFO 2020-12-09 21:13:59.392430 [ main ] Listening on 0.0.0.0:4730 (9)
DEBUG 2020-12-09 21:13:59.392443 [ main ] Trying to listen on :::4730 -> libgearman-server/gearmand.cc:646
INFO 2020-12-09 21:13:59.392471 [ main ] Listening on :::4730 (10)
DEBUG 2020-12-09 21:13:59.392490 [ main ] Creating wakeup pipe -> libgearman-server/gearmand.cc:915
DEBUG 2020-12-09 21:13:59.392516 [ main ] Creating 4 threads -> libgearman-server/gearmand.cc:378
DEBUG 2020-12-09 21:13:59.392534 [ main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
DEBUG 2020-12-09 21:13:59.392572 [ main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
DEBUG 2020-12-09 21:13:59.392646 [ main ] Thread 1 created -> libgearman-server/gearmand_thread.cc:273
DEBUG 2020-12-09 21:13:59.392694 [ 1 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
DEBUG 2020-12-09 21:13:59.392730 [ main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
DEBUG 2020-12-09 21:13:59.392779 [ main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
DEBUG 2020-12-09 21:13:59.392846 [ main ] Thread 2 created -> libgearman-server/gearmand_thread.cc:273
DEBUG 2020-12-09 21:13:59.392869 [ main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
DEBUG 2020-12-09 21:13:59.392895 [ 2 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
DEBUG 2020-12-09 21:13:59.392947 [ main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
DEBUG 2020-12-09 21:13:59.393024 [ main ] Thread 3 created -> libgearman-server/gearmand_thread.cc:273
DEBUG 2020-12-09 21:13:59.393050 [ main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
DEBUG 2020-12-09 21:13:59.393068 [ 3 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
DEBUG 2020-12-09 21:13:59.393097 [ main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
DEBUG 2020-12-09 21:13:59.393162 [ main ] Thread 4 created -> libgearman-server/gearmand_thread.cc:273
DEBUG 2020-12-09 21:13:59.393177 [ main ] replaying queue: begin -> libgearman-server/gearmand.cc:391
DEBUG 2020-12-09 21:13:59.393188 [ main ] __replay -> libgearman-server/plugins/queue/default/queue.cc:101
DEBUG 2020-12-09 21:13:59.393191 [ 4 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
DEBUG 2020-12-09 21:13:59.393199 [ main ] replaying queue: end -> libgearman-server/gearmand.cc:397
INFO 2020-12-09 21:13:59.393214 [ main ] Adding event for listening socket (9)
INFO 2020-12-09 21:13:59.393232 [ main ] Adding event for listening socket (10)
DEBUG 2020-12-09 21:13:59.393250 [ main ] Adding event for wakeup pipe -> libgearman-server/gearmand.cc:966
DEBUG 2020-12-09 21:13:59.393267 [ main ] Entering main event loop -> libgearman-server/gearmand.cc:406
DEBUG 2020-12-09 21:14:01.548646 [ main ] accept() fd:33 -> libgearman-server/gearmand.cc:874
INFO 2020-12-09 21:14:01.548689 [ main ] Accepted connection from 127.0.0.1:36710
DEBUG 2020-12-09 21:14:01.548735 [ 4 ] Received CON wakeup event -> libgearman-server/gearmand_thread.cc:619
DEBUG 2020-12-09 21:14:01.548798 [ 4 ] 127.0.0.1:36710 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
DEBUG 2020-12-09 21:14:01.548832 [ 4 ] Gear connection made: 127.0.0.1:36710 -> libgearman-server/plugins/protocol/gear/protocol.cc:407
DEBUG 2020-12-09 21:14:01.548859 [ 4 ] 127.0.0.1:36710 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
DEBUG 2020-12-09 21:14:01.548889 [ 4 ] read 50 bytes -> libgearman-server/io.cc:798
DEBUG 2020-12-09 21:14:01.548921 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:01.548939 [ 4 ] Received GEARMAN_SET_CLIENT_ID -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:01.548966 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:01.548987 [ 4 ] Received GEARMAN_CAN_DO -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:01.548985 [ proc ] PACKET COMMAND: GEARMAN_SET_CLIENT_ID -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:01.549005 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:01.549042 [ 4 ] Received GEARMAN_PRE_SLEEP -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:01.549055 [ proc ] identifier set to Hello -> libgearman-server/connection.cc:335
DEBUG 2020-12-09 21:14:01.549064 [ 4 ] 127.0.0.1:36710 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
DEBUG 2020-12-09 21:14:01.549075 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:01.549093 [ proc ] Registering function: crashtest -> libgearman-server/server.cc:522
DEBUG 2020-12-09 21:14:01.549110 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:02.550690 [ 4 ] 127.0.0.1:36710 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
DEBUG 2020-12-09 21:14:02.550739 [ 4 ] read 37 bytes -> libgearman-server/io.cc:798
DEBUG 2020-12-09 21:14:02.550760 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:02.550774 [ 4 ] Received GEARMAN_SUBMIT_JOB -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:02.550795 [ 4 ] 127.0.0.1:36710 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
DEBUG 2020-12-09 21:14:02.550801 [ proc ] PACKET COMMAND: GEARMAN_SUBMIT_JOB -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:02.550843 [ proc ] Received submission, function:crashtest unique:unique with 2 arguments -> libgearman-server/server.cc:248
DEBUG 2020-12-09 21:14:02.550954 [ proc ] Comparing queue 0 to limit 0 for priority 1 -> libgearman-server/job.cc:175
DEBUG 2020-12-09 21:14:02.550969 [ proc ] JOB H:kaydel-ko:1 :3906686023 -> libgearman-server/job.cc:240
NOTICE 2020-12-09 21:14:02.551010 [ proc ] accepted,crashtest,unique,0 -> libgearman-server/server.cc:317
DEBUG 2020-12-09 21:14:02.551018 [ 4 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:624
DEBUG 2020-12-09 21:14:02.551044 [ 4 ] Sent NOOP -> libgearman-server/thread.cc:356
DEBUG 2020-12-09 21:14:02.551087 [ 4 ] send() 37 bytes to peer -> libgearman-server/io.cc:403
DEBUG 2020-12-09 21:14:02.551102 [ 4 ] Sent JOB_CREATED -> libgearman-server/thread.cc:356
DEBUG 2020-12-09 21:14:03.761100 [ 4 ] 127.0.0.1:36710 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
DEBUG 2020-12-09 21:14:03.761157 [ 4 ] read 12 bytes -> libgearman-server/io.cc:798
DEBUG 2020-12-09 21:14:03.761179 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:03.761198 [ 4 ] Received GEARMAN_GRAB_JOB -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:03.761223 [ 4 ] 127.0.0.1:36710 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
DEBUG 2020-12-09 21:14:03.761264 [ proc ] PACKET COMMAND: GEARMAN_GRAB_JOB -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:03.761295 [ proc ] Jobs available for crashtest: 1 -> libgearman-server/gearmand_con.cc:366
DEBUG 2020-12-09 21:14:03.761308 [ proc ] Sending GEARMAN_COMMAND_JOB_ASSIGN Function: crashtest(9) with data sized (8) -> libgearman-server/server.cc:663
DEBUG 2020-12-09 21:14:03.761349 [ 4 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:624
DEBUG 2020-12-09 21:14:03.761413 [ 4 ] send() 44 bytes to peer -> libgearman-server/io.cc:403
DEBUG 2020-12-09 21:14:03.761433 [ 4 ] Sent JOB_ASSIGN -> libgearman-server/thread.cc:356
DEBUG 2020-12-09 21:14:08.777932 [ 4 ] 127.0.0.1:36710 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
DEBUG 2020-12-09 21:14:08.777989 [ 4 ] read 27 bytes -> libgearman-server/io.cc:798
DEBUG 2020-12-09 21:14:08.778003 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:08.778014 [ 4 ] Received GEARMAN_WORK_COMPLETE -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:08.778049 [ 4 ] read 27 bytes -> libgearman-server/io.cc:798
DEBUG 2020-12-09 21:14:08.778064 [ 4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
DEBUG 2020-12-09 21:14:08.778079 [ 4 ] Received GEARMAN_WORK_EXCEPTION -> libgearman-server/thread.cc:311
DEBUG 2020-12-09 21:14:08.778099 [ 4 ] 127.0.0.1:36710 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
DEBUG 2020-12-09 21:14:08.778096 [ proc ] PACKET COMMAND: GEARMAN_WORK_COMPLETE -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:08.778150 [ proc ] PACKET COMMAND: GEARMAN_WORK_EXCEPTION -> libgearman-server/server.cc:119
DEBUG 2020-12-09 21:14:08.778157 [ 4 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:624
DEBUG 2020-12-09 21:14:08.778213 [ 4 ] send() 27 bytes to peer -> libgearman-server/io.cc:403
DEBUG 2020-12-09 21:14:08.778235 [ 4 ] Sent WORK_COMPLETE -> libgearman-server/thread.cc:356
Begin stack trace, frames found: 7
?0 0x55a8be2ddeb7 in gearmand(+0x21eb7) [0x55a8be2ddeb7]
?1 0x55a8be2cbafa in gearmand(+0xfafa) [0x55a8be2cbafa]
?2 0x7f90e652f0f0 in /usr/lib/libpthread.so.0(+0x140f0) [0x7f90e652f0f0]
?3 0x55a8be2e8ea7 in gearmand(+0x2cea7) [0x55a8be2e8ea7]
?4 0x55a8be2e1fe9 in gearmand(+0x25fe9) [0x55a8be2e1fe9]
?5 0x7f90e65243e9 in /usr/lib/libpthread.so.0(+0x93e9) [0x7f90e65243e9]
#6 0x7f90e5e0d293 in clone at /usr/lib/libc.so.6
Ah, you're right, it is fixed. I'll go pester packagers for updates instead ;)