gearman/gearmand

Gearman server crashes if a WORK_EXCEPTION is delivered after a WORK_COMPLETE

passcod opened this issue · 8 comments

Gearmand 1.1.18

I can reproduce and provide logs but am away at the moment, filing from memory.

If you grab a job, then issue a WORK_COMPLETE, and then issue a WORK_EXCEPTION and WORK_FAIL, the entire server crashes.

Really, this is then a worker bug, as that sequence of events shouldn't happen, but it seems unfortunate that one misbehaving worker can crash the server.

I agree that that shouldn't crash the server. A worker which does that would be helpful in fixing the problem.

Any particular language/binding is easiest for you? I discovered this using abraxas (node js) but I can do whatever.

Well, it would probably be best if it could ultimately end up in the test suite, so C or C++? Maybe there's a test in the test suite that you could adapt? If so, that would be amazing!

Okay, I'll have a go

Okay, correction, it's a WORK_EXCEPTION after a WORK_COMPLETE. Repro incoming

Okay, I tried to write C but gave up. Instead, I wrote bash. Hopefully it's relatively portable bash (so long as you have hexdump and bash >= 5):

gearcrash.sh
#!/usr/bin/env bash

set -euo pipefail

exec 3<>/dev/tcp/127.0.0.1/4730

# Set client ID -- essentially to test if the binary connection is working
echo -ne '\0REQ\0\0\0\x16\0\0\0\x05Hello' >&3

# Worker: CANDO crashtest
echo -ne '\0REQ\0\0\0\x01\0\0\0\x09crashtest' >&3

# Worker: PRE_SLEEP
echo -ne '\0REQ\0\0\0\x04\0\0\0\0' >&3

sleep 1
echo Submitting job

# Client: SUBMIT_JOB crashtest unique workload
echo -ne '\0REQ\0\0\0\x07\0\0\0\x19crashtest\0unique\0workload' >&3

handle=$(timeout 0.2 cat <&3 || true)
if [[ -z "$handle" ]]; then
  echo "gearman didn't give us a handle"
  exit 1
fi

# Clean off the binary gunk
handle=$(grep -aoP '(H:[\w-]+:\d+)' <<< "$handle")

sleep 1
echo Grabbing job

# Worker: GRAB_JOB
echo -ne '\0REQ\0\0\0\x09\0\0\0\0' >&3
sleep 1

# We're going to ignore anything the server sends
# and just assume we got the job we created above

set +e
data=$(timeout 1 cat <&3 || true)
set -e
echo "Data from server:"
hexdump -C <<< "$data"

sleep 3
echo "Crash sequence now"


# handle + 2 bytes (nul and single-byte data)
donelen=$(echo -ne "$handle\0." | hexdump | tail -n1 | cut -c6-)

# Worker: WORK_COMPLETE
echo -ne "\0REQ\0\0\0\x0D\0\0\0\x${donelen}${handle}\0." >&3

# Worker: WORK_EXCEPTION
echo -ne "\0REQ\0\0\0\x19\0\0\0\x${donelen}${handle}\0." >&3

set +e
data=$(timeout 1 cat <&3 || true)
set -e
echo "Data from server:"
hexdump -C <<< "$data"

echo "Gearman should have crashed"
Output from running it
Submitting job
./gearcrash.sh: line 22: warning: command substitution: ignored null byte in input
Grabbing job
./gearcrash.sh: line 42: warning: command substitution: ignored null byte in input
Data from server:
00000000  52 45 53 0b 20 48 3a 6b  61 79 64 65 6c 2d 6b 6f  |RES. H:kaydel-ko|
00000010  3a 31 63 72 61 73 68 74  65 73 74 77 6f 72 6b 6c  |:1crashtestworkl|
00000020  6f 61 64 0a                                       |oad.|
00000024
Crash sequence now
./gearcrash.sh: line 62: warning: command substitution: ignored null byte in input
Data from server:
00000000  52 45 53 0d 0f 48 3a 6b  61 79 64 65 6c 2d 6b 6f  |RES..H:kaydel-ko|
00000010  3a 31 2e 0a                                       |:1..|
00000014
Gearman should have crashed
Output from gearmand
~$> gearmand --verbose DEBUG
--syslog
--log-file
stderr
  DEBUG 2020-12-09 21:13:59.392010 [  main ] THREADS: 4 -> libgearman-server/gearmand.cc:263
   INFO 2020-12-09 21:13:59.392099 [  main ] Initializing Gear on port 4730 with SSL: false
   INFO 2020-12-09 21:13:59.392180 [  main ] Starting up with pid 2136195, verbose is set to DEBUG
  DEBUG 2020-12-09 21:13:59.392237 [  main ] Method for libevent: epoll -> libgearman-server/gearmand.cc:364
  DEBUG 2020-12-09 21:13:59.392230 [  main ] staring up Epoch thread -> libgearman-server/timer.cc:61
  DEBUG 2020-12-09 21:13:59.392394 [  main ] Trying to listen on 0.0.0.0:4730 -> libgearman-server/gearmand.cc:646
   INFO 2020-12-09 21:13:59.392430 [  main ] Listening on 0.0.0.0:4730 (9)
  DEBUG 2020-12-09 21:13:59.392443 [  main ] Trying to listen on :::4730 -> libgearman-server/gearmand.cc:646
   INFO 2020-12-09 21:13:59.392471 [  main ] Listening on :::4730 (10)
  DEBUG 2020-12-09 21:13:59.392490 [  main ] Creating wakeup pipe -> libgearman-server/gearmand.cc:915
  DEBUG 2020-12-09 21:13:59.392516 [  main ] Creating 4 threads -> libgearman-server/gearmand.cc:378
  DEBUG 2020-12-09 21:13:59.392534 [  main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
  DEBUG 2020-12-09 21:13:59.392572 [  main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
  DEBUG 2020-12-09 21:13:59.392646 [  main ] Thread 1 created -> libgearman-server/gearmand_thread.cc:273
  DEBUG 2020-12-09 21:13:59.392694 [     1 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
  DEBUG 2020-12-09 21:13:59.392730 [  main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
  DEBUG 2020-12-09 21:13:59.392779 [  main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
  DEBUG 2020-12-09 21:13:59.392846 [  main ] Thread 2 created -> libgearman-server/gearmand_thread.cc:273
  DEBUG 2020-12-09 21:13:59.392869 [  main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
  DEBUG 2020-12-09 21:13:59.392895 [     2 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
  DEBUG 2020-12-09 21:13:59.392947 [  main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
  DEBUG 2020-12-09 21:13:59.393024 [  main ] Thread 3 created -> libgearman-server/gearmand_thread.cc:273
  DEBUG 2020-12-09 21:13:59.393050 [  main ] Initializing libevent for IO thread -> libgearman-server/gearmand_thread.cc:224
  DEBUG 2020-12-09 21:13:59.393068 [     3 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
  DEBUG 2020-12-09 21:13:59.393097 [  main ] Creating IO thread wakeup pipe -> libgearman-server/gearmand_thread.cc:495
  DEBUG 2020-12-09 21:13:59.393162 [  main ] Thread 4 created -> libgearman-server/gearmand_thread.cc:273
  DEBUG 2020-12-09 21:13:59.393177 [  main ] replaying queue: begin -> libgearman-server/gearmand.cc:391
  DEBUG 2020-12-09 21:13:59.393188 [  main ] __replay -> libgearman-server/plugins/queue/default/queue.cc:101
  DEBUG 2020-12-09 21:13:59.393191 [     4 ] Entering thread event loop -> libgearman-server/gearmand_thread.cc:463
  DEBUG 2020-12-09 21:13:59.393199 [  main ] replaying queue: end -> libgearman-server/gearmand.cc:397
   INFO 2020-12-09 21:13:59.393214 [  main ] Adding event for listening socket (9)
   INFO 2020-12-09 21:13:59.393232 [  main ] Adding event for listening socket (10)
  DEBUG 2020-12-09 21:13:59.393250 [  main ] Adding event for wakeup pipe -> libgearman-server/gearmand.cc:966
  DEBUG 2020-12-09 21:13:59.393267 [  main ] Entering main event loop -> libgearman-server/gearmand.cc:406
  DEBUG 2020-12-09 21:14:01.548646 [  main ] accept() fd:33 -> libgearman-server/gearmand.cc:874
   INFO 2020-12-09 21:14:01.548689 [  main ] Accepted connection from 127.0.0.1:36710
  DEBUG 2020-12-09 21:14:01.548735 [     4 ] Received CON wakeup event -> libgearman-server/gearmand_thread.cc:619
  DEBUG 2020-12-09 21:14:01.548798 [     4 ]       127.0.0.1:36710 Watching  POLLIN  -> libgearman-server/gearmand_thread.cc:151
  DEBUG 2020-12-09 21:14:01.548832 [     4 ] Gear connection made: 127.0.0.1:36710 -> libgearman-server/plugins/protocol/gear/protocol.cc:407
  DEBUG 2020-12-09 21:14:01.548859 [     4 ] 127.0.0.1:36710 Ready     POLLIN  -> libgearman-server/gearmand_con.cc:138
  DEBUG 2020-12-09 21:14:01.548889 [     4 ] read 50 bytes -> libgearman-server/io.cc:798
  DEBUG 2020-12-09 21:14:01.548921 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:01.548939 [     4 ] Received GEARMAN_SET_CLIENT_ID -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:01.548966 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:01.548987 [     4 ] Received GEARMAN_CAN_DO -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:01.548985 [  proc ] PACKET COMMAND: GEARMAN_SET_CLIENT_ID -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:01.549005 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:01.549042 [     4 ] Received GEARMAN_PRE_SLEEP -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:01.549055 [  proc ] identifier set to Hello -> libgearman-server/connection.cc:335
  DEBUG 2020-12-09 21:14:01.549064 [     4 ]       127.0.0.1:36710 Watching  POLLIN  -> libgearman-server/gearmand_thread.cc:151
  DEBUG 2020-12-09 21:14:01.549075 [  proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:01.549093 [  proc ] Registering function: crashtest -> libgearman-server/server.cc:522
  DEBUG 2020-12-09 21:14:01.549110 [  proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:02.550690 [     4 ] 127.0.0.1:36710 Ready     POLLIN  -> libgearman-server/gearmand_con.cc:138
  DEBUG 2020-12-09 21:14:02.550739 [     4 ] read 37 bytes -> libgearman-server/io.cc:798
  DEBUG 2020-12-09 21:14:02.550760 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:02.550774 [     4 ] Received GEARMAN_SUBMIT_JOB -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:02.550795 [     4 ]       127.0.0.1:36710 Watching  POLLIN  -> libgearman-server/gearmand_thread.cc:151
  DEBUG 2020-12-09 21:14:02.550801 [  proc ] PACKET COMMAND: GEARMAN_SUBMIT_JOB -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:02.550843 [  proc ] Received submission, function:crashtest unique:unique with 2 arguments -> libgearman-server/server.cc:248
  DEBUG 2020-12-09 21:14:02.550954 [  proc ] Comparing queue 0 to limit 0 for priority 1 -> libgearman-server/job.cc:175
  DEBUG 2020-12-09 21:14:02.550969 [  proc ] JOB H:kaydel-ko:1 :3906686023 -> libgearman-server/job.cc:240
 NOTICE 2020-12-09 21:14:02.551010 [  proc ] accepted,crashtest,unique,0 -> libgearman-server/server.cc:317
  DEBUG 2020-12-09 21:14:02.551018 [     4 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:624
  DEBUG 2020-12-09 21:14:02.551044 [     4 ] Sent NOOP -> libgearman-server/thread.cc:356
  DEBUG 2020-12-09 21:14:02.551087 [     4 ] send() 37 bytes to peer -> libgearman-server/io.cc:403
  DEBUG 2020-12-09 21:14:02.551102 [     4 ] Sent JOB_CREATED -> libgearman-server/thread.cc:356
  DEBUG 2020-12-09 21:14:03.761100 [     4 ] 127.0.0.1:36710 Ready     POLLIN  -> libgearman-server/gearmand_con.cc:138
  DEBUG 2020-12-09 21:14:03.761157 [     4 ] read 12 bytes -> libgearman-server/io.cc:798
  DEBUG 2020-12-09 21:14:03.761179 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:03.761198 [     4 ] Received GEARMAN_GRAB_JOB -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:03.761223 [     4 ]       127.0.0.1:36710 Watching  POLLIN  -> libgearman-server/gearmand_thread.cc:151
  DEBUG 2020-12-09 21:14:03.761264 [  proc ] PACKET COMMAND: GEARMAN_GRAB_JOB -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:03.761295 [  proc ] Jobs available for crashtest: 1 -> libgearman-server/gearmand_con.cc:366
  DEBUG 2020-12-09 21:14:03.761308 [  proc ] Sending GEARMAN_COMMAND_JOB_ASSIGN Function: crashtest(9) with data sized (8) -> libgearman-server/server.cc:663
  DEBUG 2020-12-09 21:14:03.761349 [     4 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:624
  DEBUG 2020-12-09 21:14:03.761413 [     4 ] send() 44 bytes to peer -> libgearman-server/io.cc:403
  DEBUG 2020-12-09 21:14:03.761433 [     4 ] Sent JOB_ASSIGN -> libgearman-server/thread.cc:356
  DEBUG 2020-12-09 21:14:08.777932 [     4 ] 127.0.0.1:36710 Ready     POLLIN  -> libgearman-server/gearmand_con.cc:138
  DEBUG 2020-12-09 21:14:08.777989 [     4 ] read 27 bytes -> libgearman-server/io.cc:798
  DEBUG 2020-12-09 21:14:08.778003 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:08.778014 [     4 ] Received GEARMAN_WORK_COMPLETE -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:08.778049 [     4 ] read 27 bytes -> libgearman-server/io.cc:798
  DEBUG 2020-12-09 21:14:08.778064 [     4 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
  DEBUG 2020-12-09 21:14:08.778079 [     4 ] Received GEARMAN_WORK_EXCEPTION -> libgearman-server/thread.cc:311
  DEBUG 2020-12-09 21:14:08.778099 [     4 ]       127.0.0.1:36710 Watching  POLLIN  -> libgearman-server/gearmand_thread.cc:151
  DEBUG 2020-12-09 21:14:08.778096 [  proc ] PACKET COMMAND: GEARMAN_WORK_COMPLETE -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:08.778150 [  proc ] PACKET COMMAND: GEARMAN_WORK_EXCEPTION -> libgearman-server/server.cc:119
  DEBUG 2020-12-09 21:14:08.778157 [     4 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:624
  DEBUG 2020-12-09 21:14:08.778213 [     4 ] send() 27 bytes to peer -> libgearman-server/io.cc:403
  DEBUG 2020-12-09 21:14:08.778235 [     4 ] Sent WORK_COMPLETE -> libgearman-server/thread.cc:356

Begin stack trace, frames found: 7
?0  0x55a8be2ddeb7 in gearmand(+0x21eb7) [0x55a8be2ddeb7]
?1  0x55a8be2cbafa in gearmand(+0xfafa) [0x55a8be2cbafa]
?2  0x7f90e652f0f0 in /usr/lib/libpthread.so.0(+0x140f0) [0x7f90e652f0f0]
?3  0x55a8be2e8ea7 in gearmand(+0x2cea7) [0x55a8be2e8ea7]
?4  0x55a8be2e1fe9 in gearmand(+0x25fe9) [0x55a8be2e1fe9]
?5  0x7f90e65243e9 in /usr/lib/libpthread.so.0(+0x93e9) [0x7f90e65243e9]
#6  0x7f90e5e0d293 in clone at /usr/lib/libc.so.6

@passcod: I just noticed that you mentioned you were using gearmand 1.1.18. Have you tried to reproduce this with 1.1.19.1? If not, could you try? Because I think this issue might have been fixed already in PR #183.

Ah, you're right, it is fixed. I'll go pester packagers for updates instead ;)