delphix/sdb

stacks command crashes occasionally with KeyError while accessing TASK_STATES

sdimitro opened this issue · 0 comments

sdb: could not get debugging information for:
/lib/modules/5.4.0-42-generic/kernel/net/connstat/connstat.ko (libdwfl error: No DWARF information found)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sdb/internal/repl.py", line 107, in eval_cmd
    for obj in invoke(self.target, [], input_):
  File "/usr/lib/python3/dist-packages/sdb/pipeline.py", line 146, in invoke
    yield from execute_pipeline(first_input, pipeline)
  File "/usr/lib/python3/dist-packages/sdb/pipeline.py", line 83, in execute_pipeline
    yield from massage_input_and_call(pipeline[-1], this_input)
  File "/usr/lib/python3/dist-packages/sdb/pipeline.py", line 66, in massage_input_and_call
    yield from cmd.call(objs)
  File "/usr/lib/python3/dist-packages/sdb/command.py", line 329, in call
    result, not issubclass(self.__class__, SingleInputCommand))
  File "/usr/lib/python3/dist-packages/sdb/command.py", line 290, in __invalid_memory_objects_check
    for obj in objs:
  File "/usr/lib/python3/dist-packages/sdb/command.py", line 775, in _call
    self.pretty_print(self.caller(objs))
  File "/usr/lib/python3/dist-packages/sdb/commands/stacks.py", line 400, in pretty_print
    self.print_stacks(filter(self.match_stack, objs))
  File "/usr/lib/python3/dist-packages/sdb/commands/stacks.py", line 374, in print_stacks
    for stack_key, tasks in Stacks.aggregate_stacks(objs):
  File "/usr/lib/python3/dist-packages/sdb/commands/stacks.py", line 367, in aggregate_stacks
    stack_key = (Stacks.task_struct_get_state(task),
  File "/usr/lib/python3/dist-packages/sdb/commands/stacks.py", line 215, in task_struct_get_state
    return Stacks.TASK_STATES[(state | exit_state) & 0x7f]
KeyError: 17

From Slack:

serapheim  07:48
So this always happens at the same spot which implies that there is something wrong with getting the task’s state in this line:
>> File "/usr/lib/python3/dist-packages/sdb/commands/stacks.py", line 215, in task_struct_get_state
>> return Stacks.TASK_STATES[(state | exit_state) & 0x7f]
This could be due to 2 things:
[1] The kernel has some new state or has changed the behavior of the existing ones, and drgn or sdb have not been updated
[2] The task is cleaned up as we are reading it and gives us a bogus value
I’ll file a bug and investigate further when I get the time
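
As a starting point for that investigation, it may help to note that 17 is 0x11, which has two bits set, so it cannot match a lookup table keyed on single state values. The sketch below decodes the masked value bit by bit instead of indexing a dict directly. The bit values are the ones in include/linux/sched.h for 5.4-era kernels; the table name `TASK_STATE_BITS` and the function `decode_task_state` are hypothetical and are not sdb's actual `TASK_STATES` or `task_struct_get_state`. The traceback does not show how the bits were split between `state` and `exit_state`, so the inputs used here are an assumption.

```python
# Bit values from include/linux/sched.h (5.4-era kernels).
# Hypothetical table for illustration; sdb's real TASK_STATES may differ.
TASK_STATE_BITS = {
    0x01: "INTERRUPTIBLE",
    0x02: "UNINTERRUPTIBLE",
    0x04: "STOPPED",
    0x08: "TRACED",
    0x10: "EXIT_DEAD",
    0x20: "EXIT_ZOMBIE",
    0x40: "PARKED",
}


def decode_task_state(state: int, exit_state: int) -> str:
    """Name every bit set in (state | exit_state) & 0x7f.

    Unlike a direct dict lookup, this does not raise KeyError when
    more than one bit is set.
    """
    combined = (state | exit_state) & 0x7f
    if combined == 0:
        return "RUNNING"
    names = [name for bit, name in sorted(TASK_STATE_BITS.items())
             if combined & bit]
    return "|".join(names)


# Assumed split: state = 0x01 and exit_state = 0x10 gives the 17 from the KeyError.
print(decode_task_state(0x01, 0x10))  # INTERRUPTIBLE|EXIT_DEAD
```

Under these assumptions the failing value decodes to INTERRUPTIBLE plus EXIT_DEAD, which would be consistent with hypothesis [2] (a task being torn down while it is read), though it does not rule out [1].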