vstinner/python-ptrace

strace.py terminates with KeyError when tracing openssh server

vstinner opened this issue · 10 comments

Originally reported by: Pavel Šimerda (Bitbucket: pavlix, GitHub: pavlix)


For a network application testing project, I need to ptrace both client and server processes, but tracing the SSH server apparently fails. So I also tried strace.py, and it quits with a traceback.

Reproducer:

Run the two commands in two terminal emulators, alpha and beta. The behavior seems to be the same whether or not the SSH authentication succeeds.

alpha# ./strace.py -f -o trace.python-ptrace -- $(which sshd) -D

beta$ ssh localhost true

Expected result:

With the -D option, the SSH server should keep running until you kill it and accept any incoming connection.

Actual result:

The first connection makes the server crash with a traceback.


Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


I'm no longer interested in maintaining the python-ptrace project. Can you try to investigate the issue and propose a patch?

ptrace gets an unknown exit status.

Original comment by Pavel Šimerda (Bitbucket: pavlix, GitHub: pavlix):


I see. Should I assign any new issues coming from our use case to myself?

Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


> I see. Should I assign any new issues coming from our use case to myself?

As you wish.

Original comment by Pavel Šimerda (Bitbucket: pavlix, GitHub: pavlix):


It looks like ptrace() gives an actual 0-255 exit code that is treated as if it were a combined status value from wait(). This patch fixes it for me by using the value directly as an exit code.
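To illustrate that mismatch, here is a minimal sketch (not python-ptrace code) using Python's os.W* helpers, which expect a combined wait() status word. A real status for a child that exits with code 3 decodes correctly, but a bare exit code of 3 fed to the same helpers is misread as "killed by signal 3". The helper names decode_wait_status and real_wait_status are hypothetical; the sketch assumes Linux, since it uses os.fork().

```python
import os

def decode_wait_status(status: int) -> str:
    """Decode a combined wait() status word with the os.W* helpers."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFSTOPPED(status):
        return f"stopped by signal {os.WSTOPSIG(status)}"
    return f"unrecognized status {status:#x}"

def real_wait_status(exit_code: int) -> int:
    """Fork a child that exits with exit_code and return its wait() status."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately with the given code
    _, status = os.waitpid(pid, 0)
    return status

status = real_wait_status(3)
print(status)                      # 768 on Linux: the exit code sits in bits 8-15
print(decode_wait_status(status))  # exited with code 3
print(decode_wait_status(3))       # killed by signal 3 (bare exit code misread)
```

The low 7 bits of a wait() status carry the terminating signal, so any small nonzero exit code passed in place of a status looks like a signal death.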

Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


Hmm, ProcessSignal.childExit() uses getSignalInfo() on Linux. The siginfo class is defined in ptrace.binding.linux_struct; for SIGCHLD, childExit() uses siginfo._sigchld, especially siginfo._sigchld.status.

I checked: if the child exits with exit code 3, siginfo._sigchld.status is 3. But if the child is killed by SIGKILL, siginfo._sigchld.status is 9: it's also positive.

You need to find the flag that tells whether the child exited with an exit code or was killed by a signal.

Original comment by Pavel Šimerda (Bitbucket: pavlix, GitHub: pavlix):


In my opinion, this information is not part of si_status but rather of si_code. When si_code == CLD_EXITED, si_status contains the exit code. ptrace() and SIGTRAP are also mentioned in the man page. The structure is apparently different from the status returned by wait().

http://man7.org/linux/man-pages/man2/sigaction.2.html
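A minimal sketch of that si_code/si_status split, assuming Linux and Python 3.9+ (for os.CLD_KILLED). It uses os.waitid(), which fills the same siginfo_t fields that sigaction(2) documents for SIGCHLD, as a convenient stand-in; this is not how python-ptrace reads the siginfo, which goes through getSignalInfo(). The helper reap_child is hypothetical.

```python
import os
import signal

def reap_child(kill_with_sigkill: bool):
    """Fork a child, let it end, and reap it with waitid() to read its siginfo."""
    pid = os.fork()
    if pid == 0:
        if kill_with_sigkill:
            os.kill(os.getpid(), signal.SIGKILL)  # child kills itself
        os._exit(3)  # otherwise the child exits with code 3
    # Without WNOHANG, waitid() blocks until the child has ended.
    return os.waitid(os.P_PID, pid, os.WEXITED)

exited = reap_child(kill_with_sigkill=False)
killed = reap_child(kill_with_sigkill=True)

# si_status alone is ambiguous: both are small positive numbers.
print(exited.si_status, killed.si_status)  # 3 9
# si_code tells the two cases apart.
print(exited.si_code == os.CLD_EXITED)  # True
print(killed.si_code == os.CLD_KILLED)  # True
```

This matches the observation above that the status is 3 for an exit code and 9 for SIGKILL: only si_code says which one it is.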

Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


> In my opinion those information are not part of si_status but rather si_code. When si_code == CLD_EXITED, si_status contains the exit code.

OK, so you need to extend your patch to pass this information to ChildExit.

Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


I marked the issue #16 as a duplicate of this issue.

Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


Modifying formatProcessStatus() is wrong: if you get an exit code rather than a status, the caller must be modified so that it does not call formatProcessStatus(). According to trace.ptrace, the caller is signal_reason.py, line 154 (in __init__): message = formatProcessStatus(status, "Child process %s" % pid).
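A sketch of what such a caller could do instead, assuming it receives si_code and si_status from the SIGCHLD siginfo: branch on si_code and build the message directly, never passing si_status to a wait()-status formatter. ChildExitInfo and child_exit_message are hypothetical names, not python-ptrace's ChildExit or formatProcessStatus(); the CLD_* constants assume Python 3.9+ on Linux.

```python
import os
import signal
from dataclasses import dataclass

@dataclass
class ChildExitInfo:
    """Hypothetical record of what a SIGCHLD siginfo says about a child."""
    pid: int
    si_code: int
    si_status: int

def child_exit_message(info: ChildExitInfo) -> str:
    prefix = f"Child process {info.pid}"
    if info.si_code == os.CLD_EXITED:
        # si_status already is the exit code: no wait()-status decoding.
        return f"{prefix} exited with code {info.si_status}"
    if info.si_code in (os.CLD_KILLED, os.CLD_DUMPED):
        # si_status is the number of the signal that killed the child.
        try:
            name = signal.Signals(info.si_status).name
        except ValueError:  # number with no named signal
            name = f"signal {info.si_status}"
        return f"{prefix} killed by {name}"
    return f"{prefix} changed state (si_code={info.si_code}, si_status={info.si_status})"

print(child_exit_message(ChildExitInfo(1234, os.CLD_EXITED, 3)))
# Child process 1234 exited with code 3
print(child_exit_message(ChildExitInfo(1234, os.CLD_KILLED, signal.SIGKILL)))
# Child process 1234 killed by SIGKILL
```

Keeping the branch in the caller leaves formatProcessStatus() unchanged for code paths that really do hold a wait() status.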

The project moved to GitHub. Please reopen the issue there, or even better, open a pull request :-) https://github.com/haypo/python-ptrace

Original comment by Victor Stinner (Bitbucket: haypo, GitHub: haypo):


I'm unable to reproduce the bug using "./strace.py -f -o trace.python-ptrace -- $(which sshd) -D".