bloomberg/pystack

๐Ÿ’€ Engine error: Function not implemented ๐Ÿ’€

guy4261 opened this issue ยท 13 comments

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am SSHing into a Spark executor, and I'm trying to look at what it's doing. I am getting the following unusual error:

$ pystack remote <PID>
๐Ÿ’€ Engine error: Function not implemented ๐Ÿ’€

$ pystack -v remote 197 --exhaustive
INFO(process_remote): Analyzing process with pid 197 using stack method StackMethod.ALL with native mode NativeReportingMode.OFF
INFO(parse_maps_file_for_binary): python binary first map found: VirtualMap(start=0x0000000000400000, end=0x0000000000423000, filesize=0x23000, offset=0x0, device='09:7f', flags='r--p', inode=54676376, path='/usr/bin/python3.8')
INFO(parse_maps_file_for_binary): Process does not have a libpython.so, reading from binary
INFO(parse_maps_file_for_binary): Heap map found: VirtualMap(start=0x00000000011e4000, end=0x0000000001873000, filesize=0x68f000, offset=0x0, device='00:00', flags='rw-p', inode=0, path='[heap]')
INFO(_get_bss): Determined exact addr of .bss section: 0x93db20 (0x400000 + 0x53db20)
INFO(parse_maps_file_for_binary): bss map found: VirtualMap(start=0x000000000093db20, end=0x0000000000960ba8, filesize=0x23088, offset=0x53cb20, device='', flags='', inode=0, path='None')
INFO(parse_maps_file_for_binary): python binary first map found: VirtualMap(start=0x0000000000400000, end=0x0000000000423000, filesize=0x23000, offset=0x0, device='09:7f', flags='r--p', inode=54676376, path='/usr/bin/python3.8')
INFO(parse_maps_file_for_binary): Process does not have a libpython.so, reading from binary
INFO(parse_maps_file_for_binary): Heap map found: VirtualMap(start=0x00000000011e4000, end=0x0000000001873000, filesize=0x68f000, offset=0x0, device='00:00', flags='rw-p', inode=0, path='[heap]')
INFO(_get_bss): Determined exact addr of .bss section: 0x93db20 (0x400000 + 0x53db20)
INFO(parse_maps_file_for_binary): bss map found: VirtualMap(start=0x000000000093db20, end=0x0000000000960ba8, filesize=0x23088, offset=0x53cb20, device='', flags='', inode=0, path='None')
INFO(process_remote): Trying to stop thread 197
INFO(process_remote): Waiting for thread 197 to be stopped
INFO(process_remote): Process 197 attached
INFO(process_remote): Attempting to find symbol 'Py_Version' in /usr/bin/python3.8
๐Ÿ’€ Engine error: Function not implemented ๐Ÿ’€

Expected Behavior

Normal PyStack session

Steps To Reproduce

Inside a PySpark 3.3.2 executor (python3 -m pyspark.daemon),

$ pgrep python

# pick any of the PIDs and
$ pystack remote <PID>

No combination of CLI flags brings back any information beyond the ๐Ÿ’€ Engine error: Function not implemented ๐Ÿ’€ error.

Pystack Version

1.0.1

Python Version

3.8

Linux distribution

Ubuntu

Anything else?

No response

Hi @guy4261 and thanks for the report!

The error you are getting Function not implemented means that your system lacks some of the system calls that memray needs to work. Could you tell us what kernel version and Ubuntu version are you using? My guess is that your system is missing the process_vm_readv syscall.

Whoa, thanks for the quick response!!!

I am using Ubuntu 20.04 from public.ecr.aws/lts/ubuntu:20.04_stable.

I see [here](https://manpages.ubuntu.com/manpages/bionic/man2/process_vm_writev.2.html) that

These system calls were added in Linux 3.2.  Support is provided in  glibc  since  version 2.15.

While my version seems to be 2.31:

$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Hummmmm, that's interesting. We do test in ubuntu 20.04 and we haven't seen this. I will try to test with that docker container. What architecture are you using?

Can you also try to compile and run this program:

#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
#include <sys/syscall.h>

static ssize_t _process_vm_readv(
    pid_t pid,
    const struct iovec* lvec,
    unsigned long liovcnt,
    const struct iovec* rvec,
    unsigned long riovcnt,
    unsigned long flags)
{
    return syscall(SYS_process_vm_readv, pid, lvec, liovcnt, rvec, riovcnt, flags);
}

int main() {
    char buffer[1024];
    char buffer2[1024];

  for (int i = 0; i < 1024; i++) {
       buffer[i] = 'c';
    }

    struct iovec local_iov = { .iov_base = buffer2, .iov_len = sizeof(buffer2) };
    struct iovec remote_iov = { .iov_base = buffer, .iov_len = sizeof(buffer) };
    
    ssize_t bytes_read = _process_vm_readv(getpid(), &local_iov, 1, &remote_iov, 1, 0);
    
    if (bytes_read == -1) {
        perror("Error reading memory");
        return 1;
    }
    
    printf("Bytes read: %zd\n", bytes_read);
    
    printf("Memory contents:\n");
    for (int i = 0; i < bytes_read; i++) {
        printf("%02x ", buffer2[i] & 0xff);
    }
    printf("\n");
    
    return 0;
}

Also, please check the kernel version. Even if glibc has support the kernel may be too old in these machines.

I can properly use pystack on that container on my system so I think the problem is your kernel version. Check this out:

$ docker run --privileged -it public.ecr.aws/lts/ubuntu:20.04_stable
$ apt-get update && apt-get install python3 python3-pip libpython3.8-testsuite
...
$ python3 -m pip install pystack
Collecting pystack
  Downloading pystack-1.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB)
     |โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 4.5 MB 15.8 MB/s
Installing collected packages: pystack
Successfully installed pystack-1.0.1
$ python3 -m test
== CPython 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
== Linux-6.3.4-arch1-1-x86_64-with-glibc2.29 little-endian
== cwd: /tmp/test_python_4424
== CPU count: 36
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 0.83 Run tests sequentially
0:00:00 load avg: 0.83 [  1/423] test_grammar
0:00:00 load avg: 0.83 [  2/423] test_opcodes
0:00:00 load avg: 0.83 [  3/423] test_dict
0:00:00 load avg: 0.83 [  4/423] test_builtin
0:00:00 load avg: 0.83 [  5/423] test_exceptions
^Z
$ ps u
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4240  2880 pts/0    Ss   14:52   0:00 /bin/bash
root        4424 23.5  0.0  34444 29084 pts/0    T    14:57   0:00 python3 -m test
root        4437  0.0  0.0  12280  8352 pts/0    T    14:57   0:00 /usr/bin/python3 -X faulthandler -I -c if 1:             import _testcapi             class C():
root        4438  0.0  0.0   5900  2880 pts/0    R+   14:57   0:00 ps u
$ pystack remote 4424
Traceback for thread 4424 (python3) [] (most recent call last):
    (Python) File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
    (Python) File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
    (Python) File "/usr/lib/python3.8/test/__main__.py", line 2, in <module>
        main()
    (Python) File "/usr/lib/python3.8/test/libregrtest/main.py", line 715, in main
        Regrtest().main(tests=tests, **kwargs)
    (Python) File "/usr/lib/python3.8/test/libregrtest/main.py", line 637, in main
        self._main(tests, kwargs)
    (Python) File "/usr/lib/python3.8/test/libregrtest/main.py", line 690, in _main
        self.run_tests()
    (Python) File "/usr/lib/python3.8/test/libregrtest/main.py", line 518, in run_tests
        self.run_tests_sequential()
    (Python) File "/usr/lib/python3.8/test/libregrtest/main.py", line 420, in run_tests_sequential
        result = runtest(self.ns, test_name)
    (Python) File "/usr/lib/python3.8/test/libregrtest/runtest.py", line 193, in runtest
        return _runtest(ns, test_name)
    (Python) File "/usr/lib/python3.8/test/libregrtest/runtest.py", line 153, in _runtest
        result = _runtest_inner(ns, test_name,
    (Python) File "/usr/lib/python3.8/test/libregrtest/runtest.py", line 270, in _runtest_inner
        refleak = _runtest_inner2(ns, test_name)
    (Python) File "/usr/lib/python3.8/test/libregrtest/runtest.py", line 234, in _runtest_inner2
        test_runner()
      support.run_unittest(tests)
    (Python) File "/usr/lib/python3.8/test/support/__init__.py", line 2152, in run_unittest
        _run_suite(suite)
    (Python) File "/usr/lib/python3.8/test/support/__init__.py", line 2030, in _run_suite
        result = runner.run(suite)
    (Python) File "/usr/lib/python3.8/test/support/testresult.py", line 162, in run
        test(self.result)
    (Python) File "/usr/lib/python3.8/unittest/suite.py", line 84, in __call__
        return self.run(*args, **kwds)
    (Python) File "/usr/lib/python3.8/unittest/suite.py", line 122, in run
        test(result)
    (Python) File "/usr/lib/python3.8/unittest/suite.py", line 84, in __call__
        return self.run(*args, **kwds)
    (Python) File "/usr/lib/python3.8/unittest/suite.py", line 122, in run
        test(result)
    (Python) File "/usr/lib/python3.8/unittest/suite.py", line 84, in __call__
        return self.run(*args, **kwds)
    (Python) File "/usr/lib/python3.8/unittest/suite.py", line 122, in run
        test(result)
    (Python) File "/usr/lib/python3.8/unittest/case.py", line 736, in __call__
        return self.run(*args, **kwds)
    (Python) File "/usr/lib/python3.8/unittest/case.py", line 676, in run
        self._callTestMethod(testMethod)
    (Python) File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
        method()
    (Python) File "/usr/lib/python3.8/test/test_exceptions.py", line 1241, in test_memory_error_in_PyErr_PrintEx
        rc, out, err = script_helper.assert_python_failure("-c", code % i)
    (Python) File "/usr/lib/python3.8/test/support/script_helper.py", line 167, in assert_python_failure
        return _assert_python(False, *args, **env_vars)
    (Python) File "/usr/lib/python3.8/test/support/script_helper.py", line 141, in _assert_python
        res, cmd_line = run_python_until_end(*args, **env_vars)
    (Python) File "/usr/lib/python3.8/test/support/script_helper.py", line 132, in run_python_until_end
        out, err = proc.communicate()
    (Python) File "/usr/lib/python3.8/subprocess.py", line 1028, in communicate
        stdout, stderr = self._communicate(input, endtime, timeout)
    (Python) File "/usr/lib/python3.8/subprocess.py", line 1868, in _communicate
        ready = selector.select(timeout)
    (Python) File "/usr/lib/python3.8/selectors.py", line 415, in select
        fd_event_list = self._selector.poll(timeout)


root@687c85cd5159:/# uname -a
Linux 687c85cd5159 6.3.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 24 May 2023 17:44:00 +0000 x86_64 x86_64 x86_64 GNU/Linux

root@687c85cd5159:/# ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Thanks for everything! Indeed perhaps my kernel is too old - you have 6.3.4 , I seem to have 5.10.76:

$ uname -a
Linux 2bb1717fb772 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Two other things I wonder about:

1. By default, I am not a root/sudoer in this docker environment. So I pulled it and ran it locally as -u root. Do I need to be a sudoer to use pystack?

1.1. Likewise - do I need my docker to run --privileged?

2. I did not even have libc6-dev installed when I started. Might that pose a problem?

$ gcc
bash: gcc: command not found

$ apt-get install --yes gcc
$ gcc foo.c
foo.c:1:10: fatal error: stdio.h: No such file or directory
$ apt-get install -y libc6-dev

$ gcc foo.c  # your code
$ ./a.out
Error reading memory: Operation not permitted
  1. By default, I am not a root/sudoer in this docker environment. So I pulled it and ran it locally as -u root. Do I need to be a sudoer to use pystack?

You need at least PTRACE perms in the container so you need to pass --cap-add=SYS_PTRACE or similar.

1.1. Likewise - do I need my docker to run --privileged?

No, just ptrace scope

  1. I did not even have libc6-dev installed when I started. Might that pose a problem?

To compile my example you need to install build-essential (less heavyweight alternatives exist but that's the easiest).

Indeed perhaps my kernel is too old - you have 6.3.4 , I seem to have 5.10.76:

Doesn't look like is old enough because 5.10.76 should have the syscall for sure.

Thank you! Running with --privileged seems to have done the trick! Both libc6-dev and build-essential indeed do the trick (=one can install whichever is better). I also tested like you (python3 -m test and ctrl+z, then pystack remote the process) and it worked like a charm.

Thank you very much!

On a Pythonic note - it might be worth to add this precious information to the error message. Seems like you knew just what to do :)))

Thanks!!!

Thanks for your message and confirming that everything works now as expected. ๐Ÿค˜

I'm closing this issue but feel free to reopen if we missed anything :)

๐Ÿ’ฏ