kovidgoyal/kitty

test_ssh_copy failure on riscv64 and ppc64el

nileshpatra opened this issue · 4 comments

Describe the bug
test_ssh_copy fails on the ppc64el and riscv64 architectures.

To Reproduce
Run the kitty test suite on those architectures.

Screenshots
Log:

======================================================================
ERROR: test_ssh_copy (kitty_tests.ssh.SSHKitten.test_ssh_copy) (sh='python3')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/<<PKGBUILDDIR>>/kitty/launcher/../../kitty_tests/ssh.py", line 95, in test_ssh_copy
    self.check_bootstrap(
    conf = 'copy simple-file\ncopy s1\ncopy --symlink-strategy=keep-path s2\ncopy --dest=a/sfa simple-file\ncopy --glob g.*\ncopy --exclude **/w.* --exclude **/r d1\n'
    contents = {'.local/share/kitty-ssh-kitten/kitty/version', 'g.2', 's2', 'a/sfa', 'g.1', '.terminfo/kitty.terminfo', 's1', '.local/share/kitty-ssh-kitten/kitty/bin/kitty', 'd1/d2/x', 'd1/y', '.local/share/kitty-ssh-kitten/kitty/bin/kitten', 'simple-file'}
    f = <_io.TextIOWrapper name='/tmp/tmp8afyvvrl/s2' mode='r' encoding='utf-8'>
    local_home = '/tmp/tmp0lkc_ah_'
    remote_home = '/tmp/tmp5zr3e6vj'
    self = <kitty_tests.ssh.SSHKitten testMethod=test_ssh_copy>
    sh = 'python3'
    simple_data = 'rkjlhfwf9whoaa'
    tname = '.terminfo'
    touch = <function SSHKitten.test_ssh_copy.<locals>.touch at 0x3f822f6480>
    w = 's2'
  File "/<<PKGBUILDDIR>>/kitty/launcher/../../kitty_tests/ssh.py", line 261, in check_bootstrap
    pty.wait_till(check_untar_or_fail, timeout=60)
    SHELL_INTEGRATION_VALUE = ''
    check_untar_or_fail = <function SSHKitten.check_bootstrap.<locals>.check_untar_or_fail at 0x3f805dbb00>
    conf = 'copy simple-file\ncopy s1\ncopy --symlink-strategy=keep-path s2\ncopy --dest=a/sfa simple-file\ncopy --glob g.*\ncopy --exclude **/w.* --exclude **/r d1\n\nshell_integration disabled\ninterpreter python3'
    env = {'PATH': '/<<PKGBUILDDIR>>/kitty_tests/kitty/launcher:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games', 'HOME': '/tmp/tmp5zr3e6vj', 'TERM': 'xterm-kitty', 'TERMINFO': '/<<PKGBUILDDIR>>/terminfo', 'KITTY_SHELL_INTEGRATION': 'enabled', 'KITTY_INSTALLATION_DIR': '/<<PKGBUILDDIR>>', 'BASH_SILENCE_DEPRECATION_WARNING': '1', 'PYTHONDONTWRITEBYTECODE': '1', 'WEZTERM_SHELL_SKIP_ALL': '1', 'USER': 'buildd'}
    home = '/tmp/tmp0lkc_ah_'
    home_dir = '/tmp/tmp5zr3e6vj'
    launcher = 'sh'
    login_shell = ''
    pre_data = ''
    pty = <kitty_tests.PTY object at 0x3f820c0290>
    self = <kitty_tests.ssh.SSHKitten testMethod=test_ssh_copy>
    sh = 'python3'
    test_script = 'print("UNTAR_DONE", flush=True); os.execlp("sh", "sh", "-c", \'env; exit 0\')'
  File "/<<PKGBUILDDIR>>/kitty/launcher/../../kitty_tests/__init__.py", line 368, in wait_till
    raise TimeoutError(f'The condition was not met. Screen contents: \n {repr(self.screen_contents())}')
    end_time = 614877.703239922
    q = <function SSHKitten.check_bootstrap.<locals>.check_untar_or_fail at 0x3f805dbb00>
    self = <kitty_tests.PTY object at 0x3f820c0290>
    timeout = 60
TimeoutError: The condition was not met. Screen contents: 
 ''

----------------------------------------------------------------------
Ran 144 tests in 155.519s

Environment details

Observed on Debian GNU/Linux. Build log: https://buildd.debian.org/status/fetch.php?pkg=kitty&arch=riscv64&ver=0.34.1-1&stamp=1714553712&raw=0

Additional context
This seems to be a regression in 0.34.1. The same test was passing in the 0.33.1 release. Old log here

I am afraid I don't know anything about those archs and don't really care
about them, but patches are welcome. The test failure indicates that the
python process is hanging while running bootstrap.py.

I did the following:

  • Checked this on a riscv64 machine myself, and the test fails only sometimes, i.e. it is flaky. test_ssh_copy either takes a long time or gets stuck.
  • Asked a RISC-V porter, who told me it may be a timeout issue caused by the arch being slow.

After increasing the timeout of the check_untar_or_fail wait from 60 to 180, I did not get the failure in around 10 runs of the tests. It more or less looks like a timeout pitfall, but I am not 100% certain.
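For illustration only, here is a minimal sketch of how a polling wait with a scalable timeout could look. It is not kitty's actual `wait_till` implementation. The environment variable `TEST_TIMEOUT_MULTIPLIER` and the helper `scaled_timeout` are hypothetical names invented for this example; the idea is that a slow builder could raise the limit (for example 60 to 180) without editing the test itself.

```python
import os
import time
from typing import Callable


def scaled_timeout(base: float, env_var: str = "TEST_TIMEOUT_MULTIPLIER") -> float:
    """Scale a base timeout by a multiplier read from the environment.

    The variable name is hypothetical; kitty does not read it.
    Invalid values and values below 1 leave the base timeout unchanged.
    """
    try:
        factor = float(os.environ.get(env_var, "1"))
    except ValueError:
        return base
    return base * factor if factor >= 1.0 else base


def wait_till(condition: Callable[[], bool], timeout: float = 60.0, poll: float = 0.05) -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    end_time = time.monotonic() + timeout
    while time.monotonic() <= end_time:
        if condition():
            return
        time.sleep(poll)
    raise TimeoutError(f"The condition was not met within {timeout} seconds")
```

With `TEST_TIMEOUT_MULTIPLIER=3` set on the slow builder, `wait_till(check, timeout=scaled_timeout(60))` would wait up to 180 seconds. Note that a longer timeout only helps if the process is slow; it does nothing if the process is genuinely hung.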

On retrying the same build (no code changes) on our build machines 5 times, I could get a successful build, so the test seems flaky here (possibly due to the timeout).
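As a rough aid for judging flakiness, a single test method can be re-run in a loop and the failures counted. The sketch below uses only the standard `unittest` module; `count_failures` and `chance_of_clean_streak` are hypothetical helper names for this example and are not part of the kitty test suite.

```python
import unittest


def count_failures(test_case: type, method: str, runs: int) -> int:
    """Run one test method `runs` times and return how many runs did not pass."""
    failures = 0
    for _ in range(runs):
        suite = unittest.TestSuite([test_case(method)])
        result = unittest.TestResult()
        suite.run(result)
        if not result.wasSuccessful():
            failures += 1
    return failures


def chance_of_clean_streak(p_fail: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass, given a per-run failure rate."""
    return (1.0 - p_fail) ** runs
```

If the runs are independent and the test fails, say, 10% of the time, 5 clean runs in a row still happen with probability 0.9^5 ≈ 0.59, so a handful of passing builds is weak evidence that a change fixed anything.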

What do you think?

I am fine with increasing the timeout, though untarring a small file
should never take more than a few seconds.

No wait, that error came up again when I triggered it 5 more times :(