ansible/pylibssh

pylibssh unable to handle a long stream of output from command execution

NilashishC opened this issue · 0 comments

SUMMARY
  • Related to ansible-collections/cisco.nxos#736

  • We are hitting a problem with pylibssh: command execution consistently halts once a certain percentage of the output has been received.

  • The following command is sent to a Cisco Nexus switch (using pylibssh transport) to initiate copying a file (~2GB) from a remote server to itself.
    copy scp://test@192.168.1.100/ansible/nxos64.10.1.1.bin bootflash:/nxos64.10.1.1.bin vrf management use-kstack

  • On every run, the data stream stops once the transfer is apparently ~28% complete. Ansible then waits until persistent_command_timeout is reached and fails the task. Because of how the network_cli connection works, subsequent tasks also fail (unless the connection is explicitly reset): it cannot identify a command prompt in the last received response window, which is stuck at the "28%" output. Note that the actual file pull continues on the device and completes successfully.

  • We have tried bumping command_timeout to a much bigger value than is required for the file pull to complete. The result is still the same.
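
For reference, the timeout bump and the explicit connection reset mentioned above can be sketched roughly as follows. This is an illustration, not the exact playbook we ran: the 3600-second value and the `scp_password` variable are assumptions made for the example, and the command is the one quoted in the summary.

```yaml
- hosts: nxos
  gather_facts: no
  vars:
    # Assumed value for illustration: well above the expected transfer time.
    ansible_command_timeout: 3600
  tasks:
    - name: Pull the image onto the device (command from the summary)
      ansible.netcommon.cli_command:
        command: "copy scp://test@192.168.1.100/ansible/nxos64.10.1.1.bin bootflash:/nxos64.10.1.1.bin vrf management use-kstack"
        prompt: "password"
        # scp_password is a hypothetical variable holding the SCP server password.
        answer: "{{ scp_password | string }}"
      ignore_errors: true

    # Drop the persistent connection so later tasks do not inherit
    # the response window stuck at the progress output.
    - name: Reset the persistent connection
      ansible.builtin.meta: reset_connection
```

Even with the larger timeout the first task still fails in our case. The reset only keeps the following tasks from failing along with it.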

A small snippet of the output that this command generates and that is sent over the wire:

nxos64.10.1.1.bin                           ...               20%  309MB   1.1MB/s   18:31 ETA '
  
nxos64.10.1.1.bin                           ...            22%  340MB   2.7MB/s   07:11 ETA \r' 

nxos64.10.1.1.bin                           ...          25%  379MB   3.8MB/s   04:57 ETA \r'

nxos64.10.1.1.bin                           ...        28%  438MB   5.2MB/s   03:21 ETA \r' 
  • This does not happen if we switch to paramiko.
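
Switching the transport for comparison is a one-variable change. A minimal sketch, assuming the hosts are in the `nxos` group and the variables live in a group vars file (the file path is an assumption):

```yaml
# group_vars/nxos.yml (path assumed for illustration)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.nxos.nxos
# Use paramiko instead of libssh as the network_cli SSH transport.
ansible_network_cli_ssh_type: paramiko
```

Setting `ansible_network_cli_ssh_type` back to `libssh` should bring the problem back with the same playbook.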
ISSUE TYPE
  • Bug Report
PYLIBSSH and LIBSSH VERSION
Name: ansible-pylibssh
Version: 1.1.0
Summary: Python bindings for libssh client specific to Ansible use case
Home-page: https://github.com/ansible/pylibssh
Author: Ansible, Inc.
Author-email: info+github/ansible/pylibssh@ansible.com
License: LGPLv2+
Location: /home/nchakrab/.virtualenvs/core/lib/python3.10/site-packages
Requires: 
Required-by: 
bash-4.4# rpm -qa| grep libssh
libssh-0.9.6-3.el8.x86_64
python39-ansible-pylibssh-1.0.0-1.el8ap.x86_64
libssh-config-0.9.6-3.el8.noarch
OS / ENVIRONMENT
bash-4.4# cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
STEPS TO REPRODUCE
---
- hosts: nxos
  gather_facts: no
  tasks:
    - name: initiate file copy from device (will take 10 minutes)
      cisco.nxos.nxos_file_copy:
        file_pull: true
        local_file: nxos64-cs.10.2.5.M.bin
        local_file_directory: /
        remote_file: /tmp/nxos64-cs.10.2.5.M.bin
        remote_scp_server: 192.168.1.10
        remote_scp_server_user: admin
        remote_scp_server_password: admin
        vrf: management
      ignore_errors: true
    - name: "SCP Copying file {{ file_to_copy }} to device"
      ansible.netcommon.cli_command:
        check_all: true
        command: "copy scp://{{ https_scp_servers[copy_server]['user'] }}@{{ https_scp_servers[copy_server]['ip'] }}{{ https_scp_servers[copy_server]['path'] }}/{{ file_to_copy }} bootflash:/{{ file_to_copy }} vrf {{ copy_vrf }}"
        prompt: "password"
        answer: "{{ https_scp_servers[copy_server]['pass'] | string }}"
#      register: scp_output
EXPECTED RESULTS
  • Task ends successfully once the file pull operation is complete.
ACTUAL RESULTS
  • Execution halts at a certain stage, causing Ansible to wait until the command timeout is reached and then fail the task and the subsequent ones.