ansible-collections/cisco.nxos

Unable to respond to interactive prompt command

colinet opened this issue · 19 comments

SUMMARY

While running the playbook below, the task fails:

```
- name: switch_fc | helpers | session_reset - perform reset
  cisco.nxos.nxos_command:
    commands:
      - command: clear zone lock vsan 1014
        prompt: 'Do you want to continue'
        answer: 'y'
```

I get the following error:

```
TASK [ds-role-san_CRUD : switch_fc | helpers | session_reset - perform reset] **************************************************************************************************************
Friday 06 October 2023  14:29:08 +0200 (0:00:00.057)       0:00:03.903 ******** 
failed: [localhost] (item={'name': 'fabric_a', 'switch_fabric': 'xxxxxxxxxxxx', 'vsan_id': 1014}) => {"ansible_loop_var": "fab", "changed": false, "fab": {"name": "fabric_a", "switch_fabric": "xxxxxxxxxxxx", "vsan_id": 1014}, "module_stderr": "command timeout triggered, timeout value is 30 secs.\nSee the timeout setting options in the Network Debug and Troubleshooting Guide.", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}
```

##### ISSUE TYPE
I used the same example as shown in the documentation:
https://docs.ansible.com/ansible/latest/collections/cisco/nxos/nxos_command_module.html

##### COMPONENT NAME
nxos

##### ANSIBLE VERSION
```paste below
[xxxxxxxxxx@xxxxxxxxxxxxxxx~]$  ansible --version
ansible [core 2.15.0]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/xxxxxxxx/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /home/xxxxxxxxxxxx/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.16 (main, May 29 2023, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python)
  jinja version = 3.1.2
  libyaml = True
[xxxxxxxxxxxx@xxxxxxxxxxxxxxxxx~]$
```

##### COLLECTION VERSION

```paste below
[xxxxxxxx@xxxxxxxxxxxxxxx~]$  ansible-galaxy collection list |grep pure
purestorage.flasharray        1.21.0 
purestorage.flasharray        1.18.0 
```

##### CONFIGURATION
<!--- Paste verbatim output from "ansible-config dump --only-changed" between quotes -->
```paste below
[xxxxxx@xxxxxxxxxxxx~]$  ansible-config dump --only-changed
CACHE_PLUGIN(/etc/ansible/ansible.cfg) = memory
CALLBACKS_ENABLED(/etc/ansible/ansible.cfg) = ['ansible.posix.profile_tasks']
CONFIG_FILE() = /etc/ansible/ansible.cfg
DEFAULT_FORKS(/etc/ansible/ansible.cfg) = 5
DEFAULT_GATHERING(/etc/ansible/ansible.cfg) = implicit
DEFAULT_HOST_LIST(/etc/ansible/ansible.cfg) = ['/etc/ansible/hosts']
DEFAULT_MANAGED_STR(/etc/ansible/ansible.cfg) = # WARNING: This script is managed by Ansible with The Linux Framework. Any manual changes will be lost the next time Ansible runs.
DEFAULT_POLL_INTERVAL(/etc/ansible/ansible.cfg) = 15
DEFAULT_ROLES_PATH(/etc/ansible/ansible.cfg) = ['/home/xxxxxxxx/workspace/ansible/ds-roles']
DEFAULT_TRANSPORT(/etc/ansible/ansible.cfg) = smart
DEFAULT_VAULT_PASSWORD_FILE(/etc/ansible/ansible.cfg) = /home/svc_worker/.vps.txt
DISPLAY_SKIPPED_HOSTS(/etc/ansible/ansible.cfg) = True
HOST_KEY_CHECKING(/etc/ansible/ansible.cfg) = False
PERSISTENT_CONNECT_RETRY_TIMEOUT(/etc/ansible/ansible.cfg) = 30
PERSISTENT_CONNECT_TIMEOUT(/etc/ansible/ansible.cfg) = 60
RETRY_FILES_ENABLED(/etc/ansible/ansible.cfg) = False
[xxxxxxx@xxxxxxxxxxxxx~]$ 
```

##### OS / ENVIRONMENT
Redhat 9.0

##### EXPECTED RESULTS
This should clear zone lock.

@colinet Is the target device Cisco MDS?

Yes, it is for an MDS switch.

I've tried different syntaxes, but none of them work. I wonder whether the example shown in the documentation https://docs.ansible.com/ansible/latest/collections/cisco/nxos/nxos_command_module.html is valid.

@colinet As mentioned in the Notes section of docs, this module only has limited support for Cisco MDS switches and hence, might not fully work right out of the box, as it would for Nexus.

@srbharadwaj Would you be able to look into this?

@NilashishC Is the option 'prompt' a valid one? I don't see it in the documentation, and I also see it commented out in the code:

# { command: <str>, output: <str>, prompt: <str>, response: <str> }

@srbharadwaj The prompt option is valid. Since commands can take at least two forms, (a) a list of strings (commands to send) or (b) a list of dictionaries (command + prompt + answer combinations), its element type is set to raw in the argspec. The prompt handling logic is implemented in the cliconf plugin and in the network_cli connection plugin code.

https://github.com/ansible-collections/cisco.nxos/blob/main/plugins/cliconf/nxos.py#L240-L248
https://github.com/ansible-collections/ansible.netcommon/blob/main/plugins/connection/network_cli.py#L1059

@srbharadwaj The following task is tested to be working with Nexus 9300v (NX-OS 9.3.6):

    - name: Switch to maintenance mode
      cisco.nxos.nxos_command:
        commands:
          - configure terminal
          - command: system mode maintenance
            prompt: Do you want to continue
            answer: y

@colinet You can temporarily turn off cli confirmation prompts before you run the clear command as a workaround. Have you tried that?

- name: switch_fc | helpers | session_reset - perform reset
  cisco.nxos.nxos_command:
    commands:
      - terminal dont-ask
      - clear zone lock vsan 1014

The workaround works on the first fabric but, surprisingly, fails on the second fabric with an unexpected result.

The playbook is now:

- name: switch_fc | helpers | session_reset - perform reset
  cisco.nxos.nxos_command:
    commands:
      - terminal dont-ask
      - clear device-alias session
      - "clear zone lock vsan {{ fab.vsan_id }}"
  vars:
    ansible_connection: "{{ san_CRUD_switch_fabric_api }}"
    ansible_network_os: "{{ san_CRUD_switch_fabric_os }}"
    ansible_user: "{{ san_CRUD_switch_fabric_svc_user }}"
    ansible_password: "{{ san_CRUD_switch_fabric_svc_password }}"
    ansible_host: "{{ fab.switch_fabric }}"
    ansible_httpapi_port: "{{ san_CRUD_switch_fabric_port }}"
    ansible_httpapi_use_ssl: true
    ansible_httpapi_validate_certs: false
  loop: "{{ reset_data }}"
  loop_control:
    loop_var: fab

The outcome is:

TASK [ds-role-san_CRUD : switch_fc | helpers | session_reset - perform reset] *********************************************************************************************************************************************************************************
Monday 09 October 2023  16:58:10 +0200 (0:00:00.055)       0:00:02.221 ******** 
ok: [localhost] => (item={'name': 'fabric_a', 'switch_fabric': 'switch_001', 'vsan_id': 1014})
failed: [localhost] (item={'name': 'fabric_b', 'switch_fabric': 'switch_002', 'vsan_id': 2014}) => {"ansible_loop_var": "fab", "changed": false, "fab": {"name": "fabric_b", "switch_fabric": "switch_002", "vsan_id": 2014}, "module_stderr": "clear zone lock vsan 2014: CLI execution error: Command will clear lock from the entire fabric only if issued on initiating switch.\nElse lock will be cleared only locally.\nVSAN 2014 is not active\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}

So on the second fabric it reports that VSAN 2014 is not active, which is wrong.
If I run the command directly on the switch of the second fabric, it succeeds.

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

Check the accounting logs from when the failure occurred.

I'm getting the following accounting log just after running the playbook against the first switch of the second fabric:

Tue Oct 10 12:43:10 2023:type=stop:id=10.80.144.120@pts/0:user=dcnmuser:cmd=shell terminated because the ssh session closed
Tue Oct 10 12:43:10 2023:type=start:id=10.80.144.120@pts/7:user=dcnmuser:cmd=
Tue Oct 10 12:43:10 2023:type=update:id=10.80.144.120@pts/7:user=dcnmuser:cmd=terminal session-timeout 60 (SUCCESS)
Tue Oct 10 12:43:10 2023:type=update:id=10.80.144.120@pts/7:user=dcnmuser:cmd=terminal length 0 (SUCCESS)
Tue Oct 10 12:43:11 2023:type=stop:id=10.80.144.120@pts/7:user=dcnmuser:cmd=shell terminated because the ssh session closed
Tue Oct 10 12:43:11 2023:type=start:id=10.80.144.120@pts/0:user=dcnmuser:cmd=
Tue Oct 10 12:43:11 2023:type=update:id=10.80.144.120@pts/0:user=dcnmuser:cmd=terminal session-timeout 60 (SUCCESS)
Tue Oct 10 12:43:12 2023:type=update:id=10.80.144.120@pts/0:user=dcnmuser:cmd=terminal length 0 (SUCCESS)

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

I'm using the API connection type.
For the above playbook, I have:
san_CRUD_switch_fabric_api: ansible.netcommon.httpapi

I ran the playbook with -vvv.
The outcome is:

TASK [ds-role-san_CRUD : switch_fc | helpers | session_reset - perform reset] *********************************************************************************************************************************************************************************************
task path: /home/xxxxxxx/workspace/ansible/ds-roles/ds-role-san_CRUD/tasks/switch_fc/helpers/session_reset.yml:20
Tuesday 10 October 2023  14:57:05 +0200 (0:00:00.048)       0:00:02.201 ******* 
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
ok: [localhost] => (item={'name': 'fabric_a', 'switch_fabric': 'mhxcissan000sas', 'vsan_id': 1014}) => {
    "ansible_loop_var": "fab",
    "changed": false,
    "fab": {
        "name": "fabric_a",
        "switch_fabric": "mhxcissan000sas",
        "vsan_id": 1014
    },
    "invocation": {
        "module_args": {
            "commands": [
                "terminal dont-ask",
                "clear zone lock vsan 1014"
            ],
            "interval": 1,
            "match": "all",
            "retries": 9,
            "wait_for": null
        }
    },
    "stdout": [
        {},
        "Command will clear lock from the entire fabric only if issued on initiating switch.\nElse lock will be cleared only locally.\nNo pending info found"
    ],
    "stdout_lines": [
        {},
        [
            "Command will clear lock from the entire fabric only if issued on initiating switch.",
            "Else lock will be cleared only locally.",
            "No pending info found"
        ]
    ]
}
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
redirecting (type: action) cisco.nxos.nxos_command to cisco.nxos.nxos
failed: [localhost] (item={'name': 'fabric_b', 'switch_fabric': 'mhxcissan001sas', 'vsan_id': 2014}) => {
    "ansible_loop_var": "fab",
    "changed": false,
    "fab": {
        "name": "fabric_b",
        "switch_fabric": "mhxcissan001sas",
        "vsan_id": 2014
    },
    "module_stderr": "clear zone lock vsan 2014: CLI execution error: Command will clear lock from the entire fabric only if issued on initiating switch.\nElse lock will be cleared only locally.\nVSAN 2014 is not active\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"
}

PLAY RECAP ****************************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=22   changed=0    unreachable=0    failed=1    skipped=4    rescued=0    ignored=0  

The message 'VSAN 2014 is not active' on the second fabric is unrelated to the current action, and unexpected.

When I run the command "clear zone lock vsan 2014" manually on the switch mhxcissan001sas, I get:

mhxcissan001sas# clear zone lock vsan 2014
Command will clear lock from the entire fabric only if issued on initiating switch.
Else lock will be cleared only locally.
Do you want to continue? (y/n) [n] y
No pending info found
mhxcissan001sas#

'VSAN 2014 is not active' should not show up when executing the command via the API and Ansible.

@colinet Just for my understanding, which connection type are you using to run the original task with the prompt and answer?

I'm using the API connection type. For the above playbook, I have: san_CRUD_switch_fabric_api: ansible.netcommon.httpapi

I don't think prompts will ever work with NX-API due to the very nature of HTTP. Have you tried doing the same thing via the NX-API sandbox? Does it work there?
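
Since the prompt handling lives in the cliconf and network_cli plugins, one way to test the original prompt/answer task is to run it over SSH instead of NX-API. This is a minimal sketch, not a confirmed fix: it assumes the MDS switch accepts SSH logins for the same service account and that `cisco.nxos.nxos` is the appropriate `ansible_network_os` value. Given the limited MDS support mentioned earlier, it may still need adjusting.

```yaml
- name: switch_fc | helpers | session_reset - perform reset over SSH
  cisco.nxos.nxos_command:
    commands:
      - command: "clear zone lock vsan {{ fab.vsan_id }}"
        prompt: 'Do you want to continue'
        answer: 'y'
  vars:
    # Assumption: override the connection for this task only, so the
    # confirmation prompt is handled by the network_cli connection plugin
    # rather than sent through NX-API.
    ansible_connection: ansible.netcommon.network_cli
    ansible_network_os: cisco.nxos.nxos
    ansible_user: "{{ san_CRUD_switch_fabric_svc_user }}"
    ansible_password: "{{ san_CRUD_switch_fabric_svc_password }}"
    ansible_host: "{{ fab.switch_fabric }}"
  loop: "{{ reset_data }}"
  loop_control:
    loop_var: fab
```

If the task still times out over SSH, the prompt string may not match what the MDS prints; the manual session in this thread shows `Do you want to continue? (y/n) [n]`.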

I'm fine with using '- terminal dont-ask' as the first command (and forgetting about prompts through NX-API).
This works on Fabric A.
But the command fails on Fabric B with "VSAN 2014 is not active" even though this VSAN is active.

On fabric B, where the error about VSAN 2014 not being active occurs, the state is:

mhxcissan001sas# show vsan 2014
vsan 2014 information
         name:VSAN2014  state:active
         interoperability mode:default
         loadbalancing:src-id/dst-id/oxid
         operational state:up

mhxcissan001sas#

@colinet Does the accounting log on mhxcissan001sas show a failure after running the playbook? (show accounting log | i clear)
Also, what are the mhxcissan001sas switch version and model?
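
The requested details could also be collected through the same connection the failing task uses, which has the side benefit of showing what the switch reports about the VSAN over that path. A sketch under assumptions: it reuses the connection variables from the playbook above, `diag` is a placeholder variable name, and the `| i clear` filter may not be accepted over NX-API (if it is rejected, drop the filter and search the output locally).

```yaml
- name: switch_fc | helpers | session_reset - collect diagnostics
  cisco.nxos.nxos_command:
    commands:
      - show version
      - "show vsan {{ fab.vsan_id }}"
      - "show accounting log | i clear"
  register: diag
  vars:
    # Same connection settings as the failing task, so the output reflects
    # what the switch returns over that connection type.
    ansible_connection: "{{ san_CRUD_switch_fabric_api }}"
    ansible_network_os: "{{ san_CRUD_switch_fabric_os }}"
    ansible_user: "{{ san_CRUD_switch_fabric_svc_user }}"
    ansible_password: "{{ san_CRUD_switch_fabric_svc_password }}"
    ansible_host: "{{ fab.switch_fabric }}"
    ansible_httpapi_port: "{{ san_CRUD_switch_fabric_port }}"
    ansible_httpapi_use_ssl: true
    ansible_httpapi_validate_certs: false
  loop: "{{ reset_data }}"
  loop_control:
    loop_var: fab

- name: switch_fc | helpers | session_reset - show diagnostics
  ansible.builtin.debug:
    # With a loop, the registered variable holds one entry per fabric.
    var: diag.results
```

Comparing the `show vsan` output for fabric_b here with the manual CLI output above would indicate whether the API session sees the VSAN differently from an interactive login.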