ansible-community/ara

"list index out of range"

module0x90 opened this issue · 6 comments

What is the issue ?

Occasionally we're getting errors similar to this (typing it off a screenshot :-( )

PLAY [Record playbook run in ara] *****
Failed to patch on /api/v1/playbooks/3959:  {'labels': ['remote_user:root', 'check: False', 'tags:all', 'subset:.....']}
Failed to patch on /api/v1/playbooks/3959:  {'labels': ['remote_user:root', 'check: False', 'tags:all', 'subset:.....']}
[WARNING]: Failure using method (v2_playbook_on_play_start) in callback plugin
(<ansible.plugins.callback.ara_default.CallbackModule object at 0x7f2a85b19910>): Expecting value: line 2 column 1 (char 1)

TASK [Get the currently running playbook] *****
[WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin
(<ansible.plugins.callback.ara_default.CallbackModule object at 0x7f2a85b19910>): 'NoneType' object is not subscriptable
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: IndexError: list index out of range
fatal: [..... -> localhost]: FAILED! => {"msg": "Unexpected failure during module execution: list index out of range", "stdout": ""}
[WARNING]: Failure using method (v2_runner_on_failed) in callback plugin (<ansible.plugins.callback.ara_default.CallbackModule object at 0x7f2a85b19910>): '005056b1-5a9b-7534-e360-00000000102'

NO MORE HOSTS LEFT

It is not reproducible. When we run the same playbook again, it works fine.

Could it be that SQLite isn't behaving as it should? We have nearly 300,000 tasks recorded, and the DB file is 513 MB in size.

What should be happening ?

No error ;-)

The relevant playbook

- name: Record playbook run in ara
  hosts: all

  tasks:
    - name: Get the currently running playbook
      ansible.legacy.ara_playbook:
      register: query
      delegate_to: localhost
      run_once: true
      tags: always

Hey @module0x90 and thanks for the issue.

The problem doesn't immediately ring a bell, and I'm not sure whether you would also see it when running MySQL or PostgreSQL.

Could you run ansible-playbook with -vvv so we can see the full exception traceback? Otherwise Ansible swallows it.

Thanks.

I will try that and relay the request. The trouble is that it happens very infrequently, and usually when you re-run exactly the same playbook with the same parameters and hosts, it doesn't happen again.

In my experience, SQLite is not stable, and I can't recommend it for important production instances of ARA.
Errors like

Failed to patch on ...

appear regularly on my ARA instance that uses SQLite. The playbooks are then only partially reported in ARA (e.g. labels are missing).
My explanation is that the SQLite backend is sometimes too busy to serve all requests and ends up dropping some of them.
On my production instances, where I use MySQL, such errors do not appear.

There is an old open issue that may be related: #116

Thank you @hille721.
You just encouraged me even more to migrate to a proper database.

SQLite is good enough when running a single playbook at a time without enabling multi-threading in the callback.
My understanding is that multiple concurrent reads and writes can collide with each other because rows and tables can be locked.
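
To illustrate the kind of lock contention described above, here is a minimal sketch using only Python's standard sqlite3 module. It is not ARA code and does not reproduce the exact failure in this issue; the `tasks` table, the file name, and the 0.1 second timeout are made up for the demonstration. It only shows that a second writer gives up with an error while another connection holds the write lock.

```python
import os
import sqlite3
import tempfile


def demo_lock_contention() -> str:
    """Return the error message a second writer gets while the database is locked."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")  # hypothetical file name

        # isolation_level=None means autocommit mode, so transactions are
        # controlled explicitly with BEGIN/COMMIT below.
        # timeout=0.1 is how long a connection waits for a lock before failing.
        writer_a = sqlite3.connect(path, timeout=0.1, isolation_level=None)
        writer_b = sqlite3.connect(path, timeout=0.1, isolation_level=None)
        writer_a.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT)")

        # Writer A takes the write lock and keeps its transaction open.
        writer_a.execute("BEGIN IMMEDIATE")
        writer_a.execute("INSERT INTO tasks (name) VALUES ('from A')")

        try:
            # Writer B also wants the write lock but cannot get it in time.
            writer_b.execute("BEGIN IMMEDIATE")
            message = "no error"
        except sqlite3.OperationalError as exc:
            message = str(exc)
        finally:
            writer_a.execute("COMMIT")
            writer_a.close()
            writer_b.close()
        return message


if __name__ == "__main__":
    print(demo_lock_contention())
```

With a longer timeout the second writer would simply wait for the first one to commit, which is why this tends to show up only under load and is hard to reproduce on a re-run.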

https://ara.readthedocs.io/en/latest/troubleshooting.html#improving-playbook-recording-performance goes over some of this, but we could probably add a few words about concurrency and SQLite.