ansible/ansible-rulebook

Tests are failing on power (ppc64le arch rhel8/9):Assertion and timeout error

ashwinik30 opened this issue · 12 comments

  • ansible-rulebook version:
    version = '0.10.1'
    Executable location = /usr/local/bin/ansible-rulebook
    Drools_jpy version = 0.1.9
    Java home = /usr/lib/jvm/java-17-openjdk-17.0.4.1.1-6.el8.ppc64le
    Java version = 17.0.4.1
    Python version = 3.9.13 (main, Jun 15 2022, 11:39:48) [GCC 8.5.0 20210514 (Red Hat 8.5.0-13)]

  • Python version: 3.9.13 (main, Jun 15 2022, 11:39:48) [GCC 8.5.0 20210514 (Red Hat 8.5.0-13)]

  • Operating System:
    NAME="Red Hat Enterprise Linux"
    VERSION="8.7 (Ootpa)"

Also tested on
NAME="Red Hat Enterprise Linux"
VERSION="9.1.0 (Ootpa)"

*Arch :
ppc64le

Description

Building upstream repo on power machine which needs to be pass successfully.
Tried testing on different Os such as rhel8/rhel9 on power and faced same testcase failing result

Some of testcases are failing due to assertion error It should pass successfully with all 139 tests are passed.
power result : 19 failed, 120 passed, 2 xfailed in 506.72s (0:08:26)

(for s390x all tests are passing)
s390x result :
139 passed, 2 xfailed in 178.76s (0:02:58)

What I Did

Testing upstream repo on power machine (ppc64le)
clones repository is :
https://github.com/ansible/ansible-rulebook.git

Installed depedencies which are required from installation page
https://github.com/ansible/ansible-rulebook/blob/v0.10.1/docs/installation.rst


Summury of commands:

1. yum install java-17-openjdk-devel openssl-devel git wget tar python39-devel.ppc64le gcc rust cargo
2. wget https://dlcdn.apache.org/maven/maven-3/3.8.7/binaries/apache-maven-3.8.7-bin.tar.gz
3. tar -xzvf apache-maven-3.8.7-bin.tar.gz -C /opt/
4. ln -s /opt/apache-maven-3.8.7/bin/mvn /usr/local/bin/mvn
5. rm -f apache-maven-3.8.7-bin.tar.gz
6. git clone https://github.com/ansible/ansible-rulebook.git
7. cd ansible-rulebook
8. export JDK_HOME=/usr/lib/jvm/java-17-openjdk
9. export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
10. pip3 install -e .
11. pip3 install -r requirements_test.txt
12. ansible-galaxy collection install git+https://github.com/ansible/event-driven-ansible
13. pip3 install pyparsing jsonschema websockets drools-jpy 
14. pytest -v -n auto

output:

INFO     tests.e2e.test_operators:test_operators.py:387 Running command: ansible-rulebook -i /root/ansible-rulebook/tests/e2e/files/inventories/default_inventory.yml -S /root/ansible-rulebook/tests/e2e/../sources --rulebook /root/ansible-rulebook/tests/e2e/files/rulebooks/operators/test_not_operator.yml

=========================== short test summary info ============================
FAILED tests/e2e/test_actions.py::test_actions_sanity - subprocess.TimeoutExp...
FAILED tests/e2e/test_operators.py::test_less_operators[lt] - AssertionError:...
FAILED tests/e2e/test_operators.py::test_less_operators[le] - AssertionError:...
FAILED tests/e2e/test_operators.py::test_greater_operators[ge_operator] - Ass...
FAILED tests/e2e/test_operators.py::test_greater_operators[gt_operator] - Ass...
FAILED tests/e2e/test_operators.py::test_ne_operator[str] - AssertionError: a...
FAILED tests/e2e/test_operators.py::test_ne_operator[int] - AssertionError: a...
FAILED tests/e2e/test_operators.py::test_defined_operator[str] - AssertionErr...
FAILED tests/e2e/test_operators.py::test_defined_operator[int] - AssertionErr...
FAILED tests/e2e/test_operators.py::test_defined_operator[nested] - Assertion...
FAILED tests/e2e/test_operators.py::test_contains_operator[contains_str_single]
FAILED tests/e2e/test_operators.py::test_contains_operator[contains_int_single]
FAILED tests/e2e/test_operators.py::test_contains_operator[not_contains_str_combined]
FAILED tests/e2e/test_operators.py::test_in_operator[int_combined_multiple_extravars]
FAILED tests/e2e/test_operators.py::test_in_operator[not_in_int_extravars] - ...
FAILED tests/e2e/test_operators.py::test_in_operator[not_in_str_extravars] - ...
FAILED tests/e2e/test_operators.py::test_in_operator[str_single] - AssertionE...
FAILED tests/e2e/test_operators.py::test_in_operator[not_in_str_single] - Ass...
FAILED tests/e2e/test_operators.py::test_not_operator - AssertionError: asser...
============ 19 failed, 120 passed, 2 xfailed in 506.72s (0:08:26) =============



@ashwinik30 Do you have any more details on the test failures?
Can you run this command outside in a shell and paste the stdout

ansible-rulebook -i /root/ansible-rulebook/tests/e2e/files/inventories/default_inventory.yml -S /root/ansible-rulebook/tests/e2e/../sources --rulebook /root/ansible-rulebook/tests/e2e/files/rulebooks/operators/test_not_operator.yml --debug

@mkanoor I can share log file with you

[root@ibm-p9z-27-lp26 ~]# ./test.sh
2023-02-06 11:56:50,248 - asyncio - DEBUG - Using selector: EpollSelector
2023-02-06 11:56:50,249 - ansible_rulebook.app - DEBUG - Loading rules from the file system /root/ansible-rulebook/tests/e2e/files/rulebooks/operators/test_not_operator.yml
2023-02-06 11:56:50,539 - ansible_rulebook.app - INFO - Starting sources
2023-02-06 11:56:50,539 - ansible_rulebook.app - INFO - Starting rules
2023-02-06 11:56:50,539 - ansible_rulebook.engine - INFO - run_ruleset
2023-02-06 11:56:50,629 - drools.ruleset - DEBUG - Creating Drools Ruleset
2023-02-06 11:56:51,921 - drools.ruleset - DEBUG - Ruleset Session ID : 1
2023-02-06 11:56:51,921 - ansible_rulebook.engine - INFO - ruleset define: {"name": "Test not (negation) operator", "hosts": ["all"], "sources": [{"EventSource": {"name": "generic", "source_name": "generic", "source_args": {"event_delay": 1, "payload": [{"test_bool": false, "test_str": "Hello there"}, {"test_int": 79}, {"test_float": 4.212}]}, "source_filters": []}}], "rules": [{"Rule": {"name": "Test negation round 1", "condition": {"AllCondition": [{"AndExpression": {"lhs": {"NegateExpression": {"Event": "test_bool"}}, "rhs": {"NegateExpression": {"OrExpression": {"lhs": {"EqualsExpression": {"lhs": {"Event": "test_str"}, "rhs": {"String": "Goodbye"}}}, "rhs": {"NotEqualsExpression": {"lhs": {"Event": "test_str"}, "rhs": {"String": "Hello there"}}}}}}}}]}, "actions": [{"Action": {"action": "run_playbook", "action_args": {"name": "./playbooks/print_event.yml", "json_mode": true}}}], "enabled": true}}, {"Rule": {"name": "Test negation round 2", "condition": {"AllCondition": [{"NegateExpression": {"AndExpression": {"lhs": {"GreaterThanExpression": {"lhs": {"Event": "test_int"}, "rhs": {"Integer": 80}}}, "rhs": {"LessThanExpression": {"lhs": {"Event": "test_int"}, "rhs": {"Integer": 79}}}}}}, {"NegateExpression": {"OrExpression": {"lhs": {"LessThanOrEqualToExpression": {"lhs": {"Event": "test_float"}, "rhs": {"Float": 3.14}}}, "rhs": {"GreaterThanOrEqualToExpression": {"lhs": {"Event": "test_float"}, "rhs": {"Float": 4.5}}}}}}]}, "actions": [{"Action": {"action": "run_playbook", "action_args": {"name": "./playbooks/print_event.yml", "json_mode": true}}}], "enabled": true}}]}
2023-02-06 11:56:51,921 - drools.dispatch - DEBUG - Establishing async channel
2023-02-06 11:56:51,948 - ansible_rulebook.engine - INFO - load source
2023-02-06 11:56:51,950 - ansible_rulebook.engine - INFO - load source filters
2023-02-06 11:56:51,951 - ansible_rulebook.engine - INFO - Calling main in generic
2023-02-06 11:56:51,952 - ansible_rulebook.engine - INFO - Start a playbook action runner for Test not (negation) operator
2023-02-06 11:56:51,952 - ansible_rulebook.engine - INFO - Waiting for actions on events from Test not (negation) operator
2023-02-06 11:56:51,952 - ansible_rulebook.engine - INFO - Waiting for events from Test not (negation) operator
2023-02-06 11:56:51,957 - ansible_rulebook.engine - DEBUG - Received event : {'test_bool': False, 'test_str': 'Hello there'}
2023-02-06 11:56:51 952 [drools-async-evaluator-thread] INFO org.drools.ansible.rulebook.integration.api.io.RuleExecutorChannel - Async channel connected
2023-02-06 11:56:51 970 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was INACTIVE is now FIRING_ALL_RULES
2023-02-06 11:56:52 019 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.RegisterOnlyAgendaFilter - Activation of effective rule "Test negation round 1" with facts: [Event DROOLS_PROTOTYPE with values = {test_bool=false, test_str=Hello there}]
2023-02-06 11:56:52 020 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was FIRING_ALL_RULES is now HALTING
2023-02-06 11:56:52 020 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was HALTING is now INACTIVE
2023-02-06 11:56:52,037 - drools.ruleset - DEBUG - Calling rule : Test negation round 1 in session: 1
2023-02-06 11:56:52,038 - ansible_rulebook.rule_generator - INFO - calling Test negation round 1
2023-02-06 11:56:52,038 - ansible_rulebook.engine - DEBUG - None
2023-02-06 11:56:52,038 - ansible_rulebook.engine - INFO - call_action run_playbook
2023-02-06 11:56:52,038 - ansible_rulebook.engine - INFO - substitute_variables [{'name': './playbooks/print_event.yml', 'json_mode': True}] [{'ansible_eda': {'event': {'test_bool': False, 'test_str': 'Hello there'}, 'fact': {'test_bool': False, 'test_str': 'Hello there'}}}]
2023-02-06 11:56:52,038 - ansible_rulebook.engine - INFO - action args: {'name': './playbooks/print_event.yml', 'json_mode': True}
2023-02-06 11:56:52,039 - ansible_rulebook.engine - INFO - Queue playbook/module/job template {'./playbooks/print_event.yml'} for running later
2023-02-06 11:56:52,039 - ansible_rulebook.builtin - INFO - running Ansible playbook: ./playbooks/print_event.yml
2023-02-06 11:56:52,039 - ansible_rulebook.builtin - DEBUG - private data dir /tmp/run_playbook8o3uyigz
2023-02-06 11:56:52,039 - ansible_rulebook.builtin - DEBUG - variables {'ansible_eda': {'event': {'test_bool': False, 'test_str': 'Hello there'}, 'fact': {'test_bool': False, 'test_str': 'Hello there'}}}
2023-02-06 11:56:52,041 - ansible_rulebook.builtin - DEBUG - project_data_file: None
2023-02-06 11:56:58,999 - ansible_rulebook.builtin - ERROR - Could not find a playbook for ./playbooks/print_event.yml from /root
2023-02-06 11:56:58,999 - ansible_rulebook.builtin - INFO - ruleset: Test not (negation) operator, rule: Test negation round 1
2023-02-06 11:56:58,999 - ansible_rulebook.builtin - INFO - Calling Ansible runner
2023-02-06 11:56:59,000 - ansible_rulebook.engine - DEBUG - Received event : {'test_int': 79}
2023-02-06 11:56:59 001 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was INACTIVE is now FIRING_ALL_RULES
2023-02-06 11:56:59 028 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was FIRING_ALL_RULES is now HALTING
2023-02-06 11:56:59 028 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was HALTING is now INACTIVE
2023-02-06 11:56:59,030 - ansible_rulebook.engine - DEBUG - None
2023-02-06 11:57:00,031 - ansible_rulebook.engine - DEBUG - Received event : {'test_float': 4.212}
2023-02-06 11:57:00 037 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was INACTIVE is now FIRING_ALL_RULES
2023-02-06 11:57:00 039 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.RegisterOnlyAgendaFilter - Activation of effective rule "Test negation round 2" with facts: [Event DROOLS_PROTOTYPE with values = {test_int=79}, Event DROOLS_PROTOTYPE with values = {test_float=4.212}]
2023-02-06 11:57:00 039 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was FIRING_ALL_RULES is now HALTING
2023-02-06 11:57:00 040 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was HALTING is now INACTIVE
2023-02-06 11:57:00,070 - drools.ruleset - DEBUG - Calling rule : Test negation round 2 in session: 1
2023-02-06 11:57:00,070 - ansible_rulebook.rule_generator - INFO - calling Test negation round 2
2023-02-06 11:57:00,070 - ansible_rulebook.engine - DEBUG - None
2023-02-06 11:57:00,071 - ansible_rulebook.engine - INFO - call_action run_playbook
2023-02-06 11:57:00,071 - ansible_rulebook.engine - INFO - substitute_variables [{'name': './playbooks/print_event.yml', 'json_mode': True}] [{'ansible_eda': {'events': {'m_1': {'test_float': 4.212}, 'm_0': {'test_int': 79}}, 'facts': {'m_1': {'test_float': 4.212}, 'm_0': {'test_int': 79}}}}]
2023-02-06 11:57:00,077 - ansible_rulebook.engine - INFO - action args: {'name': './playbooks/print_event.yml', 'json_mode': True}
2023-02-06 11:57:00,077 - ansible_rulebook.engine - INFO - Queue playbook/module/job template {'./playbooks/print_event.yml'} for running later
2023-02-06 11:57:01,077 - ansible_rulebook.engine - DEBUG - Received event : Shutdown(message='Source generic initiated shutdown at 2023-02-06 11:57:01.077428', delay=0.0, source_plugin='generic')
2023-02-06 11:57:01,077 - ansible_rulebook.engine - INFO - Shutdown message received: Shutdown(message='Source generic initiated shutdown at 2023-02-06 11:57:01.077428', delay=0.0, source_plugin='generic')
2023-02-06 11:57:01,087 - ansible_rulebook.engine - INFO - Attempt to stop ruleset Test not (negation) operator
2023-02-06 11:57:01,088 - ansible_rulebook.engine - INFO - Stopped waiting for actions on events from Test not (negation) operator
ERROR! the playbook: ./playbooks/print_event.yml could not be found
2023-02-06 11:57:06,173 - ansible_rulebook.engine - INFO - Playbook action runner Test not (negation) operator exits
2023-02-06 11:57:06 174 [main] DEBUG org.drools.kiesession.agenda.DefaultAgenda - State was INACTIVE is now DISPOSED
2023-02-06 11:57:06,175 - ansible_rulebook.engine - INFO - Stopped waiting on events from Test not (negation) operator
2023-02-06 11:57:06,175 - ansible_rulebook.engine - INFO - Canceling all ruleset tasks
2023-02-06 11:57:06,175 - ansible_rulebook.app - INFO - Cancelling event source tasks
2023-02-06 11:57:06,175 - drools.dispatch - DEBUG - Shutting down async channel
2023-02-06 11:57:06,178 - ansible_rulebook.app - INFO - Main complete
[root@ibm-p9z-27-lp26 ~]#

@mkanoor

Hello @ashwinik30 these tests are sensible to the performance. Is it possible that the machine is being emulated and not a real powerpc hardware?

Could you try to run pytest without the -n auto flag and increasing the timeout here?

Hi @Alex-Izquierdo ,
Thanks for input
I have tried out increasing timeout to 60 here
Also with only pytest command
But getting same failure as earlier

hey @Alex-Izquierdo on your below query

Is it possible that the machine is being emulated and not a real powerpc hardware?

So above failures are seen on native ppc64le node (RHEL8,RHEL9) acquired from RH's internal provisioning tooling - beaker . Also since you mentioned these are performance oriented tests - can you please confirm on the disk , ram, cpu's config which you think are required for these tests to complete its performance evaluation ? we could spin up large config based ppc64le virtual machine in beaker to retry these failing ones..

Hi @ghatwala @ashwinik30
Ansible-rulebook has not high demand for memory, with 1G memory should be more than enough. The main workload is for CPU.
I'm pretty sure it's only a performance issue. I was able to reproduce it in an emulated VM. An emulated ppc64 has poor performance because QEMU requires a lot of cpu to emulate the ISA. It's not only related with the timeout of the process. The main process ends as soon as one of the plugins ends. When the performance is low, the event producer terminates before the main process has enough time to process all events and that's why some tests fails. I don't know about that tool, beaker but if it is a real PPC64 hardware, the cpu seems have poor performance as an emulated one. Our e2e tests are environment dependent and for sure must be more flexible to deal with this cases. I'm looking how spawn my own ppc instance to be sure that all tests pass.

If you try to increase the value of this line: https://github.com/ansible/ansible-rulebook/blob/main/tests/e2e/files/rulebooks/actions/test_actions_sanity.yml#L6
in addition to increase the timeout mentioned earlier, and then run pytest -m e2e -k actions the test passes.

FYI
With the changes https://github.com/ansible/ansible-rulebook/blob/main/tests/e2e/files/rulebooks/actions/test_actions_sanity.yml#L6 from 1 =>8
and [here](https://github.com/ansible/ansible-rulebook/blob/main/tests/e2e/test_operators.py#L12) from 15 =>60
The testcases are failing with same error
We are investigating further..

cc @ghatwala

FYI
We have created a new ppc64le beaker vm with higher configurations we are getting same failures as previous

--------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
INFO     tests.e2e.test_operators:test_operators.py:214 Running command: ansible-rulebook -i /root/ansible-rulebook/tests/e2e/files/inventories/default_inventory.yml -S /root/ansible-rulebook/tests/e2e/../sources --rulebook /root/ansible-rulebook/tests/e2e/files/rulebooks/operators/test_is_defined_operator_nested.yml
================================================================== short test summary info ==================================================================
FAILED tests/e2e/test_operators.py::test_contains_operator[contains_str_single] - AssertionError: assert 1 == 3
FAILED tests/e2e/test_operators.py::test_in_operator[not_in_int_extravars] - AssertionError: assert 1 == 2
FAILED tests/e2e/test_operators.py::test_in_operator[not_in_str_extravars] - AssertionError: assert 1 == 3
FAILED tests/e2e/test_operators.py::test_defined_operator[int] - AssertionError: assert 1 == 5
FAILED tests/e2e/test_operators.py::test_ne_operator[int] - AssertionError: assert 2 == 7
FAILED tests/e2e/test_operators.py::test_greater_operators[gt_operator] - AssertionError: assert 1 == 3
FAILED tests/e2e/test_operators.py::test_contains_operator[not_contains_str_combined] - AssertionError: assert 1 == 2
FAILED tests/e2e/test_operators.py::test_less_operators[lt] - AssertionError: assert 2 == 4
FAILED tests/e2e/test_operators.py::test_less_operators[le] - AssertionError: assert 2 == 4
FAILED tests/e2e/test_operators.py::test_in_operator[int_combined_multiple_extravars] - AssertionError: assert 1 == 2
FAILED tests/e2e/test_operators.py::test_contains_operator[contains_int_single] - AssertionError: assert 1 == 3
FAILED tests/e2e/test_operators.py::test_not_operator - AssertionError: assert 1 == 2
FAILED tests/e2e/test_operators.py::test_greater_operators[ge_operator] - AssertionError: assert 1 == 3
FAILED tests/e2e/test_operators.py::test_in_operator[str_single] - AssertionError: assert 1 == 2
FAILED tests/e2e/test_operators.py::test_in_operator[not_in_str_single] - AssertionError: assert 2 == 3
FAILED tests/e2e/test_actions.py::test_actions_sanity - subprocess.TimeoutExpired: Command 'ansible-rulebook -i /root/ansible-rulebook/tests/e2e/files/inventories/default_inventory.yml -S /root/ansible-rulebo...
FAILED tests/e2e/test_operators.py::test_defined_operator[str] - AssertionError: assert 2 == 5
FAILED tests/e2e/test_operators.py::test_ne_operator[str] - AssertionError: assert 2 == 7
FAILED tests/e2e/test_operators.py::test_defined_operator[nested] - AssertionError: assert 2 == 5
=================================================== 19 failed, 135 passed, 2 xfailed in 95.82s (0:01:35) ====================================================
[root@ibm-p9b-23 ansible-rulebook]# arch
ppc64le
[root@ibm-p9b-23 ansible-rulebook]#

with the changes suggested by @mkanoor @Alex-Izquierdo Tests for passed on power
tests/e2e/config/default.yml

run_playbook_shutdown_after: 15 
cmd_timeout: 60
    ```