url_check.py cannot deal with unavailable sites
mglantz opened this issue · 5 comments
If a website is genuinely down, the url_check.py plugin bails out as shown below:
[mglantz@darkred event-driven-ansible]$ ansible-rulebook --rulebook website-automation.yml -i inventory --verbose
INFO:ansible_rulebook.app:Starting sources
INFO:ansible_rulebook.app:Starting rules
INFO:ansible_rulebook.engine:run_ruleset
INFO:ansible_rulebook.engine:ruleset define: {"name": "Listen for events on a webhook", "hosts": ["all"], "sources": [{"EventSource": {"name": "ansible.eda.url_check", "source_name": "ansible.eda.url_check", "source_args": {"urls": ["http://rhel9apache.sudo.net"], "delay": 10}, "source_filters": []}}], "rules": [{"Rule": {"name": "Web site is up", "condition": {"AllCondition": [{"EqualsExpression": {"lhs": {"Event": "url_check.status"}, "rhs": {"String": "up"}}}]}, "action": {"Action": {"action": "run_playbook", "action_args": {"name": "site_up.yml"}}}, "enabled": true}}, {"Rule": {"name": "Web site is down", "condition": {"AllCondition": [{"EqualsExpression": {"lhs": {"Event": "url_check.status"}, "rhs": {"String": "down"}}}]}, "action": {"Action": {"action": "run_playbook", "action_args": {"name": "site_down.yml"}}}, "enabled": true}}]}
INFO:ansible_rulebook.engine:load source
INFO:ansible_rulebook.engine:load source filters
INFO:ansible_rulebook.engine:Calling main in ansible.eda.url_check
INFO:ansible_rulebook.engine:Waiting for event from Listen for events on a webhook
ERROR:ansible_rulebook.engine:Source error
Traceback (most recent call last):
File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
File "/usr/lib64/python3.10/asyncio/base_events.py", line 1064, in create_connection
raise exceptions[0]
File "/usr/lib64/python3.10/asyncio/base_events.py", line 1049, in create_connection
sock = await self._connect_sock(
File "/usr/lib64/python3.10/asyncio/base_events.py", line 960, in _connect_sock
await self.sock_connect(sock, address)
File "/usr/lib64/python3.10/asyncio/selector_events.py", line 500, in sock_connect
return await fut
File "/usr/lib64/python3.10/asyncio/selector_events.py", line 535, in _sock_connect_cb
raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('192.168.120.130', 80)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/ansible_rulebook/engine.py", line 150, in start_source
await entrypoint(fqueue, args)
File "/home/mglantz/.ansible/collections/ansible_collections/ansible/eda/plugins/event_source/url_check.py", line 38, in main
async with session.get(url) as resp:
File "/usr/lib64/python3.10/site-packages/aiohttp/client.py", line 1138, in __aenter__
self._resp = await self._coro
File "/usr/lib64/python3.10/site-packages/aiohttp/client.py", line 535, in _request
conn = await self._connector.connect(
File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 542, in connect
proto = await self._create_connection(req, traces, timeout)
File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 907, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
raise last_exc
File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
File "/usr/lib64/python3.10/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host rhel9apache.sudo.net:80 ssl:default [Connect call failed ('192.168.120.130', 80)]
INFO:ansible_rulebook.engine:Canceling all ruleset tasks
INFO:ansible_rulebook.app:Cancelling event source tasks
INFO:ansible_rulebook.app:Main complete
I'm not sure whether this is intended, but it could be fixed by adding error handling around the aiohttp call.
Actually, the plugin cannot handle any ClientError-related issue at all, including SSL errors and the like.
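A minimal sketch of that error handling, assuming the plugin's entry point is `main(queue, args)` (as the traceback's `await entrypoint(fqueue, args)` suggests) and that `urls` and `delay` come from `args` as in the rulebook above. The helper name `check_once` and the `error_msg` event key are hypothetical; this is not necessarily what the merged fix looks like.

```python
import asyncio
from typing import Any, Dict

import aiohttp


async def check_once(session: aiohttp.ClientSession, url: str) -> Dict[str, Any]:
    """Check a single URL, turning client-side failures into a "down" event."""
    try:
        async with session.get(url) as resp:
            return dict(
                url_check=dict(
                    url=url,
                    status="up" if resp.status == 200 else "down",
                    status_code=resp.status,
                )
            )
    except aiohttp.ClientError as exc:
        # Connection refused, DNS failures, SSL errors, etc. all derive from
        # aiohttp.ClientError, so report them as "down" instead of raising.
        # "error_msg" is a hypothetical key chosen for this sketch.
        return dict(url_check=dict(url=url, status="down", error_msg=str(exc)))


async def main(queue: asyncio.Queue, args: Dict[str, Any]) -> None:
    urls = args.get("urls", [])
    delay = int(args.get("delay", 1))
    if not urls:
        return
    async with aiohttp.ClientSession() as session:
        while True:
            for url in urls:
                # One event per URL per cycle; errors no longer kill the source.
                await queue.put(await check_once(session, url))
            await asyncio.sleep(delay)
```

With this shape, an unreachable host produces a "down" event that the "Web site is down" rule can match, rather than terminating the event source.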
I agree. IMO this plugin could be improved by adding the following capabilities:
1. Treat networking issues as errors or events
2. Certificate verification
3. Custom retries and timeouts
4. Expose the response body in the event data
5. Follow redirects
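A sketch of how those capabilities could map onto aiohttp request options. The parameter names (`timeout`, `retries`, `retry_delay`, `verify_ssl`, `follow_redirects`, `include_body`) and the `error_msg`/`body` event keys are hypothetical and not existing url_check options. It catches `asyncio.TimeoutError` alongside `aiohttp.ClientError` because, depending on the aiohttp version, a request timeout may not surface as a ClientError.

```python
import asyncio
from typing import Any, Dict, Optional

import aiohttp


async def check_url(
    session: aiohttp.ClientSession,
    url: str,
    timeout: float = 10.0,         # hypothetical: total seconds per attempt
    retries: int = 2,              # hypothetical: extra attempts after a network error
    retry_delay: float = 1.0,      # hypothetical: pause between attempts
    verify_ssl: bool = True,       # hypothetical: certificate verification on/off
    follow_redirects: bool = True, # hypothetical: passed to allow_redirects
    include_body: bool = False,    # hypothetical: expose the response body
) -> Dict[str, Any]:
    kwargs: Dict[str, Any] = {
        "allow_redirects": follow_redirects,
        "timeout": aiohttp.ClientTimeout(total=timeout),
    }
    if not verify_ssl:
        kwargs["ssl"] = False  # skip certificate verification

    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            async with session.get(url, **kwargs) as resp:
                event: Dict[str, Any] = dict(
                    url=url,
                    status="up" if resp.status == 200 else "down",
                    status_code=resp.status,
                )
                if include_body:
                    event["body"] = await resp.text()
                return dict(url_check=event)
        except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
            # Only network-level failures are retried; an HTTP error status
            # is a real answer from the server and is reported immediately.
            last_error = exc
            if attempt < retries:
                await asyncio.sleep(retry_delay)

    return dict(url_check=dict(url=url, status="down", error_msg=str(last_error)))
```

An explicit total timeout matters for the "firewall dropping traffic" case: dropped packets make the connect hang rather than fail fast, so the timeout bounds how long it takes before a "down" event is emitted.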
AFAICS, this fix would handle items 1 and 2, as well as timeouts, although it doesn't distinguish between an SSL error, a timeout and a connection failure: everything is caught as ClientError. I note that in aio-libs/aiohttp#4064 there seem to be discussions about handling specific exceptions in aiohttp, so not dealing with specific error types for now may save some work if aiohttp is to be used going forward.
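If distinguishing the failure types becomes worthwhile, a sketch of mapping aiohttp exceptions to a coarse label could look like this. The function name and the labels "timeout" and "other" are hypothetical additions beyond the "http|ssl|tcp" idea discussed in this thread. The check order matters because `aiohttp.ClientSSLError` is a subclass of `aiohttp.ClientConnectorError`, and `ClientResponseError` is only raised when a status check such as `raise_for_status` is in use.

```python
import asyncio

import aiohttp


def classify_error(exc: BaseException) -> str:
    """Map an exception from an aiohttp request to a coarse error_type label."""
    # SSL/certificate problems first: ClientSSLError subclasses ClientConnectorError.
    if isinstance(exc, aiohttp.ClientSSLError):
        return "ssl"
    # Timeouts may surface as plain asyncio.TimeoutError or as an aiohttp
    # subclass of it, depending on the aiohttp version.
    if isinstance(exc, asyncio.TimeoutError):
        return "timeout"
    # Refused connections, unreachable hosts, DNS failures.
    if isinstance(exc, aiohttp.ClientConnectorError):
        return "tcp"
    # HTTP-level errors (e.g. with raise_for_status=True).
    if isinstance(exc, aiohttp.ClientResponseError):
        return "http"
    return "other"
```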
One thing that struck me is that the returned event structure doesn't fit connection-related issues very well:
dict(
    url_check=dict(
        url=url,
        status="down",
        status_code=404,
    )
)
Perhaps something like the following would work better:
dict(
    url_check=dict(
        url=url,
        status="down",
        error_type="http",  # one of "http", "ssl" or "tcp"
    )
)
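A small sketch of an event builder along those lines; `build_event` is a hypothetical helper, not part of the plugin. It includes `status_code` only when an HTTP response was actually received and `error_type` only when something failed, so a connection failure needs no placeholder status code.

```python
from typing import Any, Dict, Optional


def build_event(
    url: str,
    status: str,
    status_code: Optional[int] = None,
    error_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a url_check event, omitting fields that do not apply."""
    event: Dict[str, Any] = {"url": url, "status": status}
    if status_code is not None:
        # Only present when the server actually answered.
        event["status_code"] = status_code
    if error_type is not None:
        # e.g. "http", "ssl" or "tcp" as proposed above.
        event["error_type"] = error_type
    return {"url_check": event}
```

For example, `build_event(url, "down", error_type="tcp")` describes a refused connection, while `build_event(url, "down", status_code=404, error_type="http")` keeps the real status code for an HTTP error.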
In the suggested fix, I set status_code to 500 (internal server error), but IMHO that is forcing the shoe to fit.
I tested the initial patch both with a firewall dropping traffic and with the web server process shut down, and url_check no longer bails out. A "down" event can therefore trigger a playbook that fixes the firewall or restarts the web server process. I can imagine this also being used to mitigate DDoS attacks and the like.
Closing as fix has been merged.