Improve/surface errors when attempting to read S3 logs
ashb opened this issue · 5 comments
Description
As mentioned in #8212, if you have configured S3 logs but something is wrong, the error is never surfaced to the UI (nor to the webserver logs), making this very hard to debug.
All you see in the UI is this:
*** Log file does not exist: /usr/local/airflow/logs/MY_DAG_NAME/MY_TASK_NAME/2020-04-07T20:59:19.312402+00:00/6.log
*** Fetching from: http://MY_DAG_NAME-0dde5ff5a786437cb14234:8793/log/MY_DAG_NAME/MY_TASK_NAME/2020-04-07T20:59:19.312402+00:00/6.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='MY_DAG_NAME-0dde5ff5a786437cb14234', port=8793): Max retries exceeded with url: /log/MY_DAG_NAME/MY_TASK_NAME/2020-04-07T20:59:19.312402+00:00/6.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f708332fc90>: Failed to establish a new connection: [Errno -2] Name or service not known'))
In one such case I was debugging, I found this error when attempting to communicate with S3:
>>> from airflow.configuration import conf
[2020-06-03 08:26:00,253] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=11106
>>> from airflow.hooks.S3_hook import S3Hook
>>> h = S3Hook(aws_conn_id=conf.get('core', 'remote_log_conn_id'))
>>> c = h.get_conn()
>>> c.list_buckets()
[2020-06-03 08:27:24,662] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.amazonaws.com
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/site-packages/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python3.7/site-packages/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.
We should add a *** line showing that we attempted to fetch the logs from S3, and at least the error it failed with. Right now the S3TaskHandler is totally silent in case of error. This is bad.
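A minimal sketch of the kind of fix, assuming the 1.10.x handler layout in airflow/utils/log/s3_task_handler.py (the method bodies here are illustrative, not the actual code):

import os

from airflow.utils.log.file_task_handler import FileTaskHandler


class S3TaskHandler(FileTaskHandler):
    # ... __init__ sets self.remote_base and the S3 hook as in 1.10.x ...

    def _read(self, ti, try_number, metadata=None):
        log_relative_path = self._render_filename(ti, try_number)
        remote_loc = os.path.join(self.remote_base, log_relative_path)
        try:
            remote_log = self.s3_read(remote_loc, return_error=True)
            log = '*** Reading remote log from {}.\n{}\n'.format(remote_loc, remote_log)
            return log, {'end_of_log': True}
        except Exception as e:
            # Surface the S3 failure in the UI instead of silently
            # falling back to the local/worker log fetch.
            header = '*** Failed to fetch log file from S3 at {}: {}\n'.format(remote_loc, e)
            local_log, out_metadata = super(S3TaskHandler, self)._read(ti, try_number, metadata)
            return header + local_log, out_metadata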
Any clue which component needs to be touched to fix this? I'm having trouble finding the code that swallows the exceptions. I'm really looking to investigate and resolve this S3 logs issue.
I used the code from the example above and I was able to list the log files written by Airflow in my S3 bucket: no permission issues, no errors. However, Airflow seems to be ignoring the remote logging configuration when it reads logs. I also tried testing with airflow_local_settings.py and set a logging class path in the config file, but no luck; the S3TaskHandler seems to be ignored by Airflow (the settings I would expect to matter are sketched below my environment details).
OS: fedora26
python: 3.7.5
Airflow: 1.10.10
AWS EC2 instance with proper role permissions.
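For context, remote log reading only goes through the S3TaskHandler when the [core] remote logging settings are present on the webserver as well as the workers, since the webserver reads logs back through the same handler that writes them. A minimal sketch of the relevant airflow.cfg section, with placeholder bucket and connection names:

[core]
# Needed on the webserver as well as the workers
remote_logging = True
remote_base_log_folder = s3://my-log-bucket/airflow/logs
remote_log_conn_id = my_aws_conn
# Only needed for a custom logging config (e.g. airflow_local_settings.py):
# logging_config_class = log_config.LOGGING_CONFIG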
@ashb @JPonte I am getting the error below. Looks like a bug:
>>> from airflow.configuration import conf
>>> from airflow.hooks.S3_hook import S3Hook
>>> h = S3Hook(aws_conn_id=conf.get('core', 'remote_log_conn_id'))
>>> c = h.get_conn()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 44, in get_conn
return self.get_client_type('s3')
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 176, in get_client_type
session, endpoint_url = self._get_credentials(region_name)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 102, in _get_credentials
connection_object = self.get_connection(self.aws_conn_id)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 84, in get_connection
conn = random.choice(list(cls.get_connections(conn_id)))
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 80, in get_connections
return secrets.get_connections(conn_id)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/secrets/__init__.py", line 52, in get_connections
conn_list = secrets_backend.get_connections(conn_id=conn_id)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/secrets/base_secrets.py", line 69, in get_connections
conn = Connection(conn_id=conn_id, uri=conn_uri)
File "<string>", line 4, in __init__
File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 433, in _initialize_instance
manager.dispatch.init_failure(self, args, kwargs)
File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
exc_value, with_traceback=exc_tb,
File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/state.py", line 430, in _initialize_instance
return manager.original_init(*mixed[1:], **kwargs)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/connection.py", line 119, in __init__
self.parse_from_uri(uri)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/connection.py", line 144, in parse_from_uri
self.port = uri_parts.port
File "/usr/local/lib/python3.6/urllib/parse.py", line 169, in port
port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'abcd12'
My AWS secret key is (dummy): abcd12/ef34578fgt
Looks like when a '/' appears in the secret key, the connection does not get created.
@Siddharthk you probably need to URL-encode the secret key: / becomes %2F.
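For example, when building the connection URI (dummy key values; quote is from the Python standard library):

from urllib.parse import quote

access_key = 'AKIADUMMYKEY'         # dummy value
secret_key = 'abcd12/ef34578fgt'    # dummy value containing '/'

# Encode both credential parts so '/' becomes %2F and urlparse
# no longer misreads the secret as a host:port pair
conn_uri = 'aws://{}:{}@'.format(quote(access_key, safe=''),
                                 quote(secret_key, safe=''))
print(conn_uri)  # aws://AKIADUMMYKEY:abcd12%2Fef34578fgt@

The resulting URI can then be exported as an AIRFLOW_CONN_<CONN_ID> environment variable or stored as the connection URI.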