sassoftware/sas-studio-custom-steps

SAS Job Fails due to possible error in airflow

twon007 opened this issue · 3 comments

[2023-10-15T20:02:01.265+0100] {logging_mixin.py:120} INFO - Waiting for Job to run to completion...
[2023-10-15T20:17:50.735+0100] {logging_mixin.py:120} INFO - HTTP Call failed with error "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))" from /jobExecution/jobs/e6335a5e-dd62-4498-a14d-bea035299caf/state. Will set jobstatus=unknown and continue checking...
[2023-10-15T20:17:33.840+0100] {base_job.py:232} ERROR - LocalTaskJob heartbeat got an exception
Traceback (most recent call last):
(A long traceback follows here; please see the attached log for the details.)

[2023-10-16T02:16:44.084+0100] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=GDW_PostETL, task_id=Check_Scheduled_SourceSystems, execution_date=20231015T010500, start_date=20231016T010504, end_date=20231016T011644
[2023-10-16T02:16:45.115+0100] {standard_task_runner.py:100} ERROR - Failed to execute job 595174 for task Check_Scheduled_SourceSystems ((psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1879381 waits for ShareLock on transaction 155630385; blocked by process 1845775.
Process 1845775 waits for ShareLock on transaction 155630384; blocked by process 1879381.

Looking at the two DAG-logs attached I see the following reference:

"Background on this error at: https://sqlalche.me/e/14/e3q8":


OperationalError

Exception raised for errors that are related to the database’s operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, a memory allocation error occurred during processing, etc.

This error is a DBAPI Error and originates from the database driver (DBAPI), not SQLAlchemy itself.

The OperationalError is the most common (but not the only) error class used by drivers in the context of the database connection being dropped, or not being able to connect to the database. For tips on how to deal with this, see the section Dealing with Disconnects.

This indicates an error in the underlying metadata database that Airflow uses (typically MySQL or PostgreSQL). See https://towardsdatascience.com/sigterm-signal-fix-airflow-486ab704b126 for how to tweak the Airflow configuration (requires Medium membership to view).

snlwih commented

@twon007, could you please provide more context to explain how this is related to custom steps in SAS Studio? Issues related to the SAS Airflow Provider need to be reported as a GitHub Issue on https://github.com/sassoftware/sas-airflow-provider.

From the info provided, am I correct in guessing that you have an Apache Airflow DAG that uses a SAS Airflow Operator (which of course executes SAS code on a SAS server)? The SAS-provided Airflow operator tries to retrieve the job status by calling the /jobExecution/jobs//state endpoint, as shown in the partial log dump you included. It seems that something goes wrong while the operator performs that call.
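
To illustrate the behaviour visible in the log ("Will set jobstatus=unknown and continue checking..."), here is a minimal Python sketch of a polling loop that tolerates a dropped connection. It is not the provider's actual code: `wait_for_job`, `fetch_state`, `max_unknown`, and the set of terminal state names are hypothetical and chosen only for this example.

```python
import time
from typing import Callable

# Assumed terminal state names, for illustration only.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_job(
    fetch_state: Callable[[], str],
    poll_interval: float = 5.0,
    max_unknown: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until a terminal state is returned.

    fetch_state is a hypothetical callable that performs the HTTP call to
    the job state endpoint and returns the state as a string.
    A connection failure is treated as state "unknown" and polling
    continues, up to max_unknown consecutive failures.
    """
    unknown_streak = 0
    while True:
        try:
            state = fetch_state()
            unknown_streak = 0  # a successful call resets the failure counter
        except OSError as err:
            # OSError covers the built-in ConnectionError family.
            state = "unknown"
            unknown_streak += 1
            if unknown_streak >= max_unknown:
                raise RuntimeError(
                    f"gave up after {unknown_streak} failed state checks"
                ) from err
        if state in TERMINAL_STATES:
            return state
        sleep(poll_interval)
```

In this sketch a single "Connection aborted" error does not fail the task; only a run of consecutive failures does. Whether the real operator applies a similar limit is something to confirm in the sas-airflow-provider repo.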

All of this indicates to me that this is an issue with Apache Airflow, the SAS Airflow Provider, or both. If you agree with my guess and assessment, then once you have created a GitHub Issue in that repo, please post the URL for that issue in the comments of this thread ("to close the loop" 😃 ). I will then close this specific issue.

@twon007, do you have an update on this?

Closing. This does not seem to be related to SAS Studio Custom Steps, and instructions have been provided to explain where to report questions about the SAS Airflow Provider.