teamclairvoyant/airflow-maintenance-dags

Where to set restore_load_context=True?

Closed this issue · 2 comments

We have a large number of DAGs running, and the xcom table is being populated nearly every second. As a result, we receive the warning below when using the xcom cleanup, immediately after the "INITIAL QUERY" output is printed. I have looked around but am unable to figure out where to place the restore_load_context=True parameter so that this warning no longer appears. The cleanup does eventually happen on the xcom table, but only after 50,000+ warning lines are printed to the log.

[2021-03-15 18:42:07,377] {logging_mixin.py:112} WARNING - /home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/loading.py:342: SAWarning: Loading context for <BaseXCom at 0x7fdd1e4a9a00> has changed within a load/refresh handler, suggesting a row refresh operation took place. If this event handler is expected to be emitting row refresh operations within an existing load or refresh operation, set restore_load_context=True when establishing the listener to ensure the context remains unchanged when the event handler completes.
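For anyone else trying to work out what "when establishing the listener" refers to: in SQLAlchemy, restore_load_context is an argument passed at the point where a "load" or "refresh" event handler is registered, not a setting on the query or the session. Below is a minimal sketch of that registration, assuming SQLAlchemy 1.3.14 or later and an in-memory SQLite database. DemoRow and on_load are hypothetical names for illustration only; this is not Airflow's or the cleanup DAG's code, and the listener that triggers the warning here presumably lives in library code rather than in the DAG file.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session

try:  # SQLAlchemy 1.4+
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class DemoRow(Base):
    """Hypothetical model standing in for a table such as xcom."""
    __tablename__ = "demo_row"
    id = Column(Integer, primary_key=True)
    value = Column(String)


loaded_ids = []


# The parameter from the warning goes here, where the listener is established.
@event.listens_for(DemoRow, "load", restore_load_context=True)
def on_load(target, context):
    # Runs once for each DemoRow instance constructed from a result row.
    loaded_ids.append(target.id)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Insert two rows, then close the session so the objects are not cached.
writer = Session(engine)
writer.add_all([DemoRow(id=1, value="a"), DemoRow(id=2, value="b")])
writer.commit()
writer.close()

# A fresh session builds new instances from rows, which fires "load".
reader = Session(engine)
rows = reader.query(DemoRow).order_by(DemoRow.id).all()
reader.close()
```

The equivalent non-decorator form is event.listen(DemoRow, "load", on_load, restore_load_context=True).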

@aspain
I had the same issue, and it happened only for the xcom task. For me it helped to set PRINT_DELETES = False in the DAG. The conditional branch that produces this output runs entries_to_delete = query.all(), which prints the many warning rows you've mentioned while consuming a lot of RAM trying to load all the records. If you set it to False, the records will be deleted just fine, but you won't get a line of output for every row you're about to delete.
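To illustrate the difference, here is a rough sketch of the two paths, assuming an in-memory SQLite database and a hypothetical DemoXCom model. It is not the cleanup DAG's actual code, only the general shape of a PRINT_DELETES-style switch: query.all() builds an ORM object for every matching row, while query.count() and query.delete() let the database do the work without loading any objects.

```python
import logging
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Integer, create_engine
from sqlalchemy.orm import Session

try:  # SQLAlchemy 1.4+
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

PRINT_DELETES = False  # same flag name as in the DAG; the rest is illustrative


class DemoXCom(Base):
    """Hypothetical stand-in for the xcom table."""
    __tablename__ = "demo_xcom"
    id = Column(Integer, primary_key=True)
    execution_date = Column(DateTime)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

# Ten rows, one per day going back from a fixed reference date.
now = datetime(2021, 3, 15)
session.add_all(
    [DemoXCom(id=i, execution_date=now - timedelta(days=i)) for i in range(10)]
)
session.commit()

# Select everything that is at least five days old.
cutoff = now - timedelta(days=5)
query = session.query(DemoXCom).filter(DemoXCom.execution_date <= cutoff)

if PRINT_DELETES:
    # Loads every matching row into memory as an ORM object.
    for entry in query.all():
        logging.info("Will delete id=%s", entry.id)
else:
    # Only asks the database for a number; no objects are constructed.
    logging.info("Will delete %d rows", query.count())

# Bulk DELETE issued as a single statement, without loading the rows first.
deleted = query.delete(synchronize_session=False)
session.commit()

remaining = session.query(DemoXCom).count()
session.close()
```

With millions of rows, the count-only path keeps both memory use and log volume flat, at the cost of not seeing which rows were removed.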

@gulyash unfortunately I already had PRINT_DELETES = False set in the DAG. Since we had so many rows (millions) being deleted, I didn't want to clog up the logs with that many lines. So I still received that warning, unfortunately. Because of this I've switched to a more straightforward, MySQL-only setup (I have yet to understand SQLAlchemy to this extent).