teamclairvoyant/airflow-maintenance-dags

Log cleanup missing dag_processor_manager.log

Closed this issue · 3 comments

The log cleanup DAG works great, but it misses the dag_processor_manager.log file. I'm not sure whether this file is new in Airflow; the configuration for it was added in 1.10.2, but the file might have existed earlier. Unlike the other log files, this one is a standard, sequential log. It lives in the dag_processor_manager directory and is not nested in date-stamped subdirectories the way the scheduler and individual DAG logs are (see the illustrative layout below). It's possible to manage this log file with logrotate, but it would be nice if everything were together in one place.
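For reference, the layout looks roughly like this (paths and DAG/task names are illustrative, assuming a base_log_folder of /var/log/airflow):

```
/var/log/airflow/
├── scheduler/
│   ├── 2020-01-01/                      # date-stamped subdirectories
│   └── 2020-01-02/
├── example_dag/
│   └── example_task/
│       └── 2020-01-01T00:00:00+00:00/   # one directory per run
└── dag_processor_manager/
    └── dag_processor_manager.log        # single, continuously appended file
```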

Hey Eric,

Does the dag_processor_manager.log file reside under the base_log_folder directory as defined in airflow.cfg? I ask this because I just tested out the process on my machine and it seems to remove the old files under the directory as well. I am running 1.10.9.

Assuming base_log_folder = /var/log/airflow, the DAG should delete all files under /var/log/airflow/*/*, which includes /var/log/airflow/dag_processor_manager/dag_processor_manager.log.
You might want to check whether those files are older than 30 days, since that's the default maximum log age (DEFAULT_MAX_LOG_AGE_IN_DAYS).
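If you want to see what would be matched before anything is removed, here's a minimal dry-run sketch of that logic in Python (an illustration, not the DAG's actual code; BASE_LOG_FOLDER and the retention value are assumptions to adjust for your setup):

```python
import os
import time

BASE_LOG_FOLDER = "/var/log/airflow"  # match base_log_folder in airflow.cfg
MAX_LOG_AGE_IN_DAYS = 30              # default retention (DEFAULT_MAX_LOG_AGE_IN_DAYS)

cutoff = time.time() - MAX_LOG_AGE_IN_DAYS * 24 * 60 * 60

for dirpath, _dirnames, filenames in os.walk(BASE_LOG_FOLDER):
    for name in filenames:
        path = os.path.join(dirpath, name)
        # Anything not modified within the retention window is a candidate;
        # dag_processor_manager.log gets matched like any other file.
        if os.path.getmtime(path) < cutoff:
            print("would delete:", path)  # swap in os.remove(path) to act
```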

Hi @prakshalj0512. Yes, it lives at [base_log_folder]/dag_processor_manager/dag_processor_manager.log. I was surprised at the size of that file: 50 MB on a system that has one DAG and hasn't been running for very long.

I agree that this log file will most likely get deleted; it falls within the search path for log files. The problem is that it's continuously appended to. Each line in the file is timestamped, but the file itself never gets rotated by date. What I think will happen is that it gets created when Airflow first runs, then gets deleted every 30 days (or whatever the configured max age is). Airflow will most likely create a new one and carry on, and thirty days later that one gets deleted too. Since the whole file is removed at once, it will usually hold less history than you actually want to retain. It's not a huge deal. I added a line to the DAG that truncates the log file to its last 10,000 lines (a sketch of that idea follows). I've never had to look into that log file, so I figured keeping just a bit of data is probably sufficient.
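In case it's useful to anyone else, here's one way to do that truncation in Python (a sketch of the idea, not the exact line added to the DAG; the path is an assumption):

```python
from collections import deque

LOG_PATH = "/var/log/airflow/dag_processor_manager/dag_processor_manager.log"
KEEP_LINES = 10_000

# A deque with maxlen keeps only the most recent lines while reading,
# so memory use stays bounded even for a large file.
with open(LOG_PATH, "r", errors="replace") as f:
    tail = deque(f, maxlen=KEEP_LINES)

# Rewrite the file in place with just the retained lines.
with open(LOG_PATH, "w") as f:
    f.writelines(tail)
```

One caveat: if the DAG file processor still has the file open, rewriting it in place can interact oddly with that open handle, so running this while the scheduler is quiet is the safer bet.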

That sounds like a handy adjustment to the existing workflow. I've never looked at that file either. If you don't want to deal with it, I believe leaving dag_processor_manager_log_location blank in airflow.cfg should stop it from being populated. In my experience, a new dag_processor_manager.log.{1,2,3,...} gets generated once the file size hits 100 MB. We could suggest making that file size configurable as a feature request on the Airflow GitHub. I'll go ahead and close the issue for now.
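For reference, the setting and its default in the 1.10 series look like this (key placement under [core] is my understanding for these versions, and the blank-value behavior described above is a belief, not something verified here):

```ini
[core]
# Default location; per the comment above, leaving the value blank is
# believed to stop the file from being populated.
dag_processor_manager_log_location = {AIRFLOW_HOME}/logs/dag_processor_manager/dag_processor_manager.log
```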