scrapy/scrapyd

How to specify the naming convention of log files?

aaronm137 opened this issue · 1 comment

Hello,

in my Scrapy spider, I specify the name of the log file as follows:

custom_settings = {
    # assumes `from datetime import datetime`, `import time`, and a
    # module-level `project_name` are available at class-definition time
    'LOG_FILE': f'{datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d_%H%M%S")}_{project_name}.log',
    'LOG_LEVEL': 'INFO',
}

so the name of the log file looks like this: 2024-05-23_103249_my-cool-spider.log. This works perfectly on localhost.

When I deploy it to production, where Scrapyd takes care of running spider jobs, the log file naming convention specified above is ignored and Scrapyd's own naming convention is used instead, which looks like this: task_169_2024-06-14T13_55_48.log.

Is there any way to change the naming convention, so Scrapyd would respect the format specified in the Scrapy spider?

In https://github.com/scrapy/scrapyd/blob/master/scrapyd/environ.py, if logs_dir is set in Scrapyd's configuration file, then Scrapy's LOG_FILE setting is overridden. The pattern is {logs_dir value}/{Scrapyd project name}/{Scrapy spider name}/{job ID}.log
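As an illustration (this is a sketch of the pattern above, not Scrapyd's actual code from environ.py), the resulting path is assembled roughly like this:

```python
import os.path

def scrapyd_log_path(logs_dir, project, spider, job_id):
    """Sketch of the log path Scrapyd produces when logs_dir is set:
    {logs_dir}/{project}/{spider}/{job ID}.log"""
    return os.path.join(logs_dir, project, spider, f"{job_id}.log")

# On POSIX, scrapyd_log_path("/var/log/scrapyd", "myproject",
#                            "my-cool-spider", "task_169_2024-06-14T13_55_48")
# yields "/var/log/scrapyd/myproject/my-cool-spider/task_169_2024-06-14T13_55_48.log"
```

The only component of that path you control per-job is the job ID, which is why setting it at scheduling time is the lever here.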

In https://github.com/scrapy/scrapyd/blob/master/scrapyd/webservice.py, the job ID defaults to uuid.uuid1().hex, but you can provide your own jobid when scheduling the job. See https://scrapyd.readthedocs.io/en/latest/api.html#schedule-json

So, if you only want to control the filename, set the job ID when scheduling. task_169_2024-06-14T13_55_48.log is already not a UUID, so you (or some software you're using) must already be setting the jobid (or perhaps you haven't set logs_dir, and Scrapyd is using your non-overridden LOG_FILE setting).