Move most of the Scrapy template's logic to Apify SDK
honzajavorek opened this issue · 3 comments
I think some logic from __main__.py could be moved to the SDK. I think new_configure_logging could be a decorator which is just imported from Apify SDK. I think configure_logger and logger names could be imported.
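To illustrate the decorator idea, here is a minimal sketch using only the standard library. It is not the SDK's actual API: `reapply_logging`, `LOGGER_NAMES` and `library_configure_logging` are hypothetical names, and the last one is just a stand-in for a library function that resets logging when it runs.

```python
from __future__ import annotations

import functools
import logging
from typing import Any, Callable

# Illustrative list of logger names; a real list would live in the SDK.
LOGGER_NAMES = ("scrapy", "twisted")


def configure_logger(name: str, level: int, handler: logging.Handler) -> None:
    """Give one named logger a single handler and a fixed level."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers = [handler]


def reapply_logging(level: int, handler: logging.Handler) -> Callable:
    """Decorator factory: run the wrapped function, then re-apply our logger setup."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)
            # The wrapped function may have reset the loggers, so configure them again.
            for name in LOGGER_NAMES:
                configure_logger(name, level, handler)
            return result

        return wrapper

    return decorator


def library_configure_logging() -> None:
    """Hypothetical stand-in for a library call that wipes logger configuration."""
    for name in LOGGER_NAMES:
        logger = logging.getLogger(name)
        logger.handlers = []
        logger.setLevel(logging.WARNING)


handler = logging.StreamHandler()
patched_configure_logging = reapply_logging(logging.DEBUG, handler)(library_configure_logging)
patched_configure_logging()
```

With something like this in the SDK, the template would only need to import the decorator and apply it to the function being patched, instead of carrying the wrapper code itself.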
Similarly, main.py contains _get_scrapy_settings, but it could also be just imported, if turned into this:
def apply_apify_settings(settings: Settings, proxy_config: dict | None = None):
    ...
    return settings
Then it wouldn't need to call get_project_settings() and would leave space for custom modifications before or after applying Apify-specific settings.
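A rough sketch of how that could look from the caller's side. To keep it runnable without Scrapy installed, a plain dict stands in for the Settings object (any mutable mapping works here). The pipeline paths and the HYPOTHETICAL_PROXY_CONFIG key are made up for illustration and are not what the template or SDK actually sets.

```python
from __future__ import annotations

from collections.abc import MutableMapping
from typing import Any

# Hypothetical component path, used only for illustration.
HYPOTHETICAL_PIPELINE = "my_project.pipelines.PushToDatasetPipeline"


def apply_apify_settings(
    settings: MutableMapping[str, Any],
    proxy_config: dict | None = None,
) -> MutableMapping[str, Any]:
    """Layer platform-specific values on top of settings the caller already owns."""
    # Copy the existing pipelines so the caller's entries are preserved.
    pipelines = dict(settings.get("ITEM_PIPELINES") or {})
    pipelines[HYPOTHETICAL_PIPELINE] = 1000
    settings["ITEM_PIPELINES"] = pipelines
    # Hypothetical key under which the proxy configuration is stored.
    settings["HYPOTHETICAL_PROXY_CONFIG"] = proxy_config
    return settings


# The caller decides where the base settings come from and what to tweak.
settings = {"ITEM_PIPELINES": {"my_project.pipelines.CleanupPipeline": 100}}
settings["LOG_LEVEL"] = "INFO"  # custom modification before
settings = apply_apify_settings(settings, proxy_config={"useApifyProxy": True})
settings["CONCURRENT_REQUESTS"] = 4  # custom modification after
```

Because the function only receives and returns the settings, it does not matter whether they were loaded from the project, built by hand, or modified in between.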
The main benefit of doing this is that the template contains less boilerplate and updates are easier to control and maintain. If a new Scrapy version is published and the logic of the monkey patching or anything else needs to change, I could just upgrade Apify SDK to get the updates. As of now, the only way is to watch updates to the template manually and update my code by copy-pasting.
Scrapy is a library and I can upgrade it carefully or pin it to a certain version, but Apify is a SaaS platform. If something changes in Apify and it isn't compatible with the old template code, it can just break my actors out of nowhere. Only then will I be prompted to go and see if the template looks different than last time. Not ideal.
Follow up to #132, vaguely related to apify/actor-templates#264
I made some changes in my implementation so that it's tidier:
- Introduced a CLI where only the logging setup is hoisted before everything else: https://github.com/juniorguru/plucker/blob/6fe0c31097b00339cbc05b2ab40fc1dae23160bd/juniorguru_plucker/cli.py
- Moved all logging setup to a separate file: https://github.com/juniorguru/plucker/blob/main/juniorguru_plucker/loggers.py
- I moved monkey-patching to a decorator - not sure myself if this is nicer or not, but won't go and rewrite it again now: https://github.com/juniorguru/plucker/blob/6fe0c31097b00339cbc05b2ab40fc1dae23160bd/juniorguru_plucker/loggers.py#L38
- I moved actor/spider interplay to a separate file, but I guess this is way too custom and far from the aim of the template: https://github.com/juniorguru/plucker/blob/6fe0c31097b00339cbc05b2ab40fc1dae23160bd/juniorguru_plucker/actors.py
Feel free to grab inspiration from what I did, or even chunks of code (MIT licensed, just mention my name). I guess I've solved this for myself now. If there are updates to the Scrapy template, I hope I'll be able to somehow keep up with it and backport changes to my highly customized project.
Hi Honza, thank you for opening this. Moving as much code as we can from the template to the SDK is definitely a good way to go. Unfortunately, adding new features to our Scrapy-Apify integration is not a priority for this quarter, so I cannot promise I'll have time to take a look at this in the near future.