Add templates using the Apify SDK for Python
fnesveda opened this issue · 4 comments
After we have the Apify SDK for Python implemented, we need to add actor templates that will use it.
There are two main decisions / tasks:
- which templates to add
  - plain
  - BeautifulSoup
  - Scrapy
  - Selenium
  - Playwright
  - others?
- project structure - done in #118
  - should it use `pyproject.toml`?
  - where will the main script be defined? (equivalent to `package.json : scripts : start`)
  - should they use a virtual environment
    - would be really good for local running
    - not needed for running on platform, but won't hurt
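
To make the virtual environment question concrete, here is a minimal sketch of what a template setup step for local running could look like, using only the standard library `venv` module. The helper name `create_local_venv` and the `.venv` directory name are assumptions for illustration, not something decided in this issue.

```python
import tempfile
import venv
from pathlib import Path


def create_local_venv(project_dir: Path, name: str = ".venv") -> Path:
    """Create a virtual environment inside the project directory and return its path.

    Hypothetical helper: the directory name and location are assumptions.
    """
    env_dir = project_dir / name
    # with_pip=False keeps this sketch fast and offline;
    # a real template would most likely want pip available.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real project.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_local_venv(Path(tmp))
        print("created:", (env_dir / "pyvenv.cfg").is_file())
```

On the platform the Docker image already isolates dependencies, so a step like this would only matter for local runs.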
Personally, I would rewrite the getting started one, which showcases the platform and its features, and then I would add a crawling one, so probably Scrapy? Not sure if we need a BeautifulSoup one.
Other stuff I don't know, but if you say it's good, then we should probably have it 😅
It depends on whether we will have the nice template selector in the console soon or not. If we don't, then I wouldn't add too many templates, because there are already too many and they already don't fit. If we do, then we can add more and have them all in some nice Python category.
@fnesveda Do you have a plan for how to do the Scrapy-to-actor mapping, as their structure is more complicated than Crawlee's? If I recall, they have multiple spiders per folder with top-level libraries, so would users need to select the spider via input, or somehow `cd` into it in the Dockerfile after copying the libraries?
No plan yet. I think we can keep the template simple, optimized for just one spider, and solve these complicated things in the Scrapy migrator.
The Scrapy "multiple spiders per project" philosophy does not really align with the "do just one thing but do it well" UNIX (and actor) philosophy.
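
For reference, the "select the spider via input" option discussed above could be sketched roughly like this. It is only an illustration: the `spiderName` input field and the `select_spider` helper are hypothetical, and nothing here calls Scrapy or the Apify SDK.

```python
from typing import Mapping, Sequence


def select_spider(actor_input: Mapping[str, object], available: Sequence[str]) -> str:
    """Pick the spider to run based on the actor input.

    `spiderName` is a hypothetical input field, not an existing convention.
    """
    if not available:
        raise ValueError("the project defines no spiders")

    requested = actor_input.get("spiderName")
    if requested is None:
        # Single-spider projects (the simple template case) need no input at all.
        if len(available) == 1:
            return available[0]
        raise ValueError(f"input must name one of: {', '.join(available)}")

    if requested not in available:
        raise ValueError(f"unknown spider {requested!r}")
    return str(requested)
```

With a single-spider template the input can stay empty, which keeps the simple case simple; the multi-spider branch is the part that would belong in the migrator.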