apify/actor-templates

Add templates using the Apify SDK for Python

fnesveda opened this issue · 4 comments

After we have the Apify SDK for Python implemented, we need to add actor templates that will use it.

There are two main decisions / tasks:

  • which templates to add
    • plain
    • BeautifulSoup
    • Scrapy
    • Selenium
    • Playwright
    • others?
  • project structure - done in #118
    • should it use pyproject.toml?
    • where will the main script be defined? (equivalent to package.json : scripts : start)
    • should they use a virtual environment?
      • would be really good for local running
      • not needed for running on platform, but won't hurt
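To make the "main script" question concrete, here is a minimal sketch of one possible answer, assuming the template keeps its code in a package named src (a hypothetical name) with a __main__.py file, so that `python -m src` plays the role of `npm start`. The body of main() is a placeholder and does not use any Apify SDK calls.

```python
# Hypothetical file: src/__main__.py
# Running `python -m src` would then be the equivalent of
# package.json : scripts : start.
import asyncio


async def main() -> str:
    # Placeholder for the actor's real work (an assumption, not SDK code).
    await asyncio.sleep(0)
    return "actor finished"


def run() -> str:
    # Single synchronous entry point that a Dockerfile CMD or a
    # [project.scripts] entry in pyproject.toml could point at.
    return asyncio.run(main())


if __name__ == "__main__":
    print(run())
```

For local runs, a virtual environment could be created first with `python -m venv .venv`; on the platform the same entry point would work without one.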

Personally, I would rewrite the getting started one, which showcases the platform and its features, and then I would add a crawling one, so probably Scrapy? Not sure if we need a BeautifulSoup one.

Other stuff I don't know, but if you say it's good, then we should probably have it 😅

It depends on whether we will have the nice template selector in the console soon. If we don't, then I wouldn't add too many templates, because there are already too many and they don't fit as it is. If we do, then we can add more and have them all in some nice Python category.

@fnesveda Do you have a plan for how to do the Scrapy-to-actor mapping, given that Scrapy's project structure is more complicated than Crawlee's? If I recall correctly, a Scrapy project has multiple spiders per folder with shared top-level libraries, so users would need to either select the spider via input or somehow cd into it in the Dockerfile after copying the libraries?

No plan yet. I think we can keep the template simple, optimized for just one spider, and solve these complicated things in the Scrapy migrator.

The Scrapy "multiple spiders per project" philosophy does not really align with the "do just one thing but do it well" UNIX (and actor) philosophy.
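For reference, the "select the spider via input" option raised above could look roughly like the sketch below. Everything in it is hypothetical: the SPIDERS registry, the "spider" input field, and the select_spider helper are illustration only, and the callables stand in for real Scrapy spiders so the example stays free of Scrapy itself.

```python
from typing import Callable, Dict

# Hypothetical registry: spider name -> callable that runs it.
# In a real project the values would be Scrapy spider classes.
SPIDERS: Dict[str, Callable[[], str]] = {
    "books": lambda: "ran books",
    "quotes": lambda: "ran quotes",
}


def select_spider(actor_input: dict) -> Callable[[], str]:
    """Pick the spider named in the (assumed) "spider" input field."""
    name = actor_input.get("spider")
    if name is None:
        # With exactly one spider there is nothing to choose,
        # which is the single-spider template case.
        if len(SPIDERS) == 1:
            return next(iter(SPIDERS.values()))
        raise ValueError(f"Input must name a spider; available: {sorted(SPIDERS)}")
    if name not in SPIDERS:
        raise ValueError(f"Unknown spider {name!r}; available: {sorted(SPIDERS)}")
    return SPIDERS[name]
```

A template optimized for one spider would never need the lookup; the sketch only shows how much extra input handling the multi-spider layout brings with it.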