apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
TypeScript · Apache-2.0
Issues
Make references in code documentation clickable inside IDE
#2717 opened by tobice - 0
bug: sitemap parser returning invalid URLs
#2698 opened by barjin - 0
Crawlee fingerprintOptions do not completely reflect fingerprint-suite's options
#2703 opened by galaczi - 2
API tab leads to error on main homepage.
#2702 opened by souravjain540 - 6
CheerioCrawler not persisting cookies
#2618 opened by taythebot - 10
Build fails for playwright crawlee initial setup
#2693 opened by jensmeichler - 3
Implement max crawl depth
#2633 opened by janbuchar - 2
Error Crawlee + Cheerio Apify Typescript template
#2691 opened by LouisDeconinck - 3
Monitor mode
#2680 opened by ImBIOS - 0
Session retries don't trigger the `errorHandler`
#2678 opened by B4nan - 5
Revisit API of storages
#2674 opened by janbuchar - 0
bug: `SitemapRequestList.persistState()` throws when sitemap loading has finished
#2672 opened by barjin - 1
HTTP client switching
#2659 opened by B4nan - 4
remove all enums
#2654 opened by ryanleecode - 0
Failed to prolong lock for cached request.
#2653 opened by matrs - 0
Request timeout in docker
#2648 opened by Laxy317 - 1
URL with hash (fragment) in Puppeteer fails
#2647 opened by roboncode - 0
CheerioCrawler not working "allow running single crawler instance multiple times"
#2634 opened by distributev - 1
crawlee playwright bun : Running crawlee's playwright crawler with bun causes Protocol mismatch error
#2627 opened by ImBIOS - 0
feat: `SitemapRequestList` network error retries
#2617 opened by barjin - 0
how to handle timeout error
#2621 opened by zy783282949 - 0
Support for recordVideo from Playwright
#2615 opened by spaceworkplatform - 1
RequestQueue2 debug log a lot "Failed to delete request lock for request"
#2582 opened by slow-groovin - 2
Frequent errors when working with node-schedule
#2612 opened by faner11 - 0
Support SOCKS proxy for CheerioCrawler
#2599 opened by sushantdhiman - 1
experimentalContainers navigation "always" timeout
#2597 opened by AraCoders - 0
Form POST request returns error page using BasicCrawler, but works when using `node-fetch`
#2586 opened by Hamza5 - 1
Error: Invalid "proxyUrl" option: only HTTP proxies are currently supported
#2580 opened by mehrdad-shokri - 1
enqueueLinksByClickingElements function lacks both "exclude" and "limit" options
#2568 opened by AraCoders - 0
Large threaded, kubernetes scrape = Target page, context or browser has been closed
#2560 opened by JoshuaPerk - 1
Browser crashes after 25 requests
#2540 opened by amanzrx4 - 1
Cannot use multiple 'PlaywrightCrawlers' simultaneously
#2549 opened by tocha688 - 0
I am unable to restore a saved storageState
#2547 opened by tocha688 - 1
HTTP Crawler Memory Leak?
#2541 opened by ikeg225 - 2
SessionPool throws memory leak warning and hangs Playwright crawler
#2536 opened by harm-matthias-harms - 1
Crawlee forces me to install full puppeteer even though I am giving it the executablePath
#2510 opened by ali-habibzadeh