CheerioCrawler not working: "allow running single crawler instance multiple times"

Question

distributev opened this issue 3 months ago · 0 comments

distributev commented 3 months ago

### Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/cheerio (CheerioCrawler)

### Issue description

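I believe running a single crawler instance multiple times is expected to work (see "allow running single crawler instance multiple times", #1844), but it does not.

If I call `run()` in a loop, the first iteration works fine, but every subsequent iteration finishes immediately without processing any requests and logs the following: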

2024-08-26T22:00:05.502Z INFO CheerioCrawler: Starting the crawler.
2024-08-26T22:00:05.576Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2024-08-26T22:00:05.783Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":0,"retryHistogram":[],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":451}

### Code sample

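// `crawler` is a CheerioCrawler instance and `urls` is an array of URLs; both are created elsewhere.
for (let i = 0; i < 100; i++) {
  console.time(`RUN (${i}) crawler.run`);

  await crawler.run(urls);

  // Pause for one second between runs.
  await new Promise((resolve) => setTimeout(resolve, 1000));

  // Log the elapsed time for this iteration.
  console.timeLog(`RUN (${i}) crawler.run`);
}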
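
A possible workaround sketch, under an assumption this report does not confirm: that the later runs are empty because the request queue still considers the repeated URLs handled from the first run. If that is the cause, giving each run's requests a distinct `uniqueKey` should keep them from being deduplicated. The helper `withRunScopedKeys` and the `example.com` URLs are hypothetical and only for illustration; `url` and `uniqueKey` are regular request options.

```javascript
// Hypothetical helper (not part of Crawlee): builds request objects whose
// uniqueKey differs between runs, so a queue that deduplicates by uniqueKey
// would not treat a URL from an earlier run as already handled.
function withRunScopedKeys(urls, runIndex) {
  return urls.map((url) => ({
    url,
    uniqueKey: `run-${runIndex}:${url}`,
  }));
}

// Placeholder input, not taken from the report.
const exampleUrls = ['https://example.com/a', 'https://example.com/b'];
const firstRun = withRunScopedKeys(exampleUrls, 0);
const secondRun = withRunScopedKeys(exampleUrls, 1);

// Same URLs, but a different key for each run.
console.log(firstRun[0].uniqueKey);
console.log(secondRun[0].uniqueKey);

// Inside the loop from the report, the call would then become:
//   await crawler.run(withRunScopedKeys(urls, i));
```

This only sidesteps deduplication; it does not explain why the second `run()` is not starting from a clean queue, which is the behavior I expected from #1844.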



### Package version

3.11.1

### Node.js version

20

### Operating system

_No response_

### Apify platform

- [X] Tick me if you encountered this issue on the Apify platform

### I have tested this on the `next` release

_No response_

### Other context

_No response_