Cannot use multiple 'PlaywrightCrawlers' simultaneously
Closed this issue · 1 comments
tocha688 commented
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
When multiple PlaywrightCrawler instances are running, one crawler finishing its task causes the whole process to exit, terminating the other crawlers' tasks as well
Code sample
const { PlaywrightCrawler } = require('crawlee');
// Define the crawler configuration
const crawlerConfig = {
// crawler options...
};
// Create and start the first crawler instance
const crawler1 = new PlaywrightCrawler(crawlerConfig);
crawler1.run(["https://amazon.com"]);
// Create and start the second crawler instance
const crawler2 = new PlaywrightCrawler(crawlerConfig);
crawler2.run(["https://amazon.com"]);
Package version
3.10.5
Node.js version
v20.13.1
Operating system
windows
Apify platform
- Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
I think it may be caused by the process.once('SIGINT', sigintHandler); handler.
barjin commented
This might be caused by both crawlers sharing the same storage. You can tell Crawlee to use different storage backends with each crawler by supplying the optional second constructor parameter.
const crawler = new CheerioCrawler(
{
...crawlerOptions
},
+ new Configuration({
+ persistStorage: false,
+ })
);
In the Configuration, you can:
- Set up a memory-only crawl with persistStorage: false. This also stops the crawlers from sharing storage, since each crawler then uses its own in-memory storage backend.
- Use storageClientOptions.localDataDirectory to tell each crawler to save data to its own directory.
- Use different default(Dataset|KeyValueStore|RequestQueue)Id options to tell each crawler to store its data in separate datasets / key-value stores / request queues.
Let us know whether this helped. Cheers!