Monitor mode
Opened this issue · 3 comments
Which package is the feature request for? If unsure which one to select, leave blank
crawlee
Because I'm actively using PuppeteerCrawler from crawlee, I'll focus on testing with it first.
Feature
I migrated from puppeteer-cluster to crawlee, and I miss its monitor feature for local dev.
Motivation
It's handy for tracking the estimated time remaining.
Ideal solution or implementation, and any additional constraints
- Consume and reuse the existing statistics on completed tasks, adding only what the monitor is missing. I don't currently know which file holds them, but I'm sure the RequestQueue and concurrency features have this data.
- Imagined CLI UI:
Start: START_TIME
Now: CURRENT_TIME (running for CONSUMED_TIME)
Progress: FINISHED / TOTAL_TASK (FINISHED_PERCENTAGE), failed: FAILED (FAILED_PERCENTAGE)
Remaining: ESTIMATED_TIME (SPEED)
Sys. load: CPU_LOAD / MEM_LOAD
Concurrencies: CONCURRENCY_INFO
CONCURRENCY_LIST
- Add a new Monitor class in packages/core/src/monitor.ts to handle the display of the monitor UI. It will contain the logic to gather and calculate the monitor data and to write it to the output.
- Integrate the Monitor class into the BasicCrawler class in packages/basic-crawler/src/internals/basic-crawler.ts.
- The Monitor class tracks and displays the time estimation and concurrency status in the CLI output at regular intervals, following the proposed UI template.
- Update the run function in packages/basic-crawler/src/internals/basic-crawler.ts to initialize and start the Monitor class.
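To make the proposed template more concrete, here is a rough sketch of how the monitor lines could be computed from a snapshot of counters. The MonitorSnapshot shape, its field names, and renderMonitor are hypothetical and only for illustration; where these numbers actually come from inside crawlee is left open, and the Sys. load line is omitted. The remaining time is a plain linear extrapolation from the average time per handled request so far.

```typescript
// Hypothetical snapshot shape; the field names are illustrative, not crawlee's API.
interface MonitorSnapshot {
    startedAt: Date;
    now: Date;
    finished: number;
    failed: number;
    total: number;
    currentConcurrency: number;
    desiredConcurrency: number;
}

// Zero-pad a number to two digits.
function pad2(n: number): string {
    return (n < 10 ? '0' : '') + String(n);
}

// Format a duration in milliseconds as HH:MM:SS.
function formatDuration(ms: number): string {
    const totalSeconds = Math.max(0, Math.round(ms / 1000));
    const h = Math.floor(totalSeconds / 3600);
    const m = Math.floor((totalSeconds % 3600) / 60);
    const s = totalSeconds % 60;
    return `${pad2(h)}:${pad2(m)}:${pad2(s)}`;
}

// Format part/whole as a percentage with one decimal place.
function formatPercent(part: number, whole: number): string {
    return whole > 0 ? `${((part / whole) * 100).toFixed(1)}%` : 'n/a';
}

// Build the monitor lines following the template from this issue.
function renderMonitor(s: MonitorSnapshot): string[] {
    const elapsedMs = s.now.getTime() - s.startedAt.getTime();
    const handled = s.finished + s.failed;
    const remaining = Math.max(0, s.total - handled);
    // Speed in handled requests per minute since the start.
    const perMinute = elapsedMs > 0 ? handled / (elapsedMs / 60000) : 0;
    // Linear extrapolation: assume the remaining requests take the same average time.
    const eta = handled > 0 ? formatDuration((elapsedMs / handled) * remaining) : 'unknown';
    return [
        `Start: ${s.startedAt.toISOString()}`,
        `Now: ${s.now.toISOString()} (running for ${formatDuration(elapsedMs)})`,
        `Progress: ${s.finished} / ${s.total} (${formatPercent(s.finished, s.total)}), `
            + `failed: ${s.failed} (${formatPercent(s.failed, s.total)})`,
        `Remaining: ${eta} (${perMinute.toFixed(1)} req/min)`,
        `Concurrencies: ${s.currentConcurrency} current / ${s.desiredConcurrency} desired`,
    ];
}
```

Keeping the rendering a pure function of a snapshot would make it testable without a running crawler; a Monitor class could call it from a timer and pass the result to whatever writes to the terminal.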
Alternative solutions or implementations
No response
Other context
crawlee already uses the built-in log, so to make sure the monitor output does not overwrite the log, we should find out how to write the monitor and log output on separate lines.
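One possible approach, sketched below under assumptions, is to keep the monitor as a status block at the bottom of the terminal: before each log line is written, the block is erased with ANSI escape sequences, the log line is printed, and the block is redrawn underneath. StatusArea and Writer are hypothetical names, the sketch assumes an ANSI-capable terminal and status lines that do not wrap, and how crawlee's log would be routed through it is not addressed here.

```typescript
// Minimal writer interface so the sketch can run without a real terminal.
interface Writer {
    write(chunk: string): unknown;
}

// Hypothetical helper: keeps a status block below regular log lines.
class StatusArea {
    private out: Writer;
    private status: string[] = [];
    private drawn = 0; // number of status lines currently on screen

    constructor(out: Writer) {
        this.out = out;
    }

    // Erase the status block that was drawn last, if any.
    private erase(): void {
        if (this.drawn > 0) {
            // Cursor up `drawn` lines, return to column 0, erase to end of screen.
            this.out.write(`\x1b[${this.drawn}A\r\x1b[0J`);
            this.drawn = 0;
        }
    }

    // Draw the current status block at the cursor position.
    private draw(): void {
        for (const line of this.status) {
            this.out.write(line + '\n');
        }
        this.drawn = this.status.length;
    }

    // Print a log line above the status block.
    log(line: string): void {
        this.erase();
        this.out.write(line + '\n');
        this.draw();
    }

    // Replace the status block contents and redraw it in place.
    update(status: string[]): void {
        this.erase();
        this.status = status;
        this.draw();
    }
}
```

In a Node.js process, process.stdout could be passed as the writer, for example new StatusArea(process.stdout); when the output is not a TTY, the escape sequences would end up in the captured output, so a real implementation would likely need to disable the monitor in that case.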
Hello! Could you please elaborate on what the monitor feature does? Or provide a link?
There is a GIF in their README: https://github.com/thomasdondorf/puppeteer-cluster