apify/crawlee

Monitor mode


Which package is the feature request for? If unsure which one to select, leave blank

crawlee

Because I'm actively using PuppeteerCrawler from crawlee, I'll focus on testing the feature with it first.

Feature

I migrated from puppeteer-cluster to crawlee, and I miss puppeteer-cluster's monitor feature for local development.

Motivation

It's handy for tracking progress and estimating how much time the crawl has left.

Ideal solution or implementation, and any additional constraints

  • Consume and reuse the existing statistics data for completed tasks, and only add what's missing for the monitor. I don't currently know which file holds this data, but I'm sure the RequestQueue and concurrency features have it.

  • Imagined CLI UI:

Start: START_TIME
Now: CURRENT_TIME (running for CONSUMED_TIME)
Progress: FINISHED / TOTAL_TASK (FINISHED_PERCENTAGE), failed: FAILED (FAILED_PERCENTAGE)
Remaining: ESTIMATED_TIME (SPEED)
Sys. load: CPU_LOAD / MEM_LOAD
Concurrencies: CONCURRENCY_INFO
CONCURRENCY_LIST

  • Add a new Monitor class in packages/core/src/monitor.ts to handle displaying the monitor UI. It will contain the logic to gather and calculate the monitor data and to write it to the output.

  • Integrate the Monitor class into the BasicCrawler class in packages/basic-crawler/src/internals/basic-crawler.ts.

  • The Monitor class tracks the time estimate and concurrency status and displays them in the CLI output at regular intervals, following the proposed UI template.

  • Update the run function in packages/basic-crawler/src/internals/basic-crawler.ts to initialize and start the Monitor class.
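A minimal sketch of what the proposed Monitor could look like, assuming its input arrives as a plain snapshot object. `MonitorSnapshot`, `renderMonitor`, `Monitor` and their parameters are hypothetical names, not existing crawlee APIs, and wiring the snapshot to crawlee's existing statistics is left open, as in the issue. The remaining-time estimate here is a naive linear one (remaining tasks divided by the average throughput so far), and the failed percentage is taken relative to finished tasks.

```typescript
// Hypothetical snapshot of crawl progress. In a real integration these numbers
// would come from crawlee's existing statistics (not shown here).
interface MonitorSnapshot {
  startTime: number; // epoch ms when the crawl started
  now: number; // epoch ms when this snapshot was taken
  finished: number; // tasks completed, including failed ones
  failed: number; // tasks that failed
  total: number; // total tasks known so far
  cpuLoad: number; // fraction 0..1
  memLoad: number; // fraction 0..1
  concurrency: number; // tasks currently running
  maxConcurrency: number;
}

// Formats milliseconds as "1h 2m 5s", "1m 5s" or "30s".
function formatDuration(ms: number): string {
  const totalSec = Math.max(0, Math.round(ms / 1000));
  const h = Math.floor(totalSec / 3600);
  const m = Math.floor((totalSec % 3600) / 60);
  const s = totalSec % 60;
  if (h > 0) return `${h}h ${m}m ${s}s`;
  return m > 0 ? `${m}m ${s}s` : `${s}s`;
}

function pct(part: number, whole: number): string {
  return whole === 0 ? "0.00%" : `${((part / whole) * 100).toFixed(2)}%`;
}

// Renders the proposed template; the per-task CONCURRENCY_LIST is omitted.
function renderMonitor(s: MonitorSnapshot): string[] {
  const elapsed = s.now - s.startTime;
  // Average throughput so far, in finished tasks per second.
  const speed = elapsed > 0 ? s.finished / (elapsed / 1000) : 0;
  const remaining = s.total - s.finished;
  // Linear estimate: assumes the average speed stays constant.
  const eta = speed > 0 ? formatDuration((remaining / speed) * 1000) : "unknown";
  return [
    `Start: ${new Date(s.startTime).toISOString()}`,
    `Now: ${new Date(s.now).toISOString()} (running for ${formatDuration(elapsed)})`,
    `Progress: ${s.finished} / ${s.total} (${pct(s.finished, s.total)}), failed: ${s.failed} (${pct(s.failed, s.finished)})`,
    `Remaining: ${eta} (${speed.toFixed(2)} req/s)`,
    `Sys. load: ${pct(s.cpuLoad, 1)} CPU / ${pct(s.memLoad, 1)} MEM`,
    `Concurrencies: ${s.concurrency} / ${s.maxConcurrency}`,
  ];
}

// Periodically pulls a snapshot and hands the rendered lines to a writer.
class Monitor {
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly getSnapshot: () => MonitorSnapshot;
  private readonly write: (lines: string[]) => void;
  private readonly intervalMs: number;

  constructor(
    getSnapshot: () => MonitorSnapshot,
    write: (lines: string[]) => void,
    intervalMs = 1000,
  ) {
    this.getSnapshot = getSnapshot;
    this.write = write;
    this.intervalMs = intervalMs;
  }

  // Renders and writes one frame immediately.
  tick(): void {
    this.write(renderMonitor(this.getSnapshot()));
  }

  start(): void {
    if (this.timer !== undefined) return; // already running
    this.timer = setInterval(() => this.tick(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

With the default interval, something like `new Monitor(getSnapshot, (lines) => console.log(lines.join("\n")))` would print one frame per second. `stop()` has to be called once the crawl ends (for example in a `finally` around `crawler.run()`), otherwise the interval keeps the process alive.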

Alternative solutions or implementations

No response

Other context

  • crawlee already uses its built-in log, so to make sure the monitor output doesn't overwrite log lines, we need to find a way to write the monitor and log output on separate lines.
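One possible approach, assuming an ANSI-capable terminal: keep the monitor block pinned at the bottom, and before printing anything new, move the cursor up over the block (`ESC[nA`), clear to the end of the screen (`ESC[0J`), print the log line, then redraw the block underneath. `StickyStatus` is a hypothetical helper, and routing crawlee's `log` output through its `log()` method is not shown.

```typescript
const ESC = "\x1b[";

// Keeps a multi-line status block below regular log output by erasing and
// redrawing it. Assumes status lines are shorter than the terminal width
// (wrapped lines would occupy more rows than counted).
class StickyStatus {
  private drawn = 0; // number of status rows currently on screen
  private lines: string[] = [];
  private readonly out: (chunk: string) => void;

  constructor(out: (chunk: string) => void) {
    this.out = out;
  }

  // Cursor up over the previously drawn block, then clear to end of screen.
  private erase(): string {
    return this.drawn > 0 ? `${ESC}${this.drawn}A${ESC}0J` : "";
  }

  // Emits the current block and remembers how many rows it takes.
  private draw(): string {
    this.drawn = this.lines.length;
    return this.lines.map((line) => `${line}\n`).join("");
  }

  // Replaces the status block, e.g. on every monitor tick.
  update(lines: string[]): void {
    const erase = this.erase();
    this.lines = lines;
    this.out(erase + this.draw());
  }

  // Prints a log line above the block, then redraws the block below it.
  log(message: string): void {
    const erase = this.erase();
    this.out(erase + `${message}\n` + this.draw());
  }
}
```

In a real terminal the writer would be `(chunk) => process.stdout.write(chunk)`, ideally enabled only when `process.stdout.isTTY` is true, with a fallback to plain periodic log lines when output is redirected to a file or CI log.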

Hello! Could you please elaborate on what the monitor feature does, or provide a link?

Oh, sorry, I forgot to include a link to the source. Thanks for the help; here's the gif.