chaoss/grimoirelab

Scheduler to fetch data from Git repositories


The current version of the platform needs several instances to analyze more than 5000 data sources. For example, for a project with around 3500 high-activity repositories that retrieves data from GitHub (commits, issues, and pull requests), the platform needs 3 days before it starts analyzing new data.

The goal is to start working on a new scheduler that allows the platform to scale, in line with what is defined in the current version of the roadmap.

The first iteration of this scheduler will add support for fetching data from Git repositories.
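
To make the scope of that first iteration more concrete, here is a minimal sketch of what a periodic fetch scheduler for Git repositories could look like. It is not a proposed design: `GitFetchScheduler`, `FetchJob`, `add_repository`, and `run_pending` are hypothetical names, and the `fetch` callable stands in for whatever actually retrieves the commits. The sketch only illustrates keeping repositories in a priority queue ordered by their next fetch time.

```python
import heapq
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class FetchJob:
    """Hypothetical job record; ordered only by its next run time."""
    next_run: float
    repo_url: str = field(compare=False)
    interval: float = field(compare=False, default=3600.0)


class GitFetchScheduler:
    """Hypothetical scheduler that re-fetches each repository periodically."""

    def __init__(self, fetch: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # assumed: does the real Git retrieval
        self._clock = clock          # injectable so the sketch can be tested
        self._queue: List[FetchJob] = []

    def add_repository(self, repo_url: str, interval: float = 3600.0) -> None:
        # A new repository is due immediately.
        heapq.heappush(self._queue, FetchJob(self._clock(), repo_url, interval))

    def run_pending(self) -> List[str]:
        """Fetch every repository that is due and return their URLs."""
        now = self._clock()
        due = []
        # Collect due jobs first so a rescheduled job cannot run twice
        # within the same call.
        while self._queue and self._queue[0].next_run <= now:
            due.append(heapq.heappop(self._queue))
        fetched = []
        for job in due:
            self._fetch(job.repo_url)
            fetched.append(job.repo_url)
            heapq.heappush(
                self._queue,
                FetchJob(now + job.interval, job.repo_url, job.interval),
            )
        return fetched
```

A real scheduler would also need workers running in parallel, retries, and persistent state to reach the scale described above. This sketch leaves all of that out and runs the fetches sequentially in the calling thread.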