praetorian-inc/gato

Speed Up Enumeration

AdnaneKhan opened this issue · 1 comment

This issue tracks a mid-term goal of improving Gato's scanning flow to speed up enumeration.

Current State:

Gato takes a long time to run against organizations with hundreds of repositories or more.

Challenges:

  • Gato makes a large number of REST API calls. For large organizations, many of these calls return no useful information, yet each one still costs a round trip, which slows enumeration considerably.
  • The workflow run log download step is very slow. Gato short-circuits if it finds a self-hosted runner. If it does not find one and the repository has very complex runs, Gato downloads and extracts zip files that are megabytes in size (currently 10 per repository). This makes running Gato without the skip run-logs flag impractical against large organizations.

Possible Solutions:

  • Incremental output of results + save/resume functionality. Gato currently enumerates everything while holding the results in memory, then converts everything to JSON and writes it out at the end (if the JSON output flag is enabled). If Gato stores progress incrementally, operators can pause and resume the enumeration of a large organization.
  • GraphQL initial filter pass: Many of Gato's API calls return 404s (for example, listing the contents of the .github/workflows directory for a repository that does not have one). If Gato starts with GraphQL queries that retrieve the repositories along with all metadata relevant to later checks, it can skip checks that it already knows will come back empty.
  • For organization enumeration, only download run logs if workflow file analysis affirmatively identifies a self-hosted runner. In single-repository mode, we can increase the scanning depth and even disable short-circuiting to enumerate all accessible runners.
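A minimal sketch of the save/resume idea, assuming a JSON Lines checkpoint file with one `{"repo": ..., "result": ...}` record per repository. The file layout and the names `load_completed` and `enumerate_with_checkpoint` are hypothetical, not Gato's actual code; `enumerate_repo` stands in for whatever per-repository enumeration Gato performs.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable, Set


def load_completed(checkpoint: Path) -> Set[str]:
    """Return the repositories already recorded in a JSON Lines checkpoint file."""
    done: Set[str] = set()
    if not checkpoint.exists():
        return done
    with checkpoint.open("r", encoding="utf-8") as fh:
        for line in fh:
            try:
                done.add(json.loads(line)["repo"])
            except (json.JSONDecodeError, KeyError, TypeError):
                # Blank or truncated lines (e.g. from an interrupted write) are
                # ignored, so that repository is simply enumerated again.
                continue
    return done


def _ends_without_newline(path: Path) -> bool:
    """True if the file exists, is non-empty, and its last byte is not a newline."""
    if not path.exists() or path.stat().st_size == 0:
        return False
    with path.open("rb") as fh:
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) != b"\n"


def enumerate_with_checkpoint(
    repos: Iterable[str],
    enumerate_repo: Callable[[str], dict],
    checkpoint: Path,
) -> int:
    """Enumerate repositories, appending each result to `checkpoint` as soon as it is ready.

    Repositories already in the checkpoint are skipped, which is what makes resuming
    work. Returns the number of repositories enumerated in this call.
    """
    done = load_completed(checkpoint)
    needs_newline = _ends_without_newline(checkpoint)
    count = 0
    with checkpoint.open("a", encoding="utf-8") as fh:
        if needs_newline:
            fh.write("\n")  # terminate a truncated record so the next one parses
        for repo in repos:
            if repo in done:
                continue  # finished in a previous run
            result = enumerate_repo(repo)
            fh.write(json.dumps({"repo": repo, "result": result}) + "\n")
            fh.flush()  # hand the record to the OS so Ctrl-C loses at most the current repo
            count += 1
    return count
```

Appending one line per repository keeps memory flat and makes the final JSON report a simple post-processing step over the checkpoint file.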
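The GraphQL filter pass could look roughly like the sketch below. It assumes GitHub's GraphQL `Repository.object(expression:)` field to fetch the `.github/workflows` tree on the default branch in the same request as the repository list, so repositories without workflow files never trigger the REST calls that would 404. `REPO_QUERY`, `RepoPlan`, and `plan_checks` are hypothetical names, and the fields fetched here are only a starting point for "all metadata relevant to further checks".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical query: one request returns up to 100 repositories plus the entries of
# .github/workflows on the default branch (null when the directory does not exist).
REPO_QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        nameWithOwner
        workflows: object(expression: "HEAD:.github/workflows") {
          ... on Tree { entries { name } }
        }
      }
    }
  }
}
"""


@dataclass
class RepoPlan:
    name: str
    workflow_files: list[str]


def plan_checks(page: dict) -> tuple[list[RepoPlan], Optional[str]]:
    """Turn one page of the GraphQL response into per-repository check plans.

    Repositories without workflow files get no plan, so no follow-up REST calls are
    made for them. Returns the plans and the cursor for the next page (None when done).
    """
    repos = page["data"]["organization"]["repositories"]
    plans: list[RepoPlan] = []
    for node in repos["nodes"]:
        tree = node.get("workflows") or {}  # null if the directory is missing
        files = [
            entry["name"]
            for entry in tree.get("entries", [])
            if entry["name"].endswith((".yml", ".yaml"))
        ]
        if files:
            plans.append(RepoPlan(name=node["nameWithOwner"], workflow_files=files))
    info = repos["pageInfo"]
    return plans, (info["endCursor"] if info["hasNextPage"] else None)
```

A driver loop would POST `REPO_QUERY` to `https://api.github.com/graphql` with the `org` and `cursor` variables, call `plan_checks` on each page, and stop once the returned cursor is `None`.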
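The run-log gating in the last bullet reduces to a small predicate over parsed workflow files. This sketch assumes the workflows have already been parsed into dictionaries (for example with a YAML loader) and treats only an explicit `self-hosted` label in `runs-on` as affirmative evidence; it does not resolve expressions such as `${{ matrix.os }}` or infer self-hosted runners from custom labels or runner groups alone. `should_download_run_logs` and its `single_repo_mode` parameter are hypothetical names.

```python
from typing import Any, Iterable, List


def runs_on_labels(runs_on: Any) -> List[str]:
    """Normalise the forms a `runs-on` value can take (string, list, or mapping) into labels."""
    if isinstance(runs_on, str):
        return [runs_on]
    if isinstance(runs_on, list):
        return [str(label) for label in runs_on]
    if isinstance(runs_on, dict):
        # The mapping form is {group: ..., labels: ...}; labels may be a string or a list.
        labels = runs_on.get("labels", [])
        return [labels] if isinstance(labels, str) else [str(label) for label in labels]
    return []


def requests_self_hosted(workflow: dict) -> bool:
    """True if any job explicitly asks for a `self-hosted` runner (labels are case-insensitive)."""
    for job in (workflow.get("jobs") or {}).values():
        if not isinstance(job, dict):
            continue
        if any(label.lower() == "self-hosted" for label in runs_on_labels(job.get("runs-on"))):
            return True
    return False


def should_download_run_logs(workflows: Iterable[dict], single_repo_mode: bool) -> bool:
    """Organization mode: only pull logs on affirmative evidence. Single-repo mode: always."""
    if single_repo_mode:
        return True
    return any(requests_self_hosted(wf) for wf in workflows)
```

Keeping the check this strict trades some recall for speed in organization mode, while single-repository mode still downloads logs unconditionally.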

Definition of Done: Enumerating a large organization with thousands of repositories is not painful.

Merged into dev after confirming that integration tests pass in the feature branch. Will keep an eye on behavior and aim to merge into main once any emergent bugs are fixed.