/parser

Primary language: Go. License: Apache-2.0.

GitHub data processor with TopK output

A readme on how to run the solution

How to download:

  • Check out or download this set of files to any folder.
  • The repo already contains a data folder with the input files; keep it in place so no extra setup is needed.

How to run

make build

This creates the binary at ./bin/app. The commands below depend on this binary.

Top 10 active users sorted by amount of PRs created and commits pushed

This is the default scenario. Run it with:

make top1
  >  Top 10 active users sorted by amount of PRs created and commits pushed...
LombiqBot                 1529
renovate[bot]             535
pull[bot]                 384
direwolf-github           341
lihkg-boy                 331
ripamf2991                311
renovate-bot              232
otiny                     222
dependabot[bot]           183
dependabot-preview[bot]   155

Top 10 repositories sorted by amount of commits pushed

make top2
  >  Top 10 repositories sorted by amount of commits pushed...
lihkg-backup/thread                      331
otiny/up                                 222
ripamf2991/ntdtv                         167
ripamf2991/djy                           139
wessilfie/wessilfie.github.io            108
Lombiq/Orchard                           96
himobi/hotspot                           90
wigforss/java-8-base                     87
geos4s/18w856162                         79
SmartThingsCommunity/SmartThingsPublic   68

Top 10 repositories sorted by amount of watch events

make top3
  >  Top 10 repositories sorted by amount of watch events...
victorqribeiro/isocity                44
GitHubDaily/GitHubDaily               11
neutraltone/awesome-stock-resources   11
sw-yx/spark-joy                       10
imsnif/bandwhich                      8
Chakazul/Lenia                        7
BurntSushi/xsv                        7
neeru1207/AI_Sudoku                   6
ErikCH/DevYouTubeList                 6
testerSunshine/12306                  6

To use custom parameters, run ./bin/app directly. The binary is built with the cobra CLI library, so help is available:

Usage:
  app top [flags]

Flags:
      --entity_entity_column_index int       
      --entity_file string                    (default "./data/actors.csv")
      --entity_name_column_index int          (default 1)
      --event_types strings                   (default [PushEvent,PullRequestEvent])
      --events_entity_column_index int        (default 2)
      --events_event_type_column_index int    (default 1)
      --events_file string                    (default "./data/events.csv")
  -h, --help                                 help for top
      --k uint32                              (default 10)

How it works

Structure

  • main.go - executes the cobra root command.
  • ./command/top.go - the top command, a subcommand of root that handles all top functionality. It collects the flag values and calls CsvParserApp. CsvParserAppBuilder exists so that tests can replace the build logic.
  • ./app/csv.go - the processing logic. It consists of the 3 function calls described below.

Processing logic

  • getIDsTop initializes a HeavyKeeper instance, reads the events file, and feeds every matching line to HeavyKeeper. It returns the top K (10 by default) leaders as pairs of entity ID + count.
  • getNamedTop uses the leaders list from the previous step to build a named list. Names are read from the entity file (actors.csv or repos.csv). For every line of the entity file, all top items are checked, until all names are found or EOF is reached.
  • WriteResults writes the leaderboard to the io.Writer implementation passed to CsvParserApp at the build stage.