versioneye/crawl_r

Add GitHub crawler to fetch dependencies for Godep and Dep files


The crawler should try to fetch those project files directly by URL and extract dependency details with the help of parsers.
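A minimal sketch of the "fetch directly via URL" step, not the actual crawler code. It assumes the files live at their conventional paths on the `master` branch and are read through `raw.githubusercontent.com`. The names `MANIFEST_PATHS`, `manifest_urls`, `fetch_manifests` and `HTTP_FETCH` are made up for illustration, and the keys only mirror the labels used in the statistics in this thread.

```ruby
require "net/http"
require "uri"

# Conventional file locations per Go package manager (keys are illustrative labels).
MANIFEST_PATHS = {
  "godeps"     => "Godeps/Godeps.json",
  "gopkg"      => "Gopkg.toml",
  "gopkg_lock" => "Gopkg.lock",
  "govendor"   => "vendor/vendor.json",
  "glide"      => "glide.yaml",
  "gopm"       => ".gopmfile"
}.freeze

# Builds one raw-content URL per candidate file.
# Assumption: the default branch is "master".
def manifest_urls(owner, repo, branch = "master")
  MANIFEST_PATHS.transform_values do |path|
    "https://raw.githubusercontent.com/#{owner}/#{repo}/#{branch}/#{path}"
  end
end

# Default fetcher: returns the body on a 2xx response, nil otherwise.
HTTP_FETCH = lambda do |url|
  res = Net::HTTP.get_response(URI(url))
  res.is_a?(Net::HTTPSuccess) ? res.body : nil
end

# Returns { kind => file body } for every file that exists in the repo.
# The fetcher is injectable so the logic can be exercised without network access.
def fetch_manifests(owner, repo, fetch: HTTP_FETCH)
  manifest_urls(owner, repo).each_with_object({}) do |(kind, url), found|
    body = fetch.call(url)
    found[kind] = body if body
  end
end
```

Each body found this way would then be handed to the matching parser; a repo that yields an empty hash would count as `:no_file`.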

I'm currently running an experiment to see how many packages on Gosearch are actually hosted on GitHub, and how many of them use a package manager.
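The "hosted on GitHub" check can be sketched as a test on the package's import path. This is an assumption about how the split could be made, not the code used in the experiment, and `github_repo` is a hypothetical helper. Note that it counts vanity paths such as `golang.org/x/...` or `gopkg.in/...` as not GitHub, even when the code is mirrored there.

```ruby
# Returns [owner, repo] when the import path starts with github.com/<owner>/<repo>,
# or nil for anything else (sub-packages below the repo root are ignored).
def github_repo(import_path)
  m = import_path.match(%r{\Agithub\.com/([^/]+)/([^/]+)})
  m && [m[1], m[2]]
end

github_repo("github.com/golang/dep/cmd/dep") # owner "golang", repo "dep"
github_repo("golang.org/x/net")              # nil, counted as not GitHub
```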

Here are some statistics from crawling the first 5000 packages from the go-search API:

"#-- Summary after 5000"
{:total=>5000,
:is_github=>4717,
:not_github=>282,
:no_file=>2930,
:has_file=>1787,
"godeps"=>617,
"gopkg"=>118,
"govendor"=>627,
"glide"=>418,
"gopm"=>7}

It seems that we could get dependency details for ~35% of packages with that crawler (1787 of the first 5000 had a recognized file).
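As a quick check of that estimate, the shares can be recomputed from the `:total`, `:is_github` and `:has_file` counts of the two summaries in this thread. The `pct` helper and the `SUMMARIES` constant are only illustrative.

```ruby
# Counts copied from the 5000- and 10000-package summaries.
SUMMARIES = {
  5_000  => { total: 5_000,  is_github: 4_717, has_file: 1_787 },
  10_000 => { total: 10_000, is_github: 9_460, has_file: 3_371 }
}.freeze

# Percentage of part in whole, rounded to one decimal place.
def pct(part, whole)
  (100.0 * part / whole).round(1)
end

SUMMARIES.each do |n, s|
  puts "after #{n}: #{pct(s[:has_file], s[:total])}% of all packages, " \
       "#{pct(s[:has_file], s[:is_github])}% of GitHub-hosted packages"
end
# after 5000: 35.7% of all packages, 37.9% of GitHub-hosted packages
# after 10000: 33.7% of all packages, 35.6% of GitHub-hosted packages
```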

"#-- Summary after 10000"
{:total=>10000,
:is_github=>9460,
:no_file=>6089,
:not_github=>539,
:has_file=>3371,
"godeps"=>1190,
"gopkg"=>231,
"govendor"=>1159,
"glide"=>777,
"gopm"=>13,
"gopkg_lock"=>1}