societe-generale/github-crawler

Fetch single repo programmatically instead of crawling all

bonndan opened this issue · 2 comments

Summary

In Java I want the crawler to fetch a single repo and return the findings for it as response.

Type of Issue

It is a:

  • bug
  • request
  • question regarding the documentation

Motivation

In my use case I have a repo URL, and I want all the data the crawler produces for that URL.

Current Behavior

As far as I understand from the getting-started project, I can only launch the crawler and then have the output populated somehow (I haven't fully understood that part).

Expected Behavior

GitHubCrawlerOutput output = crawler.getOutputFor(myRepoUrl);

I don't think you'll be able to do that directly. The entry point is here: https://github.com/societe-generale/github-crawler/blob/master/github-crawler-core/src/main/kotlin/com/societegenerale/githubcrawler/GitHubCrawler.kt#L32 . What you would probably be interested in is the fetchAndParseRepoContent(repositoriesFromOrga: Set) method, but it's private.

But I think there's one possibility for you:

  • The list of repositories is first fetched here.

  • So what you could do is implement a very basic version of RemoteSourceControl, or even subclass one of the existing implementations, and simply override fun fetchRepositories(organizationName: String): Set so that it returns a Set with a single element, i.e. the repository you are interested in.

  • You can then build a GitHubCrawler instance, passing this custom RemoteSourceControl along with the rest of the configuration, call crawl() on it, and see where that gets you.
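The idea above can be sketched in Java roughly as follows. Note this is only an illustration of the shape of the approach: the Repository and RemoteSourceControl stand-ins below are simplified placeholders, not the actual types from com.societegenerale.githubcrawler, so check the real interface signatures in the project before implementing.

```java
import java.util.Set;

public class SingleRepoExample {

    // Hypothetical, simplified stand-in for the crawler's repository model.
    record Repository(String name, String url) {}

    // Hypothetical, simplified stand-in for the RemoteSourceControl interface.
    interface RemoteSourceControl {
        Set<Repository> fetchRepositories(String organizationName);
    }

    // Ignores the organization name and always returns the one repository
    // we care about, so a subsequent crawl() would only visit that repo.
    static class SingleRepoSourceControl implements RemoteSourceControl {
        private final Repository repo;

        SingleRepoSourceControl(Repository repo) {
            this.repo = repo;
        }

        @Override
        public Set<Repository> fetchRepositories(String organizationName) {
            return Set.of(repo);
        }
    }

    public static void main(String[] args) {
        RemoteSourceControl source = new SingleRepoSourceControl(
                new Repository("my-repo", "https://github.com/my-org/my-repo"));

        // A GitHubCrawler built with this source would now crawl a single repo.
        System.out.println(source.fetchRepositories("my-org"));
    }
}
```

You would then pass an instance like SingleRepoSourceControl where the crawler's configuration expects a RemoteSourceControl, and invoke crawl() as usual.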

This issue can be closed. I don't need single-repo fetching anymore, but thanks a lot for the kind support.