Fetch single repo programmatically instead of crawling all
bonndan opened this issue · 2 comments
Summary
In Java, I want the crawler to fetch a single repo and return its findings as a response.
Type of Issue
It is a :
- bug
- request
- question regarding the documentation
Motivation
In my use case, I have a repo URL and I want all the data the crawler produces for that URL.
Current Behavior
As far as I understood from the getting-started project, I can only launch the crawler and then have the output filled in somehow (I didn't fully understand that part).
Expected Behavior
GitHubCrawlerOutput output = crawler.getOutputFor(myRepoUrl);
I don't think you'll be able to do that directly. The entry point is here: https://github.com/societe-generale/github-crawler/blob/master/github-crawler-core/src/main/kotlin/com/societegenerale/githubcrawler/GitHubCrawler.kt#L32. What you would probably be interested in is the `fetchAndParseRepoContent(repositoriesFromOrga: Set)` method, but it's private.
But I think there's one possibility for you:

- The list of repositories is first fetched here.
- So what you could do is implement a very basic version of `RemoteSourceControl`, or even subclass one of the existing implementations, and simply override `fun fetchRepositories(organizationName: String): Set` so that it returns a Set with a single element, i.e. the repository you are interested in.
- You can then build a `GitHubCrawler` instance, passing this custom `RemoteSourceControl` and the rest of the configuration, then call `crawl()` on it, and see where that gets you.
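The overriding idea above might look roughly like this sketch. Note that the real `RemoteSourceControl` interface and the `GitHubCrawler` constructor have more members and parameters than shown here; the types below are simplified stand-ins (a hypothetical `Repository` with just `url` and `name`) meant only to illustrate returning a single-element set from `fetchRepositories`.

```kotlin
// Simplified stand-ins for the library types, for illustration only:
// the real RemoteSourceControl interface has additional methods.
data class Repository(val url: String, val name: String)

interface RemoteSourceControl {
    fun fetchRepositories(organizationName: String): Set<Repository>
}

// A RemoteSourceControl that ignores the organization name and returns
// exactly the one repository we are interested in.
class SingleRepoSourceControl(private val repoUrl: String) : RemoteSourceControl {
    override fun fetchRepositories(organizationName: String): Set<Repository> =
        setOf(Repository(url = repoUrl, name = repoUrl.substringAfterLast('/')))
}

fun main() {
    val sourceControl: RemoteSourceControl =
        SingleRepoSourceControl("https://github.com/myOrg/myRepo")
    // The crawl would then iterate over exactly this one-element set.
    val repos = sourceControl.fetchRepositories("myOrg")
    println(repos.map { it.name })  // [myRepo]
}
```

In the real setup you would pass an instance like `SingleRepoSourceControl` into the `GitHubCrawler` constructor in place of the default implementation, keeping the rest of the configuration unchanged.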
This issue can be closed. I don't need single repo fetching anymore, but thanks a lot for the kind support.