Find the use and users of open-source code
GitDetective is an open-source web application for finding the use and users of open-source code. GitDetective achieves this by compiling open-source code with Kythe and indexing references found in the compiliation units. GitDetective then supresses internal references (referring to source code within the project) and extracts external references (refering to source code outside the project). This data is then collected and organized into a knowledge graph powered by Grakn.
GitDetective consists of two separate modules:
The indexer module contains source code builders and integration with Kythe to produce the external references to be imported into Grakn. The indexer scales horizontally, are deployed in a cluster, and are responsible for:
- cloning projects
- building projects with Kythe
- extracting Kythe data related to external references
- augmenting Kythe data with additional information
- filtering out definition/reference data already imported
- sending indexed project to web module for importing
indexer
-- Build/extract source code data
The web module contains a Bootstrap website which is generated by the Handlebars template engine and provides a way to view data collected by the indexer module. The web is deployed as a single instance and is responsible for:
- creating indexing jobs (based on user activity)
- importing indexed projects into Grakn
- caching imported definitions/references and unique Kythe paths
- displaying external references between projects
Coming soon
- Fork it (https://github.com/yourname/yourproject/fork)
- Create your feature branch (
git checkout -b feature/fooBar
) - Commit your changes (
git commit -am 'Add some fooBar'
) - Push to the branch (
git push origin feature/fooBar
) - Create a new Pull Request
Brandon Fergerson – @fergerson92 – brandon.fergerson@codebrig.com
This project was created to test Grakn and Kythe in the context of processing large amounts of source code. The idea came from the thought that it would be nice to really know how your source code is being used in the wild. Who uses function/feature X? Who's the biggest users/companies of function/feature X? Who was the first to use function/feature x? Etc.
If you would like to see GitDetective continue to comb the desert of open source code please consider sponsoring continued work on this project.