This API returns the file count, the total number of lines and the total number of bytes grouped by file extension, of a given public Github repository. As in the example below:
[
{ "extension": "java",
"count": 4, "lines": 360,
"bytes": 5246
},
{
"extension": "xml",
"count": 2, "lines": 45,
"bytes": 368
},
…
]
In order to make a basic request to the API, all you need is a POST request to the root path of the application, with a payload such as the following:
{
"repository": "${GITHUB_REPO_NAME}"
}
The duration of the first request might take a while, depending on the size and structure of the given repository. After the first request of an especific repository, that response is stored in cache. That way, the next requests for that same repository will take no reprocessing. In order to invalidate that cache and reload the response, you just need to make the same POST request with an additional payload attribute:
{
"repository": "${GITHUB_REPO_NAME}",
"refresh": "true"
}
This repository contains a Dockerfile through which it is possible to generate a docker image for the API.
jonathandavidbr/github-scrapper
docker pull jonathandavidbr/github-scrapper:latest
docker run -p ${LOCAL_PORT}:5000 ${IMAGE_NAME}
Where LOCAL_PORT is the port in which the image must run, and IMAGE_NAME is the tag name given to the downloaded image.
This template is compatible with Auto DevOps. Auto Deploy of the master branch is enabled through configuration on Heroku.
This project is based on a GitLab Project Template.
- Performance improvements on requests and processing
- Auto cache invalidation based on Last Commit info
- Repository validation at request
- Visibility for repositories already stored
- Data persistence