Scripts for loading commit information from git servers into an Elasticsearch instance.
Calls the github.com API, extracts information about the public repositories in the spotify organization, and writes it to a JSON file.
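A minimal sketch of this step, assuming it uses GitHub's public "list organization repositories" endpoint and keeps only each repository's name and clone URL. The kept fields and the function names are hypothetical, not taken from the original script. Unauthenticated requests are rate-limited by GitHub, so a large organization may need a token.

```python
import json
import urllib.request

# GitHub REST endpoint for an organization's public repositories (paginated).
API_URL = "https://api.github.com/orgs/{org}/repos?type=public&per_page=100&page={page}"


def extract_repo_info(repos):
    """Keep only the fields later steps need (assumed here: name and clone URL)."""
    return [{"name": r["name"], "clone_url": r["clone_url"]} for r in repos]


def fetch_org_repos(org):
    """Page through the org's public repos until GitHub returns an empty page."""
    page, result = 1, []
    while True:
        url = API_URL.format(org=org, page=page)
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            break
        result.extend(extract_repo_info(batch))
        page += 1
    return result
```

For example, `json.dump(fetch_org_repos("spotify"), open("repos.json", "w"), indent=2)` would write the list; the file name `repos.json` is only an illustration.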
Reads the JSON list of repositories and clones all of them into a directory.
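The cloning step can be sketched as below, assuming the JSON list holds objects with `name` and `clone_url` keys (an assumption carried over from the previous sketch) and that the `git` command-line tool is on the PATH. Repositories already present in the target directory are skipped, so the step can be rerun.

```python
import json
import os
import subprocess


def clone_commands(repos, dest_dir):
    """Build one `git clone` argument list per repo, cloning into dest_dir/<name>."""
    return [
        ["git", "clone", repo["clone_url"], os.path.join(dest_dir, repo["name"])]
        for repo in repos
    ]


def clone_all(repo_list_path, dest_dir):
    """Read the JSON repo list and clone every repo that is not already present."""
    with open(repo_list_path) as f:
        repos = json.load(f)
    os.makedirs(dest_dir, exist_ok=True)
    for cmd in clone_commands(repos, dest_dir):
        target = cmd[-1]
        if os.path.isdir(target):
            continue  # skip repos cloned on an earlier run
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them keeps the command construction easy to inspect without touching the network.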
Extracts the changelogs from all repositories in a directory and writes a file in the Elasticsearch bulk import format.
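A minimal sketch of the changelog extraction, assuming "changelog" means the output of `git log` and that each commit becomes one document. The `project`, `committer-name`, and `committer-email` fields match the schema below; the `sha`, `date`, and `message` fields, the `commit` document type, and the `_id` scheme are hypothetical choices. The bulk format itself is fixed: an action line, then the document line, with the whole payload ending in a newline.

```python
import json
import os
import subprocess

# Fields are joined with an ASCII unit separator so commit subjects may contain tabs or commas.
SEP = "\x1f"
LOG_FORMAT = SEP.join(["%H", "%cn", "%ce", "%cI", "%s"])


def parse_log(text, project):
    """Turn `git log --pretty=format:LOG_FORMAT` output (one commit per line) into documents."""
    docs = []
    for line in text.splitlines():
        if not line:
            continue
        sha, name, email, date, subject = line.split(SEP, 4)
        docs.append({
            "project": project,
            "sha": sha,
            "committer-name": name,
            "committer-email": email,
            "date": date,
            "message": subject,
        })
    return docs


def to_bulk(docs, index, doc_type="commit"):
    """Emit action/source line pairs; the bulk API requires a trailing newline."""
    if not docs:
        return ""
    lines = []
    for doc in docs:
        doc_id = doc["project"] + "/" + doc["sha"]  # stable id so reruns overwrite, not duplicate
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def changelogs_to_bulk(repo_dir, index):
    """Run `git log` in every git repo directly under repo_dir and concatenate the payloads."""
    out = []
    for name in sorted(os.listdir(repo_dir)):
        path = os.path.join(repo_dir, name)
        if not os.path.isdir(os.path.join(path, ".git")):
            continue
        log = subprocess.run(
            ["git", "-C", path, "log", "--pretty=format:" + LOG_FORMAT],
            capture_output=True, text=True, check=True,
        ).stdout
        out.append(to_bulk(parse_log(log, name), index))
    return "".join(out)
```

Using the directory name as `project` is an assumption; it matches how the cloning sketch names each checkout.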
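An index schema that fixes some obvious problems with the tokenization of email addresses and project names. The custom analyzer_keyword analyzer keeps each committer-name, committer-email, and project value as a single lowercased token instead of splitting it into words: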
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "committer-name": {
          "analyzer": "analyzer_keyword",
          "type": "string"
        },
        "committer-email": {
          "analyzer": "analyzer_keyword",
          "type": "string"
        },
        "project": {
          "analyzer": "analyzer_keyword",
          "type": "string"
        }
      }
    }
  }
}
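The schema has to be in place before the bulk import, because the analyzer of an existing field cannot be changed afterwards. A minimal sketch of building the two HTTP requests, assuming Elasticsearch listens at `http://localhost:9200` and the index is named `commits` (both hypothetical). Note that the `string` type predates the `text`/`keyword` split in Elasticsearch 5.0 and `_default_` mappings were removed in 7.0, so the schema as written targets older releases.

```python
import json
import urllib.request


def create_index_request(base_url, index, schema):
    """PUT the settings/mappings document to create `index` before any data arrives."""
    req = urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(schema).encode("utf-8"),
        method="PUT",
    )
    req.add_header("Content-Type", "application/json")
    return req


def bulk_request(base_url, payload):
    """POST a newline-delimited bulk payload to the _bulk endpoint."""
    req = urllib.request.Request(
        f"{base_url}/_bulk",
        data=payload.encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/x-ndjson")
    return req
```

Each request is sent with `urllib.request.urlopen(...)`, first the index creation with the schema loaded from a file, then the bulk payload produced by the extraction step.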