gitcommit2es

Scripts to pull commit information from git logs into elasticsearch

Scripts that put information about commits from git servers into an Elasticsearch instance. They are meant to be run in the order listed below.

publicrepos2json.py

Makes API calls to github.com, extracts some information about the public repos in the spotify organization, and writes it to a JSON file.
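
As a rough illustration of this step, here is a minimal sketch, not the script's actual code. It assumes GitHub's REST endpoint for listing an organization's repositories and keeps three fields per repo; the function names and the choice of fields are hypothetical.

```python
import json
import urllib.request

# Assumption: GitHub's REST route for listing an organization's repositories.
API_URL = "https://api.github.com/orgs/{org}/repos?type=public&per_page=100&page={page}"


def summarize_repo(repo):
    """Keep only a few fields from one repo object returned by the API."""
    return {
        "name": repo["name"],
        "clone_url": repo["clone_url"],
        "description": repo.get("description"),
    }


def fetch_public_repos(org):
    """Request page after page until the API returns an empty list."""
    repos, page = [], 1
    while True:
        with urllib.request.urlopen(API_URL.format(org=org, page=page)) as resp:
            batch = json.load(resp)
        if not batch:
            return repos
        repos.extend(summarize_repo(r) for r in batch)
        page += 1


def write_repo_list(org, path):
    """Write the summarized repo list to a JSON file (path is hypothetical)."""
    with open(path, "w") as out:
        json.dump(fetch_public_repos(org), out, indent=2)
```

Calling write_repo_list("spotify", "repos.json") would produce a JSON list like the one the next script reads. Unauthenticated requests to the API are rate limited, so a real run may need a token.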

cloner.py

Reads the JSON list of repos and clones each of them into a directory.
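
A minimal sketch of how such a cloning step could look, assuming each entry in the JSON list has "name" and "clone_url" keys (those key names are an assumption, not confirmed by the script) and that git is on the PATH:

```python
import json
import os
import subprocess


def clone_commands(repos, target_dir):
    """Build one `git clone` argument list per repo entry."""
    return [
        ["git", "clone", repo["clone_url"], os.path.join(target_dir, repo["name"])]
        for repo in repos
    ]


def clone_all(json_path, target_dir):
    """Read the repo list and clone every repo into target_dir."""
    with open(json_path) as f:
        repos = json.load(f)
    os.makedirs(target_dir, exist_ok=True)
    for cmd in clone_commands(repos, target_dir):
        # check=True stops at the first failed clone instead of continuing silently.
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to print them first and check what would be cloned.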

gitcommit2esbulk.py

Extracts the changelogs from all repos in a directory and writes a file in the Elasticsearch bulk import format.
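
The bulk import format is newline-delimited JSON: an action line followed by the document itself, with a trailing newline after the last line. The sketch below shows one way to produce it from already-parsed commits. The "sha" key, the index name "git", and the type name "commit" are assumptions for illustration; the other field names follow the schema in this README.

```python
import json


def to_bulk(commits, index="git", doc_type="commit"):
    """Render commits as an Elasticsearch bulk body (action line + source line)."""
    lines = []
    for commit in commits:
        # Action line: tells Elasticsearch where to index the document that follows.
        action = {"index": {"_index": index, "_type": doc_type, "_id": commit["sha"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(commit))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


example = to_bulk([
    {
        "sha": "abc123",
        "project": "example-project",
        "committer-name": "Jane Doe",
        "committer-email": "jane@example.com",
    }
])
```

Using the commit hash as the document id means re-importing the same repo overwrites documents rather than duplicating them. The "_type" entry applies to older Elasticsearch releases and would be dropped for newer ones.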

A schema that fixes some obvious problems with the tokenization of email addresses and project names. It indexes each of these fields as a single lowercased token instead of splitting it into words:

{
 "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "analyzer_keyword":{
                 "tokenizer":"keyword",
                 "filter":"lowercase"
              }
           }
        }
     }
  },
  "mappings":{
     "_default_":{
        "properties":{
           "committer-name":{
              "analyzer":"analyzer_keyword",
              "type":"string"
           },
           "committer-email":{
              "analyzer":"analyzer_keyword",
              "type":"string"
           },
           "project":{
              "analyzer":"analyzer_keyword",
              "type":"string"
           }
        }
     }
  }
}
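
One way to apply the schema is to send it as the body of a PUT request when creating the index. The following is a sketch under assumptions: an Elasticsearch instance at http://localhost:9200, an index named "git", and an Elasticsearch release old enough to accept the "string" type and the "_default_" mapping. The function name is hypothetical.

```python
import json
import urllib.request


def build_create_index_request(schema, base_url="http://localhost:9200", index="git"):
    """Build (but do not send) the PUT request that creates the index with the schema."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned request to urllib.request.urlopen sends it. The index has to be created with the schema before the bulk file is imported, because the analyzer of an existing field cannot be changed afterwards.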