vcu-swim-lab/KnowHows

Parse patches on a per-language basis

damevski opened this issue · 3 comments

Short-term goals:

  • tokenize changesets (for each lang.)
  • remove stop words (for each lang.)

Then insert the resulting tokens into Solr.
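A rough sketch of the two goals above, in Python. Everything here is an assumption for illustration: the `STOP_WORDS` table, the `tokenize_patch` name, and the choice to treat language keywords as stop words are hypothetical, and the patch is assumed to be a plain unified diff. The Solr insert is left out.

```python
import re

# Hypothetical per-language stop-word lists (assumption: language keywords
# count as stop words). Real lists would be much longer.
STOP_WORDS = {
    "python": {"def", "return", "import", "from", "if", "else", "for", "in", "self"},
    "java": {"public", "private", "static", "void", "return", "new", "if", "else", "for", "class"},
}

# Identifier-like tokens only; numbers, operators and punctuation are dropped.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize_patch(patch, lang):
    """Return identifier tokens from the added/removed lines of a unified diff."""
    stop = STOP_WORDS.get(lang, set())
    tokens = []
    for line in patch.splitlines():
        # Skip file headers ("---", "+++") and hunk headers ("@@").
        # Caveat: a changed line whose own content starts with "--" or "++"
        # would be skipped too; a real parser should track hunk boundaries.
        if line.startswith(("+++", "---", "@@")):
            continue
        # Keep only changed lines; context lines start with a space.
        if not line.startswith(("+", "-")):
            continue
        for tok in TOKEN_RE.findall(line[1:]):
            if tok.lower() not in stop:
                tokens.append(tok)
    return tokens
```

For example, running this on a small Java patch keeps only the identifiers that changed:

```python
patch = (
    "--- a/Foo.java\n"
    "+++ b/Foo.java\n"
    "@@ -1,2 +1,2 @@\n"
    " context line\n"
    "-public void oldName() {\n"
    "+public void newName() {\n"
)
print(tokenize_patch(patch, "java"))  # ['oldName', 'newName']
```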

We could add a few language-specific regular expressions here. For instance, in most imperative languages we could remove variable names that appear on the left side of an '=' sign. That would help control the noise.
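One possible shape for such a regex, as a sketch only. The `ASSIGN_RE` pattern and the `strip_assignment_target` name are made up for illustration. It assumes one statement per line and a single assignment target, so tuple unpacking, compound operators like `+=`, and comparisons like `==` are left untouched.

```python
import re

# Optional leading type/modifier words, then one identifier, then a single '='
# that is not part of '=='. Assumption: simple one-target assignments only.
ASSIGN_RE = re.compile(
    r"^\s*(?:[A-Za-z_][\w.\[\]<>]*\s+)*([A-Za-z_]\w*)\s*=(?!=)\s*"
)


def strip_assignment_target(line):
    """Drop the left-hand side of a simple assignment, keeping the right-hand side."""
    return ASSIGN_RE.sub("", line, count=1)
```

A few sample lines:

```python
print(strip_assignment_target("int count = list.size();"))  # list.size();
print(strip_assignment_target("total = compute(x)"))        # compute(x)
print(strip_assignment_target("if (a == b) {"))             # if (a == b) {
```

This is deliberately naive: a line such as `return a = b` would also lose its left side, so a per-language keyword check would probably be needed before relying on it.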

Updated the issue title to be more accurate; #15 ties into this.

Use SrcML for now