Parse patches on a per-language basis
damevski opened this issue · 3 comments
damevski commented
Short-term goals:
- tokenize changesets (for each language)
- remove stop words (for each language)
- insert the results into Solr
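
A rough sketch of what the first two goals might look like, assuming changesets arrive as unified diffs. The stop-word lists, the `tokenize_changeset` name, and the identifier regex are all placeholders for illustration, not anything that exists in the project yet. Indexing into Solr is left out here.

```python
import re

# Hypothetical per-language stop words: mostly keywords, which add
# little signal when searching changesets. Real lists would be longer.
STOP_WORDS = {
    "java": {"public", "private", "static", "void", "class", "return", "new", "int"},
    "python": {"def", "return", "import", "from", "class", "self", "if", "else"},
}

# Identifier-like tokens; numbers and punctuation are dropped.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize_changeset(patch, lang):
    """Return identifier tokens from the added/removed lines of a unified diff."""
    stop = STOP_WORDS.get(lang, set())
    tokens = []
    for line in patch.splitlines():
        # Skip file headers. Simplification: a removed line whose content
        # itself starts with "--" is also skipped by this check.
        if line.startswith(("+++", "---")):
            continue
        # Keep only added or removed lines; hunk headers and context are ignored.
        if not line.startswith(("+", "-")):
            continue
        for tok in IDENTIFIER.findall(line[1:]):
            if tok.lower() not in stop:
                tokens.append(tok)
    return tokens
```

For example, calling `tokenize_changeset(patch, "java")` on a diff that replaces `public int count;` with `public int total = compute();` would keep `count`, `total`, and `compute`.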
damevski commented
We could add a few language-specific regular expressions here. For instance, in most imperative languages we could remove variable names that appear on the left side of an '=' sign. That would help control the noise.
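
One possible shape for the '=' rule, as a heuristic sketch only. The `strip_assignment_target` name and the pattern are assumptions. It handles plain `=` and deliberately leaves `==`, `<=`, `+=` and similar operators alone. It will misfire on an '=' inside a string literal or a `for` header, so it would need per-language tuning.

```python
import re

# A single '=' that is not part of '==', '!=', '<=', '>=' or a
# compound assignment such as '+=' or '|='.
ASSIGN = re.compile(r"(?<![=!<>+\-*/%&|^])=(?!=)")


def strip_assignment_target(line):
    """Drop everything up to and including the first plain '=' on the line."""
    m = ASSIGN.search(line)
    if m is None:
        # No plain assignment found; leave the line untouched.
        return line
    return line[m.end():].strip()
```

With this, `int total = compute(a, b);` is reduced to `compute(a, b);`, while `if (a == b) {` passes through unchanged.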
damevski commented
Use srcML for now.