http://lucene.apache.org/core/5_0_0/core/index.html
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
possibly: word weights -> http://web.eecs.utk.edu/research/lsi/corpa.html
synonym database: http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz