SINr-Embeddings/sinr

Diachronic features

nicolasdugue opened this issue · 2 comments

Feature
Suppose one has a textual corpus split into distinct time periods. One may want to analyze how word embeddings change over time.

Describe the solution you'd like
We suggest the following approach:

  • train a SINr model on the whole corpus
  • train one SINr model per corpus slice, reusing the communities detected on the whole corpus so that all slice models share the same dimensions
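A minimal sketch of the two steps, in plain Python rather than the sinr package's API. It assumes the partition detected on the whole corpus is already available as a word-to-community mapping (hardcoded here), and it derives each slice's vectors with a simplified node-recall-style computation: the share of a word's weighted co-occurrence degree that falls in each community. The function names, the window size and the toy data are hypothetical.

```python
from collections import defaultdict


def cooccurrence_graph(sentences, window=2):
    """Undirected weighted co-occurrence graph {(u, v): weight} for one corpus slice."""
    weights = defaultdict(float)
    for tokens in sentences:
        for i, u in enumerate(tokens):
            # Link each token to the next `window` tokens of the sentence.
            for v in tokens[i + 1:i + 1 + window]:
                if u != v:
                    key = (u, v) if u < v else (v, u)
                    weights[key] += 1.0
    return dict(weights)


def embed_with_fixed_communities(graph, communities, n_communities):
    """Node-recall-style vectors: share of each word's weighted degree per community.

    `communities` maps word -> community id and is detected once on the whole
    corpus (assumption), so every slice model has the same dimensions.
    """
    vectors = {}
    for (u, v), w in graph.items():
        for src, dst in ((u, v), (v, u)):
            if dst not in communities:
                continue  # neighbor unknown to the reference partition
            vec = vectors.setdefault(src, [0.0] * n_communities)
            vec[communities[dst]] += w
    for word, vec in vectors.items():
        total = sum(vec)  # > 0: a vector is only created when an edge is counted
        vectors[word] = [x / total for x in vec]
    return vectors


# Hypothetical toy data: two time slices and a partition of the whole corpus.
slices = {
    "1900s": [["horse", "carriage", "road"], ["horse", "road"]],
    "2000s": [["car", "road", "engine"], ["car", "horse", "road"]],
}
communities = {"horse": 0, "carriage": 0, "car": 1, "engine": 1, "road": 1}
models = {
    period: embed_with_fixed_communities(cooccurrence_graph(sents), communities, 2)
    for period, sents in slices.items()
}
```

Because both slice models index their dimensions by the same communities, a coordinate means the same thing in every period; here `horse` moves from `[1/3, 2/3]` in the 1900s slice to `[0.0, 1.0]` in the 2000s slice.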

Adding a tool to play with the models:

  • variation of the most stereotypical words of each dimension

This would be based on the nearest-neighbor variation, but adapted to evaluate how the stereotypes of each dimension have changed between two models.
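A minimal sketch of such a measure, assuming the nearest-neighbor variation counts how many of a word's top-k neighbors differ between two models; the same idea is applied here to a dimension's top-k stereotypes (the words with the highest values on it). It assumes both models share dimensions and are given as plain `{word: [values]}` dictionaries; the function names and toy vectors are hypothetical, not the sinr package's API.

```python
def dimension_stereotypes(vectors, dim, k):
    """Return the k words with the highest value on dimension `dim` (ties by word)."""
    ranked = sorted(vectors, key=lambda w: (-vectors[w][dim], w))
    return ranked[:k]


def stereotype_variation(model_a, model_b, dim, k=10):
    """1 - overlap of the top-k stereotypes of `dim` in two models sharing dimensions.

    Only words present in both models are ranked, so the score reflects
    changes in usage rather than differences in vocabulary.
    """
    shared = set(model_a) & set(model_b)
    k = min(k, len(shared))
    if k == 0:
        raise ValueError("the two models share no words")
    top_a = set(dimension_stereotypes({w: model_a[w] for w in shared}, dim, k))
    top_b = set(dimension_stereotypes({w: model_b[w] for w in shared}, dim, k))
    return 1.0 - len(top_a & top_b) / k


# Hypothetical vectors, as if produced by two slice models with shared dimensions.
model_a = {"horse": [0.9, 0.1], "carriage": [0.8, 0.2], "car": [0.1, 0.9], "engine": [0.2, 0.8]}
model_b = {"horse": [0.3, 0.7], "carriage": [0.7, 0.3], "car": [0.6, 0.4], "engine": [0.1, 0.9]}

variation = stereotype_variation(model_a, model_b, dim=0, k=2)
```

On dimension 0 the top-2 stereotypes go from `{horse, carriage}` to `{carriage, car}`, so half of them are replaced and `variation` is `0.5`. Restricting the ranking to the shared vocabulary is a design choice: without it, a word that simply does not occur in one slice would count as a change of stereotype.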

Adding a diachronic tool:

  • the difference vector of the same word between two sub-corpora, both expressed in the same reference model
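A minimal sketch of this tool, assuming each slice model is a plain `{word: [values]}` dictionary whose dimensions come from the same reference communities, so that subtracting coordinates is meaningful. The helper that ranks dimensions by absolute change is a hypothetical addition to make the vector easier to read; names and toy values are illustrative only.

```python
def difference_vector(model_a, model_b, word):
    """Vector from `word` in slice A to `word` in slice B (same reference dimensions)."""
    if word not in model_a or word not in model_b:
        raise KeyError(f"{word!r} must appear in both slices")
    return [b - a for a, b in zip(model_a[word], model_b[word])]


def most_changed_dimensions(diff, n=3):
    """Indices of the n dimensions with the largest absolute change, largest first."""
    return sorted(range(len(diff)), key=lambda i: -abs(diff[i]))[:n]


# Hypothetical vectors for one word in two time slices.
model_a = {"horse": [0.5, 0.25, 0.25, 0.0]}
model_b = {"horse": [0.125, 0.25, 0.125, 0.5]}

diff = difference_vector(model_a, model_b, "horse")  # [-0.375, 0.0, -0.125, 0.5]
top = most_changed_dimensions(diff, n=2)             # [3, 0]
```

Since each dimension corresponds to a community, the largest positive and negative entries point to the communities the word moved towards and away from between the two periods.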