dataiku/dataiku-contrib

[Sentence Embedding] Improve ELMO implementation

du-phan opened this issue · 0 comments

  • Replace list-based computation with matrix computation.
  • Add TF-IDF weighting for ELMO.
  • Refactor the ELMO helper functions for better integration with the code base.
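
To illustrate the first item, here is a minimal sketch of what moving from list computation to matrix computation could look like for a weighted average of token vectors. The function names `weighted_mean_loop` and `weighted_mean_matrix` are hypothetical and are not taken from the plugin; the random vectors stand in for ELMO outputs.

```python
import numpy as np


def weighted_mean_loop(token_vectors, token_weights):
    """List-based version: accumulate one token and one dimension at a time."""
    total = [0.0] * len(token_vectors[0])
    for vec, w in zip(token_vectors, token_weights):
        for i, v in enumerate(vec):
            total[i] += w * v
    norm = sum(token_weights)
    return [t / norm for t in total]


def weighted_mean_matrix(token_vectors, token_weights):
    """Matrix version: a single vector-matrix product over the sentence."""
    vectors = np.asarray(token_vectors, dtype=float)  # shape (n_tokens, dim)
    weights = np.asarray(token_weights, dtype=float)  # shape (n_tokens,)
    return weights @ vectors / weights.sum()


# Stand-in data: 5 tokens with 4-dimensional vectors (not real ELMO output).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 4))
weights = np.array([0.1, 0.5, 0.2, 0.9, 0.3])

loop_result = weighted_mean_loop(vectors.tolist(), weights.tolist())
matrix_result = weighted_mean_matrix(vectors, weights)
```

Both functions should return the same sentence vector; the matrix version simply leaves the inner loops to NumPy.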

@RedaAffane in your get_elmo_text_batches_sif you define max_sequence_length = 100 and then use that threshold to truncate the input data. Why is that needed? After truncation, the vocabulary distribution is no longer the same as it was before get_elmo_text_batches_sif, and since we compute word_weight before calling it, the word weights are no longer correct (?)
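
A small sketch of the concern, under stated assumptions: it assumes word_weight follows the usual SIF form a / (a + p(w)), where p(w) is the relative frequency of the word in the corpus, and it assumes truncation keeps the first max_sequence_length tokens. The helpers `sif_weights` and `truncate` are hypothetical and are not the plugin's functions.

```python
from collections import Counter


def sif_weights(tokenized_texts, a=1e-3):
    """Assumed SIF-style weights: a / (a + p(w)), p(w) = relative frequency."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    total = sum(counts.values())
    return {tok: a / (a + c / total) for tok, c in counts.items()}


def truncate(tokenized_texts, max_sequence_length):
    """Assumed truncation: keep only the first max_sequence_length tokens."""
    return [text[:max_sequence_length] for text in tokenized_texts]


texts = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["a", "dog", "ran"],
]

# Weights computed on the full texts (before truncation).
weights_full = sif_weights(texts)

# Weights computed on what the model would actually see after truncation.
weights_truncated = sif_weights(truncate(texts, max_sequence_length=3))
```

In this toy corpus, "the" has a different relative frequency before and after truncation, and "on" and "mat" only appear in the full-text weights. If the real pipeline behaves this way, one option might be to compute word_weight on the truncated texts instead.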