yandexdataschool/nlp_course

Some code is not working due to new versions of libraries

Extremesarova opened this issue · 1 comment

Hi!
Regarding nlp_course/week01_embeddings/seminar.ipynb:
The line "Requirements: pip install --upgrade nltk gensim bokeh , but only if you're running locally." installs the latest versions of the libraries, because no exact versions are specified.
I suggest pinning the exact library versions that the notebooks were written for.
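Until versions are pinned, a notebook could at least detect at runtime which major version of gensim is installed. The following is a minimal sketch, not part of the course code: the helper names major_version and installed_major are hypothetical, and it assumes Python 3.8+ for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '4.0.1'."""
    return int(version_string.split(".")[0])


def installed_major(package):
    """Return the installed major version of a package, or None if it is absent."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        return None


# Hypothetical usage: warn early instead of failing deep inside a cell.
gensim_major = installed_major("gensim")
if gensim_major is not None and gensim_major >= 4:
    print("gensim 4 or newer detected; code written against the 3.x API may need changes")
```

A check like this only reports the mismatch; pinning the versions in the install line remains the more reliable fix.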
As of May 2021, gensim is at version 4.0.1.
This means that

words = sorted(model.vocab.keys(), 
               key=lambda word: model.vocab[word].count,
               reverse=True)[:1000]

no longer works, because gensim 4.0 removed the vocab attribute.
It is better to replace it with

words = sorted(model.key_to_index.keys(), 
               key=lambda word: model.get_vecattr(word, "count"),
               reverse=True)[:1000]
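If the notebook has to run under both old and new gensim, the two variants can be wrapped in one helper. This is only a sketch under assumptions: top_words_by_count is a hypothetical name, and the two stand-in objects below imitate just the attributes used in the snippets above rather than real gensim models.

```python
from types import SimpleNamespace


def top_words_by_count(model, n=1000):
    """Return the n most frequent words using whichever API the model exposes."""
    if hasattr(model, "key_to_index"):
        # gensim 4.x style: counts are read through get_vecattr
        keys = model.key_to_index.keys()
        count = lambda word: model.get_vecattr(word, "count")
    else:
        # gensim 3.x style: counts live on the entries of model.vocab
        keys = model.vocab.keys()
        count = lambda word: model.vocab[word].count
    return sorted(keys, key=count, reverse=True)[:n]


# Hypothetical stand-ins (not gensim objects) to exercise both branches.
counts = {"the": 50, "cat": 7, "sat": 3}
old_style = SimpleNamespace(
    vocab={word: SimpleNamespace(count=c) for word, c in counts.items()}
)
new_style = SimpleNamespace(
    key_to_index={word: i for i, word in enumerate(counts)},
    get_vecattr=lambda word, attr: counts[word],
)

print(top_words_by_count(old_style, 2))
print(top_words_by_count(new_style, 2))
```

With a real model, the call would be words = top_words_by_count(model, 1000).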

Regarding nlp_course/week01_embeddings/homework.ipynb:

precision_top1 = precision(uk_ru_test, mapping.predict(X_test), 1)
precision_top5 = precision(uk_ru_test, mapping.predict(X_test), 5)

assert precision_top1 >= 0.635
assert precision_top5 >= 0.813

Here the assertions pass only if the second threshold is relaxed to precision_top5 >= 0.811 (probably also because of the new gensim version).

P.S. I will update this issue with new problems as I go through the course.

  1. The outdated call to model.vocab should indeed be updated. See PR here.

  2. 0.813 is actually achievable. Keep trying or see the course chats. :)