lfmatosm/embedded-topic-model

[BUG] Testing on a single document

MaazBinMusa opened this issue · 1 comment

Describe the bug
Testing a trained model on a single document crashes the code.

To Reproduce
Steps to reproduce the behavior:

  1. Train a model
  2. Try to test it on 1 document

Reproduction example
I copied the code from the readme.md example. The only difference was that my train and test sets were not separate: I pulled one document from the train set and passed it as the input [test_doc] to the transform function.

Ran into the same thing. I think the cause is related to this passage from the documentation (looking at v1.5.1) for the stop_words arg:

If None, no stop words will be used. In this case, setting max_df to a higher value, such as in the range (0.7, 1.0), can automatically detect and filter stop words based on intra corpus document frequency of terms.

It might be that some corpora are too small for stop words to be inferred automatically. As a workaround, I'm just skipping that step in documents_without_stop_words.
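To illustrate why a very small corpus could be a problem, here is a minimal pure-Python sketch of document-frequency filtering. It is not the library's code: the helper `filter_by_document_frequency`, the toy corpus, and the `max_df=0.7` threshold are all assumptions chosen for illustration. With a single document, every term appears in 100% of the corpus, so a proportional `max_df` below 1.0 removes the whole vocabulary.

```python
from collections import Counter


def filter_by_document_frequency(documents, max_df=0.7):
    """Hypothetical helper: keep terms found in at most `max_df` (a proportion) of documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency, not term frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(term for term, count in doc_freq.items() if count / n_docs <= max_df)


# Toy corpus, made up for this example.
corpus = [
    "topic models learn topics",
    "embeddings place words in a space",
    "documents mix several topics",
]
single = corpus[:1]

vocab_corpus = filter_by_document_frequency(corpus)
vocab_single = filter_by_document_frequency(single)

print(len(vocab_corpus))  # non-empty vocabulary for the three-document corpus
print(vocab_single)       # empty: every term has document frequency 1.0 > 0.7
```

If the preprocessing step filters terms this way, a single test document would leave nothing to build a bag-of-words from, which would be consistent with the crash reported here. Skipping the filter for the test input, or reusing the vocabulary learned from the training set, would avoid the empty result.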