dermatologist/nlp-qrmine

ValueError: max_df corresponds to fewer documents than min_df

Closed this issue · 3 comments

I'm getting this error when I try:

qrmine -i transcript.txt --topics --assign -n 3

Full trace:

QRMine(TM) Qualitative Research Miner. v3.4.0
Using TensorFlow backend.
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/qrmine", line 8, in <module>
    sys.exit(main_routine())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/main.py", line 381, in main_routine
    cli()  # run the main function
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/main.py", line 75, in cli
    generate_topics(data, assign, num)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/main.py", line 182, in generate_topics
    q.process_content()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/nlp_qrmine.py", line 164, in process_content
    self.load_matrix()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/nlp_qrmine.py", line 195, in load_matrix
    for documents in self._corpus.docs))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/vectorizers.py", line 431, in fit_transform
    doc_term_matrix = self._fit(tokenized_docs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/vectorizers.py", line 492, in _fit
    doc_term_matrix, vocabulary_terms
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/vectorizers.py", line 592, in _filter_terms
    max_n_terms=self.max_n_terms,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/matrix_utils.py", line 246, in filter_terms_by_df
    raise ValueError("max_df corresponds to fewer documents than min_df")
ValueError: max_df corresponds to fewer documents than min_df

Is the transcript.txt correctly formatted?

Do you mean utf-8? Or something else? Do I need to add in some text?

Yes. QRMine analyses interviews for qualitative insight, and the interview transcript file is its input. The format is as follows:

Transcript of the first interview with John.
Any number of lines
First_Interview_John

Text of the second interview with Jane.
More text.
Second_Interview_Jane
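
On the error itself: the message comes from the document-frequency filter in the traceback (`filter_terms_by_df`). It suggests that the number of documents found in the input is too small for the configured `min_df`/`max_df` thresholds. Below is a minimal sketch of that kind of check, not the library's actual code. The function names and the values `MIN_DF = 3` and `MAX_DF = 0.95` are assumptions chosen for illustration, as is the convention that an integer is an absolute document count and a float is a fraction of all documents.

```python
def doc_count_bounds(n_docs, min_df, max_df):
    """Convert min_df/max_df into absolute document counts.

    Assumed convention: an int is an absolute count, a float is a
    fraction of n_docs (truncated to an integer).
    """
    min_count = min_df if isinstance(min_df, int) else int(min_df * n_docs)
    max_count = max_df if isinstance(max_df, int) else int(max_df * n_docs)
    return min_count, max_count


def check_df(n_docs, min_df, max_df):
    """Raise the same kind of error when the thresholds cannot both hold."""
    min_count, max_count = doc_count_bounds(n_docs, min_df, max_df)
    if max_count < min_count:
        raise ValueError("max_df corresponds to fewer documents than min_df")
    return min_count, max_count


# Hypothetical settings, for illustration only (not read from QRMine).
MIN_DF = 3
MAX_DF = 0.95

for n_docs in (1, 2, 3, 4, 10):
    try:
        print(n_docs, "documents ->", check_df(n_docs, MIN_DF, MAX_DF))
    except ValueError as err:
        print(n_docs, "documents ->", err)
```

With these assumed values, the check fails for any input with fewer than four documents. A transcript that is parsed as a single document (for example, one without the per-interview labels) would then produce this error whatever its encoding.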