ValueError: max_df corresponds to fewer documents than min_df
Closed this issue · 3 comments
youssefavx commented
I'm getting this error when I try:
qrmine -i transcript.txt --topics --assign -n 3
Full trace:
QRMine(TM) Qualitative Research Miner. v3.4.0
Using TensorFlow backend.
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/bin/qrmine", line 8, in <module>
    sys.exit(main_routine())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/main.py", line 381, in main_routine
    cli() # run the main function
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/main.py", line 75, in cli
    generate_topics(data, assign, num)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/main.py", line 182, in generate_topics
    q.process_content()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/nlp_qrmine.py", line 164, in process_content
    self.load_matrix()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/qrmine/nlp_qrmine.py", line 195, in load_matrix
    for documents in self._corpus.docs))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/vectorizers.py", line 431, in fit_transform
    doc_term_matrix = self._fit(tokenized_docs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/vectorizers.py", line 492, in _fit
    doc_term_matrix, vocabulary_terms
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/vectorizers.py", line 592, in _filter_terms
    max_n_terms=self.max_n_terms,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/textacy/vsm/matrix_utils.py", line 246, in filter_terms_by_df
    raise ValueError("max_df corresponds to fewer documents than min_df")
ValueError: max_df corresponds to fewer documents than min_df
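Note: the last frame is a sanity check on the vectorizer's document-frequency bounds. Below is a minimal sketch of that kind of check, not textacy's actual code. The function names are made up for illustration, and it assumes an integer bound is an absolute document count while a float bound is a fraction of the corpus. The values 2 and 0.95 are placeholders, not QRMine's real settings.

```python
def resolve_doc_count(df, n_docs):
    """Turn a document-frequency bound into an absolute document count."""
    # ASSUMPTION: int = absolute count, float = fraction of n_docs.
    return df if isinstance(df, int) else int(df * n_docs)


def check_df_bounds(n_docs, min_df, max_df):
    """Raise if the upper bound resolves to fewer documents than the lower."""
    min_count = resolve_doc_count(min_df, n_docs)
    max_count = resolve_doc_count(max_df, n_docs)
    if max_count < min_count:
        raise ValueError("max_df corresponds to fewer documents than min_df")
    return min_count, max_count


if __name__ == "__main__":
    # Ten documents: the bounds are consistent.
    print(check_df_bounds(n_docs=10, min_df=2, max_df=0.95))
    # One document: the fractional upper bound truncates to zero.
    try:
        check_df_bounds(n_docs=1, min_df=2, max_df=0.95)
    except ValueError as err:
        print(err)
```

Under these assumptions, a corpus of one document makes a fractional max_df resolve to 0, which is below any positive integer min_df. That would explain the error if the whole transcript was read as a single document.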
dermatologist commented
Is the transcript.txt correctly formatted?
youssefavx commented
Do you mean utf-8? Or something else? Do I need to add in some text?
dermatologist commented
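I mean the structure of the file. QRMine analyses interviews for qualitative insight, so its input is an interview transcript file in which each interview's text is followed by a line giving that interview a title. The format is as below: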
Transcript of the first interview with John.
Any number of lines
First_Interview_John
Text of the second interview with Jane.
More text.
Second_Interview_Jane
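A quick way to see how many interviews a file would be split into is to count its title lines. The sketch below is only that, a sketch. It assumes each title line is wrapped in `<break>` and `</break>` tags, which is how I understand the QRMine README to describe the format; such tags would not render in a comment, which may be why the example above shows bare titles. The helper name is hypothetical and not part of QRMine.

```python
import re

# ASSUMPTION: each interview ends with <break>Title</break>; this delimiter
# is not visible in the thread and should be checked against the README.
BREAK_TAG = re.compile(r"<break>(.*?)</break>")


def interview_titles(text):
    """Return the titles found in the transcript text, in order."""
    return [title.strip() for title in BREAK_TAG.findall(text)]


if __name__ == "__main__":
    sample = (
        "Transcript of the first interview with John.\n"
        "Any number of lines\n"
        "<break>First_Interview_John</break>\n"
        "Text of the second interview with Jane.\n"
        "More text.\n"
        "<break>Second_Interview_Jane</break>\n"
    )
    titles = interview_titles(sample)
    print(len(titles), titles)
```

If this finds zero or one title in transcript.txt, the file is probably being treated as a single document, and topic modelling with -n 3 has too little to work with.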