Questionable behavior of find_synonyms()
Closed this issue · 13 comments
With the following code in generic_utils.py:
from wordhoard import Synonyms
def synonym(word):
    syn = Synonyms(word)
    syn_res = syn.find_synonyms()
    return syn_res
Run from the terminal with a clean state:
>>> import generic_utils as gu
>>> gu.synonym('mother')
['ma', 'mom', 'mum', 'dam', 'mama', 'mater', 'mommy', 'mummy', 'mamma', 'mammy', 'momma', 'parent', 'para i', 'supermom',
'puerpera', 'old lady', 'old woman', 'primipara', 'quadripara', 'quintipara', 'birth mother', 'mother-in-law',
'foster mother', 'female parent', 'surrogate mother', 'biological mother']
>>> gu.synonym('mother')
['noun']
Environment info, with wordhoard==1.5.3 and Python 3.10.10:
backoff==2.2.1
beautifulsoup4==4.12.2
certifi==2022.12.7
charset-normalizer==3.1.0
cloudscraper==1.2.71
deckar01-ratelimit==3.0.2
deepl==1.14.0
idna==3.4
lxml==4.9.2
pyparsing==3.0.9
requests==2.28.2
requests-toolbelt==1.0.0
soupsieve==2.4.1
urllib3==1.26.15
This seems to be an error in the caching. I was once able to get an error message, but I am not 100% sure that this is the cause.
ERROR:wordhoard.synonyms:A KeyError occurred in the following code segment:
ERROR:wordhoard.synonyms: File "/<path>/.conda/envs/sam/lib/python3.10/site-packages/wordhoard/synonyms.py", line 571, in _query_thesaurus_com
self._update_cache(part_of_speech_category, synonyms_list)
File "/<path>/.conda/envs/sam/lib/python3.10/site-packages/wordhoard/synonyms.py", line 134, in _update_cache
caching.insert_word_cache_synonyms(self._word, pos_category, synonyms)
File "/<path>/.conda/envs/sam/lib/python3.10/site-packages/wordhoard/utilities/caching.py", line 65, in insert_word_cache_synonyms
temporary_dict_synonyms[word][pos_category] += deduplicated_values
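The last frame fails on an augmented assignment into a nested dict. As a minimal sketch, assuming `temporary_dict_synonyms` is a plain dict of dicts (the real structure in wordhoard/utilities/caching.py may differ), the snippet below shows how `+=` raises KeyError when a key does not exist yet, and how `dict.setdefault` avoids it. `insert_unsafe` and `insert_safe` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the cache in wordhoard/utilities/caching.py;
# the real data structure may differ.
temporary_dict_synonyms: dict[str, dict[str, list[str]]] = {}


def insert_unsafe(word: str, pos_category: str, values: list[str]) -> None:
    # Mirrors the failing line: += on a missing key raises KeyError.
    temporary_dict_synonyms[word][pos_category] += values


def insert_safe(word: str, pos_category: str, values: list[str]) -> None:
    # setdefault creates the missing levels before extending the list.
    bucket = temporary_dict_synonyms.setdefault(word, {}).setdefault(pos_category, [])
    for value in values:
        if value not in bucket:  # keep the cached list deduplicated
            bucket.append(value)


try:
    insert_unsafe('mother', 'noun', ['ma', 'mom'])
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'mother'

insert_safe('mother', 'noun', ['ma', 'mom'])
insert_safe('mother', 'noun', ['mom', 'mum'])
print(temporary_dict_synonyms)  # {'mother': {'noun': ['ma', 'mom', 'mum']}}
```

Whether this KeyError is actually what produces the wrong output is not established in this thread; the sketch only explains the traceback.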
I was able to fix the behavior by disabling caching entirely, changing line 208 of wordhoard/wordhoard/synonyms.py (commit 1e54f45) to:
check_cache = [False]
I have never used generic_utils, so I need to look into what it does.
This concerns me:
gu.synonym('mother')
['noun']
I also need to create a Python 3.10.10 environment to see what breaks.
The reason that I use caching is to prevent redundant queries for words.
I will look into this and get back to you.
The code in generic_utils.py is provided in the issue. It is just the wrapper in the first code block.
I understand, but this is a quick-and-dirty workaround until the caching works.
When I execute this code in Python 3.9.16, I get no errors in wordhoard_error.yaml:
from wordhoard import Synonyms
def synonym(word):
    syn = Synonyms(word)
    syn_res = syn.find_synonyms()
    return syn_res

words = ['mother', 'mother']
for word in words:
    results = synonym(word)
    print(results)
Hmm, I checked wordhoard_error.yaml with the provided code and there are no errors, but I still see the same behavior with Python 3.10.10 (sadly, it would be tough to switch versions, as I already have a lot of different libraries installed for this one):
['ma', 'mum', 'dam', 'mom', 'mama', 'mommy', 'momma', 'mater', 'mammy', 'mamma', 'mummy', 'parent', 'para i', 'old lady', 'puerpera', 'supermom', 'primipara', 'old woman', 'quadripara', 'quintipara', 'birth mother', 'female parent', 'mother-in-law', 'foster mother', 'surrogate mother', 'biological mother']
['noun']
Checking with the debugger, check_cache[1] at line 210 of wordhoard/wordhoard/synonyms.py (commit 1e54f45) has the following value:
{'noun': ['mom', 'parent', 'female parent', 'momma', 'mama', 'mammy', 'mommy', 'ma', 'mom', ...]}
That is where the 'noun' comes from.
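This explains the ['noun'] result: iterating over a dict yields its keys, not its values. The minimal sketch below is a simplification, not wordhoard's actual code. It uses a shortened, hypothetical sample shaped like check_cache[1] above and applies the same list comprehension used in the list branch of find_synonyms.

```python
# Hypothetical, shortened cached value shaped like check_cache[1] in the debugger.
cached = {'noun': ['mom', 'parent', 'female parent', 'momma', 'mama', 'ma', 'mom']}

# Same expression as the list branch: iterating a dict yields its keys.
buggy = sorted(set([word.lower() for word in cached]))
print(buggy)  # ['noun']

# Iterating over the values (the synonym lists) gives the synonyms instead.
fixed = sorted(set(word.lower() for values in cached.values() for word in values))
print(fixed)  # ['female parent', 'ma', 'mama', 'mom', 'momma', 'parent']
```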
Yes, 'noun' is the part_of_speech. I need to create a Python 3.10.10 environment to see what errors I get.
So the problem is related to Python 3.10.10. I need to rework the code to support Python 3.10.10. I will post an update here when I have fixed this issue.
@johnbumgarner I have made a PR. It seems questionable that you don't return the synonyms variable in the list output format, as you do in the other two format types.
What do you mean by "it does seem questionable why you don't pass the synonyms variable that is passed in 2 other format types"?
In the current version of find_synonyms (lines 207 to 222 of wordhoard/wordhoard/synonyms.py, commit 1e54f45), there are three code paths based on self._output_format. Only in the list case do we build the result from check_cache[1]. In the JSON and dictionary cases, we return the newly created synonyms variable, which already holds the required information.
Maybe fixing this would make it work with Python 3.10 while keeping it backward compatible?
This is precisely what I changed in the PR (commit ac1b33b).
I believe that the issue is in the code below, because the dictionary and JSON code paths work.
if check_cache[0] is True:
    part_of_speech = list(check_cache[1].keys())[0]
    synonyms = cleansing.flatten_multidimensional_list(list(check_cache[1].values()))
    if self._output_format == 'list':
        return sorted(set([word.lower() for word in check_cache[1]]))
Totally agree: the issue is check_cache[1], which should instead be synonyms, as discussed in the previous comment and the PR.
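A minimal sketch of the agreed change to the list branch, assuming check_cache is a list whose second element maps one part of speech to a list of synonyms. `flatten` is a hypothetical stand-in for `cleansing.flatten_multidimensional_list`, and `list_output_from_cache` is an illustrative function, not part of wordhoard's API; the actual change is in the PR.

```python
from typing import Any, Optional


def flatten(nested: list[Any]) -> list[Any]:
    """Hypothetical stand-in for cleansing.flatten_multidimensional_list."""
    flat: list[Any] = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def list_output_from_cache(check_cache: list) -> Optional[list[str]]:
    """Illustrative version of the cache-hit list branch (not wordhoard's API)."""
    if check_cache[0] is True:
        synonyms = flatten(list(check_cache[1].values()))
        # Build the result from `synonyms`, as the JSON and dictionary paths do,
        # instead of iterating check_cache[1] (which yields its POS keys).
        return sorted(set(word.lower() for word in synonyms))
    return None  # cache miss: the caller would query the sources instead


print(list_output_from_cache([True, {'noun': ['Mom', 'ma', 'mom', 'parent']}]))
# ['ma', 'mom', 'parent']
```

Because the change only swaps the iterable, the list output keeps the same shape (a sorted, lowercased, deduplicated list) on both cache hits and misses.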
This is correct. I see that I need to do more testing before pushing out a new release. Thanks for finding this bug.