cfiltnlp/pyiwn

Problem accessing Kannada-language synsets

hlsrekha opened this issue · 3 comments

I'm facing a problem accessing Kannada-language synsets.

from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet('kannada')

For some words I'm able to get the synsets:

print(iwn.synsets('ಗಂಡಸು'))
[Synset('ಮಾನವ.None.858')]

However, for the words ಮನೆ, ಮಾನವ, and ಗುಡುಗುಡು:
print(iwn.synsets('ಮನೆ'))
print(iwn.synsets('ಮಾನವ'))
print(iwn.synsets('ಗುಡುಗುಡು'))
I'm getting the following error:

File "<pyshell#11>", line 1, in <module>
print(iwn.synsets('ಮನೆ'))
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyiwn\pyiwn.py", line 61, in synsets
pos = sp[2] if pos == None else pos
IndexError: list index out of range

I get the same error for all three words mentioned above, even though these words are present in the all.kannada file.
Could you please help me resolve this issue?
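
The traceback only shows that `sp` held fewer than three elements when `sp[2]` was evaluated. Below is a minimal sketch of that failure mode, assuming (this is not confirmed anywhere in this thread) that some line of the words file splits into fewer than three fields. `split_fields` is a hypothetical stand-in for `utils.clean_line`, the tab separator is a guess, and the sample lines are invented placeholders rather than real IndoWordNet data.

```python
def split_fields(line, sep="\t"):
    """Hypothetical stand-in for utils.clean_line: strip the line and split it."""
    return line.strip().split(sep)


def safe_pos(fields, pos=None):
    """Return the caller's pos if given, else the third field when it exists."""
    if pos is not None:
        return pos
    return fields[2] if len(fields) > 2 else None


# Invented sample lines: one with three fields, one with only two.
complete = split_fields("858\tword\tnoun")
short = split_fields("858\tword")

# Indexing the short list reproduces the error from the traceback.
try:
    short[2]
except IndexError as exc:
    error_message = str(exc)

print(error_message)
print(safe_pos(complete), safe_pos(short))
```

A length check like the one in `safe_pos` avoids the exception, but it would not explain why the affected lines are shorter than expected; that would need a look at the data file itself.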

Thanks and regards
Shashirekha

@hlsrekha: We are looking into this. We will try to resolve this issue as soon as possible. Sorry for the inconvenience caused.

@riteshpanjwani:

I tried analyzing pyiwn.py after getting the same error:

File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyiwn\pyiwn.py", line 61, in synsets
pos = sp[2] if pos == None else pos
IndexError: list index out of range

for the following statements:
print(iwn.synsets('ಮನೆ'))
print(iwn.synsets('ಮಾನವ'))
print(iwn.synsets('ಗುಡುಗುಡು'))

Here is the definition of synsets (in pyiwn.py):

def synsets(self, word, pos=None):
    synsets = []

    # First part
    words_file_name = 'all.{}'.format(self._lang) if pos == None else '{}.{}'.format(pos, self._lang)
    with utils.read_file('{}/words/{}'.format(home, words_file_name)) as fo:
        for line in fo:
            sp = utils.clean_line(line)
            if word == sp[1]:
                synset_id = sp[0]
                pos = sp[2] if pos == None else pos
                break
    # Second part
    synset_file_name = 'all.{}'.format(self._lang) if pos == None else '{}.{}'.format(pos, self._lang)
    with utils.read_file('{}/synsets/{}'.format(home, synset_file_name)) as fo:
        for line in fo:
            sp = utils.clean_line(line)
            synset_data = utils.synset_data(sp, pos)
            if word in synset_data[2]:
                synset_id, head_word, lemma_names, pos, gloss, examples = synset_data[0], synset_data[1], synset_data[2], synset_data[3], synset_data[4], synset_data[5]
                synsets.append(Synset(synset_id, head_word, lemma_names, pos, gloss, examples))
    return synsets

The first part of the code is redundant. At most, it yields synset_id and pos, but both are overwritten in the second part. The synset information is obtained from the second part alone. Although the first part reads the words from \words\all.kannada (in my case), they are not used any further. So I removed the first part, and it now works fine: I no longer get the errors mentioned earlier for words that are present in the file.
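
The single-pass idea described above can be sketched independently of the library. This is an illustration under stated assumptions, not pyiwn's actual code: `parse_synset_line`, `find_synsets`, the tab-separated line format, and the sample lines are all hypothetical, and the real synset files may be laid out differently.

```python
from collections import namedtuple

# Simplified record; the real Synset class also carries gloss and examples.
Synset = namedtuple("Synset", "synset_id head_word lemma_names pos")


def parse_synset_line(line):
    """Hypothetical format: 'id<TAB>lemma1,lemma2<TAB>pos'. Returns None if malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    lemmas = fields[1].split(",")
    return Synset(fields[0], lemmas[0], lemmas, fields[2])


def find_synsets(word, lines, pos=None):
    """Scan the synset lines once and collect every synset that lists the word."""
    matches = []
    for line in lines:
        synset = parse_synset_line(line)
        if synset is None:
            continue  # skip malformed lines instead of raising IndexError
        if pos is not None and synset.pos != pos:
            continue
        if word in synset.lemma_names:
            matches.append(synset)
    return matches


# Invented sample data, not taken from the real Kannada files.
sample_lines = [
    "858\tಮಾನವ,ಗಂಡಸು\tnoun\n",
    "malformed line without tabs\n",
    "901\tಮನೆ\tnoun\n",
]

print(find_synsets("ಗಂಡಸು", sample_lines))
print(find_synsets("ಮನೆ", sample_lines))
```

Because the lookup never indexes into a separate words file, a word missing or malformed there cannot raise an exception; the optional `pos` argument filters results rather than selecting a file.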

regards

Shashirekha

Hi Shashirekha,

I have completely revamped the inner workings of the library and fixed these issues. I recommend doing a full clean reinstall:

pip uninstall pyiwn
pip install --upgrade pyiwn

Then follow the steps in the example notebook: https://github.com/riteshpanjwani/pyiwn/blob/master/examples/example.ipynb
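
After reinstalling, one way to confirm which version is actually installed is the standard library's `importlib.metadata`. Note that this module requires Python 3.8 or newer, so it would not run on the Python 3.5 installation shown in the traceback above. The helper name `installed_version` is made up for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


pyiwn_version = installed_version("pyiwn")
if pyiwn_version is None:
    print("pyiwn is not installed")
else:
    print("pyiwn " + pyiwn_version)
```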

Regards,
Ritesh