chrplr/openlexicon

potential bug report

alephpi opened this issue · 4 comments

image
Hi, I'm just curious: does the first "aurai" really exist in French?

chrplr commented

I'll keep reporting the potential bugs I find in this issue. I'm doing some data processing for my project, so this is just a side task.

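import re

# df: the lexicon table loaded as a pandas DataFrame
# Invariable categories: adverbs, conjunctions, prepositions
invari = re.compile('ADV|CON|PRE')
df_invari = df.loc[df.cgram.str.contains(invari, na=False)]  # na=False skips rows with a missing cgram
df_invari[df_invari['ortho'] != df_invari['lemme']]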

gives me

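| ortho | phon | lemme | cgram | genre | nombre | freqlemlivres | freqlivres | infover |
|---|---|---|---|---|---|---|---|---|
| aujourd'hui | oZuRd8i | aujourd'huie | ADV | | | 0.14 | 0.14 | |
| bons-cadeaux | b§kado | bon-cadeaux | ADV | | | 0.00 | 0.00 | |
| c'est-à-dire | sEtadiR | c'est-à-diree | ADV | | | 0.07 | 0.07 | |
| d'emblée | d@ble | d'embléee | ADV | | | 0.07 | 0.07 | |
| n | n | ne | ADV | | | 13841.89 | 5.68 | |
| n' | n | ne | ADV | | | 13841.89 | 6084.12 | |
| re | R2 | r | ADV | | | 7.50 | 7.50 | |
| y | i | yu | ADV | | | 0.27 | 0.27 | |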

The lemmas do not seem correct. (I assume the lemma of an invariable word is the word itself.)

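| ortho | phon | lemme | cgram | genre | nombre | freqlemlivres | freqlivres | infover |
|---|---|---|---|---|---|---|---|---|
| e | 2 | 2e | ADJ | | | 0.00 | 0.00 | |
| e | 2 | 58e | ADJ | | | 0.07 | 0.07 | |
| e | 2 | 7e | ADJ | | | 0.07 | 0.07 | |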

bug.csv
Here is a table of words whose lemma has a different cgram from the word itself. (I think lemmatization should be a closed operation, right?)
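
For anyone who wants to reproduce this kind of check, here is a minimal sketch of one way it could be done with pandas. It is not necessarily how bug.csv was produced. The function name `lemma_cgram_mismatches` and the toy rows are made up for illustration; only the column names `ortho`, `lemme`, and `cgram` come from the tables above. The sketch flags a row when its lemma exists as an entry in the table but never with the same cgram as the row itself.

```python
import pandas as pd


def lemma_cgram_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose lemma is a known entry, but never with the row's own cgram."""
    # Every (spelling, category) pair that exists as an entry.
    entries = set(zip(df["ortho"], df["cgram"]))
    # Every spelling that exists as an entry, whatever its category.
    known = set(df["ortho"])

    lemma_known = df["lemme"].isin(known)
    same_cgram = pd.Series(
        [(lemme, cgram) in entries for lemme, cgram in zip(df["lemme"], df["cgram"])],
        index=df.index,
    )
    return df[lemma_known & ~same_cgram]


# Toy rows, invented for illustration only (not taken from the lexicon).
toy = pd.DataFrame(
    {
        "ortho": ["manger", "mange", "bien", "biens"],
        "lemme": ["manger", "manger", "bien", "bien"],
        "cgram": ["VER", "VER", "ADV", "NOM"],
    }
)
print(lemma_cgram_mismatches(toy))
```

In the toy data only the `biens` row is flagged, because `bien` appears as an entry only with cgram `ADV`, not `NOM`. Lemmas that never appear as an entry at all are deliberately left out here; they could be listed separately with `df[~df["lemme"].isin(df["ortho"])]`.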