rbcavanaugh/coreLexicon

Possible pronouns issue

Closed this issue · 1 comments

I just scored a Cinderella sample, and it looks like the app is only counting (for example) "she": "her" was also produced in the sample, but was not included in the list. I haven't checked whether this is true for all pronouns, but wanted to go ahead and flag it, because "she" should accept "her, hers, herself," etc.

Good find.

Right now, the app converts each token in the text string into its lemma, using an established dataset to look up lemmas:

A dataset based on Mechura's (2016) English lemmatization list. This data set can be useful for join style lemma replacement of inflected token forms to their root lemmas. While this is not a true morphological analysis this style of lemma replacement is fast and typically still robust.

It returns the lemma "her" for the token "her," so "her" is never counted toward "she."
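
To make the behavior concrete, here is a rough Python sketch of join-style lemma replacement. It is not the app's actual code: the table entries and the `lemmatize` helper are made up for illustration, and the real lemma list is far larger.

```python
# Toy stand-in for a token -> lemma lookup table. These entries are
# illustrative only and are not copied from the real dataset.
BASE_LEMMAS = {"danced": "dance", "shoes": "shoe", "fell": "fall"}


def lemmatize(tokens, table):
    """Join-style replacement: use the listed lemma, else keep the token."""
    return [table.get(tok, tok) for tok in tokens]


tokens = "she danced and her shoes fell off".split()
lemmas = lemmatize(tokens, BASE_LEMMAS)

# "her" has no mapping to "she", so it passes through unchanged and only
# the literal token "she" is counted.
she_count = lemmas.count("she")
```

Because the lookup only swaps a token for whatever lemma the table lists, any pronoun form that is not mapped to "she" is treated as a separate word.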

The best thing is probably for us to build on this list with custom token-lemma pairs. If you have a list of custom-accepted tokens/lemmas for each stimulus, I can add those to this lemmatization list for the app - super easy.
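
As a rough sketch of what layering custom pairs on top of the base list could look like (in Python, not the app's actual code): the names `build_lemma_table`, `count_core_items`, and `CUSTOM_CINDERELLA` are hypothetical, the entries are examples only, and counting occurrences is just one assumed way of scoring.

```python
# Illustrative base table; the real lemmatization list is much larger.
BASE_LEMMAS = {"danced": "dance", "shoes": "shoe"}

# Hypothetical custom token -> lemma pairs for one stimulus.
CUSTOM_CINDERELLA = {"her": "she", "hers": "she", "herself": "she"}


def build_lemma_table(base, custom):
    """Copy the base table, then let custom pairs override or extend it."""
    merged = dict(base)
    merged.update(custom)
    return merged


def count_core_items(text, table, core_items):
    """Lemmatize a whitespace-tokenized string and count each core item."""
    lemmas = [table.get(tok, tok) for tok in text.lower().split()]
    return {item: lemmas.count(item) for item in core_items}


table = build_lemma_table(BASE_LEMMAS, CUSTOM_CINDERELLA)
counts = count_core_items("She lost her shoes", table, ["she", "shoe"])
```

With the custom pairs applied, both "She" and "her" resolve to "she". Keeping the custom pairs in a separate table per stimulus would leave the base list untouched for other stories.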