explosion/spaCy

Noun chunking missing root nouns?

jmugan opened this issue · 5 comments

When I run the noun chunker over the phrase "100 tacos with a side of rice" it returns "a side" and "rice" but not "100 tacos".

The word "tacos" has the dependency label (dep_) "ROOT". I think the problem may be that english_noun_chunks in spacy.syntax.iterators lists 'root' in lowercase in its label set. See line 5:

labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
          'attr', 'root']

When I add 'ROOT' to labels it works as expected and returns "100 tacos". Of course, there may be a reason that 'root' is different from 'ROOT' that I am not aware of.
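The symptom follows from exact string comparison: a label set containing 'root' never matches a token labelled 'ROOT'. Below is a minimal, self-contained sketch of that effect. It is not spaCy's english_noun_chunks. The Token class, the hand-written parse of the example phrase (including labels such as nummod and the head indices), and the simplified rule for extending a chunk leftwards are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label, e.g. "ROOT", "pobj"
    head: int  # index of the syntactic head (the root points to itself)


# Hand-annotated parse of "100 tacos with a side of rice" (illustrative, not real parser output).
SENTENCE = [
    Token("100", "NUM", "nummod", 1),
    Token("tacos", "NOUN", "ROOT", 1),
    Token("with", "ADP", "prep", 1),
    Token("a", "DET", "det", 4),
    Token("side", "NOUN", "pobj", 2),
    Token("of", "ADP", "prep", 4),
    Token("rice", "NOUN", "pobj", 5),
]

# Label list as quoted in the report, and the same list with 'ROOT' added.
BUGGY_LABELS = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj', 'attr', 'root']
FIXED_LABELS = BUGGY_LABELS + ['ROOT']


def noun_chunks(tokens, labels):
    """Yield a chunk for each noun whose dep label is in `labels`,
    extended left over the run of tokens that attach directly to it."""
    wanted = set(labels)  # exact, case-sensitive string match
    for i, tok in enumerate(tokens):
        if tok.pos != "NOUN" or tok.dep not in wanted:
            continue
        start = i
        # Walk left while the preceding token's head is this noun (e.g. "a", "100").
        while start > 0 and tokens[start - 1].head == i:
            start -= 1
        yield " ".join(t.text for t in tokens[start:i + 1])


print(list(noun_chunks(SENTENCE, BUGGY_LABELS)))  # ['a side', 'rice']
print(list(noun_chunks(SENTENCE, FIXED_LABELS)))  # ['100 tacos', 'a side', 'rice']
```

With only 'root' in the set, "tacos" is skipped because "ROOT" != "root", which reproduces the missing "100 tacos"; adding 'ROOT', as described above, brings it back.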

Definitely a bug — thanks for the report.

If you have time, would you mind submitting a pull request with a test and the patch?

Will do!

I made the change and added the test, but when I try to build locally with python setup.py build_ext --inplace, I get the error listed at http://9.media.readthedocs.io/projects/spacy/builds/3893054/

Broken build is sadness — I want to make sure this doesn't happen in future.

Anyway, the problem is fixed for 1.0. Thanks again for the report.

lock commented

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.