Add a better docstring for texthero.nlp.noun_chunks
jbesomi opened this issue · 6 comments
The texthero.nlp.noun_chunks docstring is very poor. It would be great to have more informative documentation.
I made an attempt at this after reading the existing docstrings and the one mentioned here. Do you want me to paste the changes in this thread so that you can take a look at it, or should I submit a PR?
Hey @avinashbhat, as you prefer! Either way works, as we will need to both review it and integrate the changes with a PR.
Alright then, how does this look?
Noun chunks, or noun phrases, are phrases that have a noun as their head or nucleus, i.e., they
contain the noun and other words that describe that noun.
Internally, `noun_chunks` makes use of spaCy's dependency parsing.
Parameters
----------
input : Pandas Series
Returns
-------
Pandas Series
Examples
--------
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series("The monuments in New Delhi glorify the settler colonialism.")
>>> hero.noun_chunks(s)
0 [('The monuments', 'NP', 0, 13), ('New Delhi', 'NP', 17, 26), ('the settler colonialism', 'NP', 35, 58)]
dtype: object
Great job! Thank you!!
Some minor feedback:
- The docstring needs a heading title, separated by an empty line from the rest; look at some similar docstrings for ideas
- We should better specify what the output pandas Series looks like (every cell is a list of tuples, where each tuple is composed of ...). This can be done in the body of the docstring, and therefore we can remove the "Returns" section (we don't have this in any Texthero function). What do you think?
- A link to the spaCy dependency parsing doc might be useful as well
@jbesomi I have updated it based on your feedback. I was not sure what you meant by a heading title in the first suggestion, although I did modify it to look like the other docstrings. Do let me know if this or anything else needs to be changed further. Thanks :)
Return noun chunks (noun phrases).
Return a Pandas Series where each row contains a list of tuples, one per noun chunk, with information about that chunk.
Tuple: (`chunk text`, `chunk label`, `starting index`, `ending index`)
Noun chunks, or noun phrases, are phrases that have a noun as their head or nucleus,
i.e., they contain the noun and other words that describe that noun.
A detailed explanation on noun chunks: https://en.wikipedia.org/wiki/Noun_phrase
Internally, `noun_chunks` makes use of spaCy's dependency parsing:
https://spacy.io/usage/linguistic-features#dependency-parse
Parameters
----------
input : Pandas Series
Examples
--------
>>> import texthero as hero
>>> import pandas as pd
>>> s = pd.Series("The monuments in New Delhi glorify the settler colonialism.")
>>> hero.noun_chunks(s)
0 [('The monuments', 'NP', 0, 13), ('New Delhi', 'NP', 17, 26), ('the settler colonialism', 'NP', 35, 58)]
dtype: object
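For anyone reviewing the example output, here is a minimal sketch in plain Python (not part of Texthero) of how the tuple fields can be read. It assumes the last two fields are character offsets into the original string, with the ending index exclusive, which is what the sample output suggests. `check_offsets` is a hypothetical helper used only for illustration.

```python
# Sentence and tuples copied from the docstring example.
text = "The monuments in New Delhi glorify the settler colonialism."
chunks = [
    ("The monuments", "NP", 0, 13),
    ("New Delhi", "NP", 17, 26),
    ("the settler colonialism", "NP", 35, 58),
]


def check_offsets(text, chunks):
    """Return True if every (chunk, label, start, end) tuple satisfies text[start:end] == chunk."""
    return all(text[start:end] == chunk for chunk, _label, start, end in chunks)


# Assumption: start/end are character offsets, with end exclusive.
print(check_offsets(text, chunks))
```

If the assumption holds, slicing the input string with the two indices gives back the chunk text, which might be worth stating explicitly in the docstring.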
That's amazing @avinashbhat, I encourage you to open a PR and update the docstring, will be glad to merge it!