Add better docstring for texthero.nlp.noun_chunks

Question

Add better docstring for texthero.nlp.noun_chunks

jbesomi opened this issue 5 years ago · 6 comments

The texthero.nlp.noun_chunks docstring is very poor. It would be great to have more informative documentation.

Answer 1 · 2020-07-11T10:25:17.000Z

I made an attempt at this after reading the existing docstrings and the one mentioned here. Do you want me to paste the changes in this thread so that you can take a look at it, or should I submit a PR?

Answer 2 · 2020-07-11T10:27:41.000Z

Hey @avinashbhat, as you prefer! Either way works, as we will need to both review it and integrate the changes with a PR.

Answer 3 · 2020-07-11T10:30:57.000Z

Alright then, how does this look?

    Noun chunks or noun phrases are phrases that have a noun as their head or nucleus, i.e., they
    contain the noun and other words that describe that noun.

    Internally, `noun_chunks` makes use of spaCy's dependency parsing.

    Parameters
    ----------
    input : Pandas Series

    Returns
    -------
    Pandas Series
    
    Examples
    --------
    >>> import texthero as hero
    >>> import pandas as pd
    >>> s = pd.Series("The monuments in New Delhi glorify the settler colonialism.")
    >>> hero.noun_chunks(s)
    0    [('The monuments', 'NP', 0, 13), ('New Delhi', 'NP', 17, 26), ('the settler colonialism', 'NP', 35, 58)]
    dtype: object

Answer 4 · 2020-07-11T10:38:35.000Z

Great job! Thank you!!

Some minor feedback:

- The docstring needs a heading title, separated by an empty line from the rest. Look at some similar docstrings for ideas.
- We should better specify what the output Pandas Series looks like (every cell is a list of tuples where each tuple is composed of ...). This can be done in the body of the docstring, and therefore we can remove the "Returns" section (we don't have this in any Texthero function). What do you think?
- A link to the spaCy dependency parsing docs might be useful as well.

Answer 5 · 2020-07-11T13:05:47.000Z

@jbesomi I have updated it based on your feedback. I was not sure what you meant by a heading title in the first suggestion, although I did modify it to look like the other docstrings. Do let me know if this or anything else needs to be changed further. Thanks :)

    Return noun chunks (noun phrases).

    Return a Pandas Series where each row contains a list of tuples, one per noun chunk found in that row's text.
    
    Tuple: (`chunk text`, `chunk label`, `starting index`, `ending index`)

    Noun chunks or noun phrases are phrases that have a noun as their head or nucleus,
    i.e., they contain the noun and other words that describe that noun.
    A detailed explanation of noun chunks: https://en.wikipedia.org/wiki/Noun_phrase
    Internally, `noun_chunks` makes use of spaCy's dependency parsing:
    https://spacy.io/usage/linguistic-features#dependency-parse
    https://spacy.io/usage/linguistic-features#dependency-parse

    Parameters
    ----------
    input : Pandas Series
    
    Examples
    --------
    >>> import texthero as hero
    >>> import pandas as pd
    >>> s = pd.Series("The monuments in New Delhi glorify the settler colonialism.")
    >>> hero.noun_chunks(s)
    0    [('The monuments', 'NP', 0, 13), ('New Delhi', 'NP', 17, 26), ('the settler colonialism', 'NP', 35, 58)]
    dtype: object
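
For anyone reading the example output above, here is a small sketch in plain Python (it does not call Texthero or spaCy) that checks one reading of the tuple: that the last two fields are character offsets into the input string, with the end offset exclusive. The sentence and tuples are copied from the docstring example; the helper name `check_offsets` is hypothetical and only for illustration.

```python
# Sentence and tuples copied from the docstring example in this thread.
sentence = "The monuments in New Delhi glorify the settler colonialism."
chunks = [
    ("The monuments", "NP", 0, 13),
    ("New Delhi", "NP", 17, 26),
    ("the settler colonialism", "NP", 35, 58),
]


def check_offsets(text, chunk_tuples):
    """Return True if every (chunk_text, label, start, end) tuple
    satisfies text[start:end] == chunk_text.

    Hypothetical helper, not part of Texthero.
    """
    return all(
        text[start:end] == chunk_text
        for chunk_text, _label, start, end in chunk_tuples
    )


# Check the offsets and pull out just the chunk text from each tuple.
offsets_ok = check_offsets(sentence, chunks)
chunk_texts = [chunk_text for chunk_text, _label, _start, _end in chunks]

print(offsets_ok)
print(chunk_texts)
```

If that reading is right, slicing the original string with the two indices recovers each chunk, which may be worth stating explicitly in the docstring body.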

Answer 6 · 2020-07-13T12:18:11.000Z

That's amazing @avinashbhat, I encourage you to open a PR and update the docstring, will be glad to merge it!