jbesomi/texthero

AttributeError: 'str' object has no attribute 'pipe'

rahul4tripathi2 opened this issue · 1 comments

Hi Team,

I'm trying to run texthero on cleaned article text, but I get an error when running the code below:

from goose3 import Goose
from requests import get
import pandas as pd
import texthero as hero

response = get('https://www.autoindustria.com.br/2020/10/15/simposio-sae-online-destaca-cases-de-sucesso-em-manufatura/')
extractor = Goose({'browser_user_agent': 'Mozilla', 'parser_class':'soup'})
#extractor = Goose()
article = extractor.extract(raw_html=response.content)
text = article.cleaned_text
text = hero.clean(text)

ERROR


AttributeError Traceback (most recent call last)
in
----> 1 text = hero.clean(text)

c:\users\rahultripathi7\miniconda3\envs\4p-sensing\lib\site-packages\texthero\preprocessing.py in clean(s, pipeline)
386
387 for f in pipeline:
--> 388 s = s.pipe(f)
389 return s
390

AttributeError: 'str' object has no attribute 'pipe'

Hi,

As per the API documentation, clean requires a pandas Series as input, but you are passing it a string. A str has no pipe method, which is why the call s.pipe(f) in the traceback fails.

To solve your issue, you first need to convert text into a pandas Series. Something like this should do the job:

# split("\n") splits the article text on newlines, giving one row per line
s_text = pd.Series(text.split("\n")) 
s_text = hero.clean(s_text)
...
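
For anyone hitting the same error, here is a minimal sketch of the str-versus-Series difference that runs with pandas alone. The helper to_series and the lowercase step are hypothetical stand-ins written for this example, not part of texthero; in real code you would pass the resulting Series to hero.clean instead.

```python
import pandas as pd


def to_series(text):
    """Wrap raw text in a pandas Series, one non-empty line per row.

    Hypothetical helper for illustration; not part of texthero.
    """
    if isinstance(text, pd.Series):
        return text
    lines = [line.strip() for line in text.split("\n")]
    return pd.Series([line for line in lines if line], dtype="object")


def lowercase(s):
    # Stand-in for a texthero preprocessing step: takes and returns a Series.
    return s.str.lower()


article_text = "First paragraph.\n\nSecond paragraph."
s_text = to_series(article_text)

# A Series has .pipe, a plain str does not, which is what the traceback shows.
print(hasattr(article_text, "pipe"))  # False
print(hasattr(s_text, "pipe"))        # True

# Same pattern as the loop in the traceback: s = s.pipe(f)
cleaned = s_text.pipe(lowercase)
print(cleaned.tolist())  # ['first paragraph.', 'second paragraph.']
```

If you would rather keep the whole article as a single document, pd.Series([text]) gives a one-row Series instead of one row per line.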

Hope it helps. I'm going to close this issue now; if you need further assistance, just reopen it.
Regards,