vered1986/HypeNET

What format should the Wikipedia dump be in: XML, JSON, or plain text?


Thanks. I used the XML dump directly, but it failed with: "ValueError: Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].sent_start"

Thanks for pointing this out! I didn't realize it's not described here: you first need to convert the XML dump to plain text, using either this or this.
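
For readers without the linked tools at hand, here is a rough sketch of what the "XML to text" step involves, using only the Python standard library. It is not the method from either link above, and the function name `iter_page_texts` and the sample input are made up for illustration. It assumes the input follows the usual MediaWiki export layout (`<page>` containing `<title>` and `<revision><text>`). It only pulls the raw article text out of the XML; it does not strip wiki markup (templates, links, tables), which a dedicated extraction tool would also handle.

```python
import io
import xml.etree.ElementTree as ET


def iter_page_texts(xml_source):
    """Yield (title, raw_text) for each <page> in a MediaWiki-style XML export.

    Hypothetical helper: streams the file so a large dump is not loaded at once.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        # Elements carry an XML namespace prefix such as "{http://...}page";
        # keep only the local tag name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the finished page's subtree


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Cat</title>
    <revision><text>The cat is a small mammal.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_page_texts(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, "|", page_text)
```

With a real dump you would pass an opened file (for example a `bz2.open(...)` handle) instead of the in-memory sample, and write each page's text out as plain lines for the corpus-processing step.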