What format should the Wikipedia data be in: XML, JSON, or text?

Question

What format should the Wikipedia data be in: XML, JSON, or text?

Closed this issue 7 years ago · 1 comment

chendi1995 commented 7 years ago

Thanks. I used the XML directly, but it failed with this error: "ValueError: Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].sent_start"

Answer 1 · 2018-03-08T07:15:07.000Z

Thanks for pointing this out! I didn't realize it's not described here: you first need to convert the XML to text using either this or this.
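For readers who want to see what the XML-to-text step involves, here is a minimal sketch using only the Python standard library. It is not either of the tools linked above. It assumes the input is a MediaWiki-style export with page, title and text elements, and the names iter_page_texts and local_name are hypothetical helpers made up for this illustration. It only pulls the raw article text out of the XML; wiki markup inside that text is left untouched, so a dedicated extractor is probably still the better choice for real dumps.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_page_texts(xml_source):
    """Yield (title, raw_text) for each <page> in a MediaWiki-style XML dump.

    xml_source may be a file path or a binary file-like object.
    Assumption: each page has one <title> and at least one <text> element.
    """
    title, text = None, ""
    # iterparse streams the file, so a large dump is never fully loaded.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the children of pages already processed


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <text>First sentence. Second sentence.</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, page_text in iter_page_texts(io.BytesIO(SAMPLE)):
        print(page_title, "->", page_text)
```

The resulting plain text can then be written to a file, one article per line or per file, depending on what the downstream script expects.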