treversec/Hello-Wiki
This Java library opens a Wikipedia dump in the form of an XML file and extracts the content of the individual articles. Content here means the article text, the list of links to other Wikipedia articles, and more. It can also create a Lucene index, storing the article contents in separate fields.
Java · GPL-3.0
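
The description does not show the library's own API, so the following is only a rough sketch of the general technique it describes, written against the JDK's built-in StAX parser. The names `DumpSketch`, `Article`, `parse`, and `extractLinks` are hypothetical and are not taken from Hello-Wiki. The sketch assumes the usual MediaWiki export layout, where each `<page>` holds a `<title>` and a `<revision>` with a `<text>` element. The Lucene indexing step is left out because it needs an external dependency.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DumpSketch {

    /** Hypothetical holder for one article; not a class from the library. */
    public static final class Article {
        public final String title;
        public final String text;
        public final List<String> links;

        Article(String title, String text, List<String> links) {
            this.title = title;
            this.text = text;
            this.links = links;
        }
    }

    // Matches [[Target]], [[Target|label]] and [[Target#Section]];
    // only the target part before '|' or '#' is captured.
    private static final Pattern LINK = Pattern.compile("\\[\\[([^\\]|#]+)");

    /** Collects the targets of all wiki links found in the given wikitext. */
    public static List<String> extractLinks(String wikitext) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(wikitext);
        while (m.find()) {
            links.add(m.group(1).trim());
        }
        return links;
    }

    /** Streams through the dump and returns one Article per <page> element. */
    public static List<Article> parse(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Dumps carry no DTD; disabling DTD support avoids entity expansion.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader xml = factory.createXMLStreamReader(in);

        List<Article> articles = new ArrayList<>();
        String title = null;
        String text = "";
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("page")) {
                    title = null;
                    text = "";
                } else if (name.equals("title")) {
                    title = xml.getElementText();
                } else if (name.equals("text")) {
                    text = xml.getElementText();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("page")) {
                articles.add(new Article(title, text, extractLinks(text)));
            }
        }
        xml.close();
        return articles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<mediawiki><page><title>Lucene</title><revision>"
                + "<text>Written in [[Java (programming language)|Java]].</text>"
                + "</revision></page></mediawiki>";
        for (Article a : parse(new StringReader(sample))) {
            System.out.println(a.title + " links to " + a.links);
        }
    }
}
```

Returning a list keeps the sketch short, but a full dump is far too large to hold in memory; a real run would more plausibly hand each article to a callback (for example, one that writes it to an index) as soon as its `</page>` tag is reached. The link pattern is also deliberately naive: it treats namespace links such as `[[File:...]]` like article links.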