/Hello-Wiki

This Java library opens a Wikipedia dump in the form of an XML file and extracts the content of the individual articles. Content here means the article text, the list of links to other Wikipedia articles, and more. The library can also create a Lucene index, storing the article contents in separate fields.
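To make the extraction step concrete, here is a minimal sketch that uses only the JDK's StAX parser. It is not this library's API: the class `WikiDumpSketch`, the `Article` holder, and the methods `parse` and `extractLinks` are hypothetical names chosen for illustration. It assumes the usual MediaWiki export layout, in which each `<page>` element contains a `<title>` and a `<revision>` with a `<text>` element. Lucene indexing is left out so the example runs without extra dependencies.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only; none of these names come from the library itself.
class WikiDumpSketch {

    /** Hypothetical holder for one extracted article. */
    static final class Article {
        public final String title;
        public final String text;
        public final List<String> links;

        Article(String title, String text, List<String> links) {
            this.title = title;
            this.text = text;
            this.links = links;
        }
    }

    // Captures the target of [[Target]], [[Target|label]] and [[Target#Section]].
    // Assumption: namespaced links such as [[Category:X]] are not filtered out.
    private static final Pattern LINK = Pattern.compile("\\[\\[([^\\]|#]+)");

    static List<String> extractLinks(String wikitext) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(wikitext);
        while (m.find()) {
            links.add(m.group(1).trim());
        }
        return links;
    }

    /** Streams through the dump and collects one Article per <page> element. */
    static List<Article> parse(Reader dump) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xml = factory.createXMLStreamReader(dump);
        List<Article> articles = new ArrayList<>();
        String title = null;
        String text = null;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("title")) {
                    title = xml.getElementText();
                } else if (name.equals("text")) {
                    text = xml.getElementText();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("page")) {
                // A page is complete: store it and reset the per-page state.
                if (title != null && text != null) {
                    articles.add(new Article(title, text, extractLinks(text)));
                }
                title = null;
                text = null;
            }
        }
        xml.close();
        return articles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<mediawiki><page><title>Lucene</title><revision><text>"
                + "Apache [[Lucene]] is written in [[Java (programming language)|Java]]."
                + "</text></revision></page></mediawiki>";
        for (Article a : parse(new StringReader(sample))) {
            System.out.println(a.title + " -> " + a.links);
        }
    }
}
```

Streaming with StAX rather than loading a DOM tree keeps memory use flat, which matters because full Wikipedia dumps are far larger than typical heap sizes. For a real dump, the collected list would normally be replaced by a per-article callback so that articles are not all held in memory at once.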

Primary language: Java. License: GNU General Public License v3.0 (GPL-3.0).
