Software for preprocessing textual data in multiple languages for textual analysis.

Primary language: Python. License: MIT.

txtorg

txtorg is a Python-based utility that uses Apache Lucene to preprocess and manage text. It exports processed text in several formats for use in analytical software, including (but not limited to) the structural topic model. It scales to large corpora and includes a graphical user interface for ease of use. Because it builds on Lucene, txtorg supports a wide range of languages.
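
As a rough illustration of what text preprocessing produces for downstream analysis, the following is a minimal sketch in plain Python. It does not use txtorg or Lucene and is not txtorg's API; the names `preprocess`, `document_term_matrix`, and `STOP_WORDS` are hypothetical, and the stop word list is a toy example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"the", "a", "of", "and", "is"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return a sorted vocabulary and one row of term counts per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat.", "The dog and the cat."]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(matrix)
```

A document-term matrix of this kind is a common input to topic models. A real pipeline would typically add language-specific tokenization and stemming, which this sketch leaves out.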

For more information, including installation instructions, see http://txtorg.org/.