/2demo-S3TikaOcrNerMongodbAtlas

A Java-based Maven example of text extraction and text mining. Files are hosted on S3. Apache Tika extracts the text content, using Tesseract for OCR when needed. OpenNLP named entity recognition then mines the extracted text, and the output is stored in MongoDB Atlas.
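
The four stages described above (fetch, extract, mine, store) can be sketched as a small pipeline. This is a minimal illustration in plain Java, not the repository's actual code. The interface names (`FileSource`, `TextExtractor`, `EntityFinder`, `ResultSink`), the `process` method, and the record field names are all hypothetical. The stand-in implementations use only the standard library, so none of the S3, Tika, Tesseract, OpenNLP, or MongoDB APIs appear here. In particular, `capitalizedRuns` is a toy regex heuristic, not OpenNLP named entity recognition.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    /** Stage 1 (hypothetical): fetch raw bytes for a key; S3 plays this role in the project. */
    public interface FileSource { byte[] fetch(String key); }

    /** Stage 2 (hypothetical): turn bytes into plain text; Tika (with Tesseract) plays this role. */
    public interface TextExtractor { String extract(byte[] content); }

    /** Stage 3 (hypothetical): find named entities in text; OpenNLP plays this role. */
    public interface EntityFinder { List<String> find(String text); }

    /** Stage 4 (hypothetical): persist one result record; MongoDB Atlas plays this role. */
    public interface ResultSink { void store(Map<String, Object> record); }

    /** Runs the four stages in order for a single file and returns the stored record. */
    public static Map<String, Object> process(String key,
                                              FileSource source,
                                              TextExtractor extractor,
                                              EntityFinder finder,
                                              ResultSink sink) {
        byte[] raw = source.fetch(key);
        String text = extractor.extract(raw);
        List<String> entities = finder.find(text);

        // Field names are assumptions chosen for this sketch.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("key", key);
        record.put("text", text);
        record.put("entities", entities);

        sink.store(record);
        return record;
    }

    /** Toy stand-in for NER: collects runs of two or more capitalized words. */
    public static List<String> capitalizedRuns(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)+").matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> stored = new ArrayList<>(); // in-memory stand-in for the database

        Map<String, Object> record = process(
                "docs/sample.txt", // hypothetical object key
                key -> "Report signed by Ada Lovelace in London.".getBytes(StandardCharsets.UTF_8),
                bytes -> new String(bytes, StandardCharsets.UTF_8),
                PipelineSketch::capitalizedRuns,
                stored::add);

        System.out.println(record.get("entities")); // prints [Ada Lovelace]
    }
}
```

Keeping each stage behind its own interface means a real implementation could be swapped in one stage at a time, for example replacing the lambda extractor with a Tika-backed one, without touching `process`.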

Primary Language: Java

This repository is not active