nickgogan/2demo-S3TikaOcrNerMongodbAtlas
A Java (Maven) example of text extraction and text mining. Source files are hosted on S3. Apache Tika extracts the text content, using Tesseract for OCR where needed. OpenNLP named entity recognition mines the extracted text, and the output is stored in MongoDB Atlas.
Java
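
The flow in the description (fetch, extract, OCR if needed, recognize entities, store) can be sketched as a chain of small stages. This is a minimal illustration using only the Java standard library, not the repository's code: `PipelineSketch`, `Extractor`, `EntityFinder`, `Sink`, and the record field names are all hypothetical stand-ins for the Tika, Tesseract, OpenNLP, and MongoDB calls. It also assumes that OCR runs only when plain parsing yields no text, which the description does not confirm.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PipelineSketch {

    /** Hypothetical stand-in for text extraction (Tika parsing or Tesseract OCR). */
    public interface Extractor { String extract(byte[] content); }

    /** Hypothetical stand-in for OpenNLP named entity recognition. */
    public interface EntityFinder { List<String> find(String text); }

    /** Hypothetical stand-in for the write to MongoDB Atlas. */
    public interface Sink { void store(Map<String, Object> record); }

    /** Runs one file (already downloaded from S3 as bytes) through every stage. */
    public static Map<String, Object> process(String key, byte[] content,
            Extractor parser, Extractor ocr, EntityFinder ner, Sink sink) {
        String text = parser.extract(content);
        boolean usedOcr = false;
        // Assumption: fall back to OCR only when parsing produced no text.
        if (text == null || text.trim().isEmpty()) {
            text = ocr.extract(content);
            usedOcr = true;
        }
        if (text == null) {
            text = ""; // keep later stages free of null checks
        }
        // Build the document that would be stored; field names are illustrative.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("key", key);
        record.put("text", text);
        record.put("usedOcr", usedOcr);
        record.put("entities", ner.find(text));
        sink.store(record);
        return record;
    }
}
```

Because each stage is an interface, the real libraries could be wired in one at a time, and the flow can be exercised with lambdas in place of S3, Tika, OpenNLP, and MongoDB.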