This repo hosts development code used on the backend to support data ingestion into an Elasticsearch index for the SafeDocs File Observatory app.
This repo contains pre-alpha-grade code for demonstration purposes only.
Some capabilities demonstrated here have been integrated into Apache Tika; others have been spun off into standalone projects, e.g., commoncrawl-fetcher-lite.
The commoncrawl-fetcher module includes code that relies on GeoLite2 data created by MaxMind, available from https://www.maxmind.com.