DFKI/leechcrawler
Incremental crawling capabilities for Apache Tika. Crawl content from file systems, HTTP(S) sources (web crawling), IMAP(S) servers, or your own arbitrary data sources. LeechCrawler provides additional Tika parsers that implement these crawling capabilities.
Java, BSD-3-Clause
Issues
Possible memory leak
#8 opened
Exception while crawling doc file
#3 opened
Tika 1.8 support
#2 opened
Tika 1.7 support
#1 opened