tballison
File processing and search. Founder, Rhapsode Consulting LLC. Chair/VP, Apache Tika. Committer, Apache POI, PDFBox, Lucene/Solr, Nutch, OpenNLP.
Rhapsode Consulting LLC
Pinned Repositories
CC-MAIN-2021-31-PDF-UNTRUNCATED
commoncrawl-fetcher-lite
Simplified version of a common crawl fetcher
cord-19
Data munging for CORD-19
file-observatory
Single server/laptop grade file-observatory
lucene-addons
Standalone versions of LUCENE_5205 and other patches: SpanQueryParser, Concordance and Co-occurrence stats
mp4parser
A Java API to read, write and create MP4 files
quaerite
Search relevance evaluation toolkit
rhapsode
Advanced desktop search/corpus exploration prototype
SimpleCommonCrawlExtractor
Simple wrapper around IIPC Web Commons to take a literal warc.gz and extract standalone binaries
tika-gui-v2
Unofficial user interface for Apache Tika
tballison's Repositories
tballison/lucene-addons
Standalone versions of LUCENE_5205 and other patches: SpanQueryParser, Concordance and Co-occurrence stats
tballison/rhapsode
Advanced desktop search/corpus exploration prototype
tballison/mp4parser
A Java API to read, write and create MP4 files
tballison/tika
Mirror of Apache Tika
tballison/chorus
Towards an open source stack for e-commerce search
tballison/CommonCrawlDocumentDownload
A small tool that uses the CommonCrawl URL Index (currently the older one, announced in 2013) to download documents of certain file types for mass-testing frameworks like Apache POI and Apache Tika
tballison/jmatio
JMatIO - Matlab's MAT-file I/O in JAVA
tballison/tika-2_0-client-examples
tballison/AGPL
Repo of AGPL licensed code -- nothing in here is connected/related to anything outside of this repo
tballison/deeplearning4j
Deeplearning4j, ND4J, DataVec and more - deep learning & linear algebra for Java/Scala with GPUs + Spark - From Skymind
tballison/elasticsearch
Open Source, Distributed, RESTful Search Engine
tballison/forbidden-apis
Policeman's Forbidden API Checker
tballison/iwana
tballison/james-mime4j
Mirror of Apache James Mime4j
tballison/java-bplist
A Java library for reading Apple bplists, based on the work of
tballison/logging-log4j2
Apache Log4j 2 is an upgrade to Log4j that provides significant improvements over its predecessor, Log4j 1.x, and provides many of the improvements available in Logback while fixing some inherent problems in Logback's architecture.
tballison/lucene-solr
Mirror of Apache Lucene + Solr
tballison/opennlp
Mirror of Apache OpenNLP
tballison/parso
Lightweight Java library designed to read SAS7BDAT datasets
tballison/pdf.js
PDF Reader in JavaScript
tballison/pdfbox
Mirror of Apache PDFBox
tballison/rated-ranking-evaluator
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
tballison/tabula-java
Extract tables from PDF files
tballison/tballison.github.io
tballison/wikiclean
A Java Wikipedia markup to plain text converter
tballison/xmpcore-shaded
Shaded version of Adobe's xmpcore to remove *.internal.* part of namespace
tballison/yalder
Yet another language detector