BlackLab is a corpus retrieval engine built on top of Apache Lucene. It allows fast, complex searches with accurate hit highlighting on large bodies of tagged and annotated text. It was developed at the Institute of Dutch Lexicology (INL) to provide a fast and feature-rich search interface for our historical and contemporary text corpora.
We're also working on BlackLab Server, a web service interface to BlackLab that lets you access it from any programming language. The alpha version is available here: https://github.com/INL/BlackLab-server
BlackLab is licensed under the Apache License 2.0.
More information:
- List of features
- Try an online demo
- Frequently Asked Questions
- Build and test it yourself (it's easy, we promise!)
- The example application explained in detail
- Browse the Javadoc (or build the most recent one from source)
- BlackLab blog
- Follow @BlackLabINL on Twitter!
- For technical questions, contact Jan Niestadt (jan.niestadt@inl.nl)