dbmdz/solr-ocrhighlighting

Multi-threaded highlighting


Currently, every (doc, field, matchOffset) combination is highlighted sequentially. Since highlighting is heavily I/O-bound, it would be great if this could be parallelized at the document or field level, so we can take advantage of storage layers that allow concurrent access (see e.g. #49).
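To make the idea concrete, here is a rough sketch of what fanning out per (doc, field) pair could look like with a plain thread pool. Everything in it is hypothetical and not part of the plugin: the `Task` class, the `highlightAll` method and the `highlightOne` callback are just stand-ins for whatever the real per-field highlighting code turns out to be.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class ParallelHighlightSketch {
  /** Hypothetical unit of work: one (doc, field) pair. */
  static final class Task {
    final int docId;
    final String field;

    Task(int docId, String field) {
      this.docId = docId;
      this.field = field;
    }
  }

  /**
   * Runs the (assumed) per-field highlighting callback for every task on a
   * fixed-size pool and returns the results in the same order as the input.
   */
  static <R> List<R> highlightAll(
      List<Task> tasks, Function<Task, R> highlightOne, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit everything first so the I/O of different tasks can overlap.
      List<CompletableFuture<R>> futures = new ArrayList<>();
      for (Task task : tasks) {
        futures.add(CompletableFuture.supplyAsync(() -> highlightOne.apply(task), pool));
      }
      // Collect in submission order; join() rethrows task failures
      // wrapped in a CompletionException.
      List<R> results = new ArrayList<>();
      for (CompletableFuture<R> future : futures) {
        results.add(future.join());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

In a real implementation the pool would presumably be shared and configurable rather than created per request, and the thread count would depend on how much concurrency the storage layer can actually handle.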

This work should probably also involve a refactor that moves away from subclassing the uhighlight.FieldHighlighter type hierarchy and replaces it with something better suited to our use case. Specifically, we should check whether there is a better way to determine passage boundaries than the current BreakIterator approach.
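For reference, this is roughly what BreakIterator-based passage boundaries boil down to. The sketch uses the JDK's stock sentence iterator on plain text purely for illustration; it is not the plugin's actual iterator, and the `PassageBoundarySketch` class and `passageAround` method are made-up names.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class PassageBoundarySketch {
  /**
   * Returns the sentence-level passage that contains the given match offset.
   * Assumes 0 <= matchOffset < text.length().
   */
  static String passageAround(String text, int matchOffset) {
    BreakIterator breaks = BreakIterator.getSentenceInstance(Locale.ROOT);
    breaks.setText(text);
    // Last boundary at or before the match (preceding() is strictly "before",
    // hence the +1 so a match sitting exactly on a boundary is handled).
    int start = breaks.preceding(matchOffset + 1);
    // First boundary strictly after the match.
    int end = breaks.following(matchOffset);
    if (start == BreakIterator.DONE) {
      start = 0;
    }
    if (end == BreakIterator.DONE) {
      end = text.length();
    }
    return text.substring(start, end);
  }
}
```

Each match needs its own pair of boundary lookups here, which is one reason to ask whether a different way of locating passage boundaries would fit our use case better.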