INL/BlackLab

Compound files fail with custom stored field (content store) in integrated index format

Closed this issue · 1 comment

If you enable compound files with IndexWriterConfig.setUseCompoundFile(true), or disable them but Lucene still creates a compound segment while merging, fetching document contents from a compound segment fails with an error like java.nio.file.NoSuchFileException: /home/jan/int-projects/blacklab-data/data/standaard/index.INTEGRATED/_s.fdt.
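For reference, a minimal sketch of the two settings that decide whether compound segments appear. The class and method names are made up for illustration; only the Lucene calls (IndexWriterConfig.setUseCompoundFile and MergePolicy.setNoCFSRatio) are real. As far as I understand it, setUseCompoundFile only covers newly flushed segments, while merged segments follow the merge policy, which would explain why compound segments can show up even when the first setting is off.

```java
import org.apache.lucene.index.IndexWriterConfig;

// Hypothetical helper class, not part of BlackLab.
public class CompoundFileConfigSketch {

    /** Config that should produce compound segments on flush and on merge. */
    static IndexWriterConfig compoundEverywhere() {
        IndexWriterConfig config = new IndexWriterConfig();
        // Applies to newly flushed segments only.
        config.setUseCompoundFile(true);
        // Merged segments follow the merge policy; 1.0 means "always use compound files".
        config.getMergePolicy().setNoCFSRatio(1.0);
        return config;
    }

    /** Config that should avoid compound segments entirely. */
    static IndexWriterConfig compoundNowhere() {
        IndexWriterConfig config = new IndexWriterConfig();
        config.setUseCompoundFile(false);
        // 0.0 means "never use compound files for merged segments".
        config.getMergePolicy().setNoCFSRatio(0.0);
        return config;
    }
}
```

Indexing with compoundEverywhere() should be a quick way to reproduce the failure; compoundNowhere() is only a workaround, not a fix.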

The file named in the error is where stored field data normally lives in a non-compound segment (for compound segments, this data would be in the .cfs file). This is wrong for two reasons: first, our custom stored field type uses a different set of file extensions (.docindex, .blocks, etc.); second, in a compound segment all data files should be inside the .cfs file.

The problem is probably in BlackLab40StoredFieldsWriter, more specifically in its merge method. That method deliberately does not forward the call to the delegate, but something is clearly going wrong here. We should investigate further, ideally by comparing with another project that does something similar (although that might be hard to find).

It turned out the problem was not in the merge method but in BlackLab40PostingsReader.getStoredFieldReader(): the wrong Directory was passed to the fieldsReader() method, so the compound segment was treated as a regular one.
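To illustrate the failure mode described above, here is a small self-contained model in plain Java. It does not use Lucene or BlackLab code: Dir, MapDir, Segment, directoryFor and readStoredFields are all hypothetical stand-ins, and the actual fix in BlackLab may look different. The point is only that a reader handed the raw index directory looks for _s.fdt as a separate file, while a reader handed the directory view of the compound file finds it.

```java
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.Map;

// Toy model only; every type here is a hypothetical stand-in, not Lucene's API.
public class CompoundSegmentSketch {

    /** Stand-in for a directory: resolves a file name to its contents. */
    interface Dir {
        String open(String name) throws NoSuchFileException;
    }

    /** In-memory directory backed by a map of file name to contents. */
    static final class MapDir implements Dir {
        private final Map<String, String> files = new HashMap<>();

        MapDir put(String name, String content) {
            files.put(name, content);
            return this;
        }

        @Override
        public String open(String name) throws NoSuchFileException {
            String content = files.get(name);
            if (content == null) throw new NoSuchFileException(name);
            return content;
        }
    }

    /** Stand-in for segment metadata plus the two directories a reader could use. */
    static final class Segment {
        final String name;
        final boolean compound;
        final Dir rawDir;       // the index directory on disk
        final Dir compoundDir;  // view of the files packed inside the .cfs file

        Segment(String name, boolean compound, Dir rawDir, Dir compoundDir) {
            this.name = name;
            this.compound = compound;
            this.rawDir = rawDir;
            this.compoundDir = compoundDir;
        }
    }

    /** Picks the directory a stored fields reader should be given. */
    static Dir directoryFor(Segment seg) {
        return seg.compound ? seg.compoundDir : seg.rawDir;
    }

    /** Stand-in for opening the stored fields data of a segment. */
    static String readStoredFields(Dir dir, String segmentName) throws NoSuchFileException {
        return dir.open(segmentName + ".fdt");
    }

    /** Compound segment "_s": on disk there is only _s.cfs; _s.fdt lives inside it. */
    static Segment exampleSegment() {
        Dir raw = new MapDir().put("_s.cfs", "(packed segment files)");
        Dir cfs = new MapDir().put("_s.fdt", "stored fields");
        return new Segment("_s", true, raw, cfs);
    }

    public static void main(String[] args) throws Exception {
        Segment seg = exampleSegment();
        try {
            // Wrong directory: the compound segment is treated as a regular one.
            readStoredFields(seg.rawDir, seg.name);
        } catch (NoSuchFileException e) {
            System.out.println("wrong directory: NoSuchFileException for " + e.getFile());
        }
        // Right directory: the file is resolved inside the compound file.
        System.out.println("right directory: " + readStoredFields(directoryFor(seg), seg.name));
    }
}
```

In this model the only difference between the failing and the working call is which directory object is passed in, which matches the symptom in the issue: the lookup itself is fine, it just runs against the wrong directory.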