flaxsearch/luwak

The lack of hermetics in DocumentBatch

SOLR4189 opened this issue · 1 comments

Hi, I have another problem:
When I pass my docs through the Monitor in batches (3000 docs per batch), I don't get all matching pairs. When I pass them one doc per batch, I get all results. What could cause this? Does Luwak have a batch size limit? I didn't find one...

I'm using ParallelMatcher with SimpleMatcher inside (scores don't matter to me), and the monitor has only one query loaded.

OK, I found the problem. DocumentBatch takes its analyzers from the first document in the batch only (line 187 in DocumentBatch.java). So it fails when another doc in the batch has fields that the first doc doesn't have.

Temporary solution: when I build a batch, I collect the analyzers from all docs in the batch, so each doc in the batch gets every possible analyzer for every possible field (even fields it doesn't have).
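
A rough sketch of that merging step, for illustration only. It uses plain maps (field name to analyzer) in place of Luwak's own types, and the names `AnalyzerUnion` and `unionAnalyzers` are hypothetical, not part of the Luwak API. It also assumes that when two docs define an analyzer for the same field, the first one seen is kept.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Luwak code: A stands in for an analyzer type.
public class AnalyzerUnion {

    /**
     * Merges the per-document field-to-analyzer maps of a batch into one map
     * covering every field that appears in any document.
     * Assumption: if a field appears in several docs, the first analyzer wins.
     */
    static <A> Map<String, A> unionAnalyzers(List<Map<String, A>> perDocAnalyzers) {
        Map<String, A> merged = new LinkedHashMap<>();
        for (Map<String, A> docAnalyzers : perDocAnalyzers) {
            for (Map.Entry<String, A> entry : docAnalyzers.entrySet()) {
                // Keep the analyzer already registered for this field, if any.
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }
}
```

The merged map can then be handed to every doc in the batch, so the first doc no longer decides which fields get an analyzer.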

Optimal solution: DocumentBatch should take the union of all analyzers itself. What do you think?