Try using a byte array in ArrayHitCounter instead of a short array
alexklibisz opened this issue
Background
ArrayHitCounter uses an array of shorts to count hits. It isn't very memory-efficient, because it allocates one array entry for every document in the segment, whether or not that document gets any hits. It uses shorts instead of ints because a short takes half the memory of an int (2 bytes vs. 4), and a hit count should rarely exceed the max value of a short (32,767).
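To make the memory cost concrete, here is a minimal sketch of a short-array counter. It is not the actual elastiknn ArrayHitCounter: the HitCounter interface below is a hypothetical, simplified stand-in with only increment/get, plus a hypothetical bytesUsed() helper that shows the footprint is 2 bytes per document in the segment, independent of how many documents actually match.

```java
// Minimal sketch, assuming a simplified HitCounter interface.
// Not the real elastiknn ArrayHitCounter; names here are hypothetical.
public class ShortArrayHitCounterSketch {

    // Hypothetical, simplified stand-in for the HitCounter interface.
    interface HitCounter {
        void increment(int docId);
        int get(int docId);
        long bytesUsed();
    }

    // One short per document, indexed by segment-local doc id.
    static final class ShortArrayHitCounter implements HitCounter {
        private final short[] counts;

        ShortArrayHitCounter(int maxDoc) {
            this.counts = new short[maxDoc];
        }

        @Override
        public void increment(int docId) {
            // Wraps around silently past Short.MAX_VALUE (32767).
            counts[docId]++;
        }

        @Override
        public int get(int docId) {
            return counts[docId];
        }

        @Override
        public long bytesUsed() {
            // Short.BYTES == 2: cost grows with maxDoc, not with the number of hits.
            return (long) counts.length * Short.BYTES;
        }
    }

    public static void main(String[] args) {
        HitCounter counter = new ShortArrayHitCounter(1_000_000);
        counter.increment(42);
        counter.increment(42);
        counter.increment(7);
        System.out.println(counter.get(42));     // prints 2
        System.out.println(counter.bytesUsed()); // prints 2000000
    }
}
```

For a segment with one million documents, this sketch allocates about 2 MB per query even if only a handful of documents match.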
I think an array of bytes would also work, and would halve the memory again. This could be added as a new implementation of the HitCounter interface: rename the current one to ShortArrayHitCounter and add a new one, ByteArrayHitCounter. A Java byte is signed, so its max value is 127, but if the stored bits are read as unsigned (e.g. `b & 0xFF`), a byte can hold counts up to 255. So if the number of hashes passed to MatchHashesAndScoreQuery is <= 255, use the ByteArrayHitCounter; otherwise use the ShortArrayHitCounter.
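A minimal sketch of the proposed ByteArrayHitCounter and the selection rule follows, under stated assumptions. HitCounter, ShortArrayHitCounter, ByteArrayHitCounter and create(numHashes, maxDoc) are hypothetical simplified names, not the real elastiknn API. It also assumes each query hash contributes at most one hit per document, so a document's count can never exceed the number of hashes. Under that assumption, an unsigned-read byte (max 255) is safe whenever numHashes <= 255, and the byte array needs only maxDoc bytes instead of 2 * maxDoc.

```java
// Minimal sketch; all names are hypothetical, not the actual elastiknn API.
// Assumes each query hash adds at most one hit per document.
public class ByteArrayHitCounterSketch {

    interface HitCounter {
        void increment(int docId);
        int get(int docId);
    }

    static final class ShortArrayHitCounter implements HitCounter {
        private final short[] counts;
        ShortArrayHitCounter(int maxDoc) { counts = new short[maxDoc]; }
        @Override public void increment(int docId) { counts[docId]++; }
        @Override public int get(int docId) { return counts[docId]; }
    }

    static final class ByteArrayHitCounter implements HitCounter {
        // Largest count representable when a byte is read as unsigned.
        static final int MAX_COUNT = 255;
        private final byte[] counts;
        ByteArrayHitCounter(int maxDoc) { counts = new byte[maxDoc]; }

        @Override public void increment(int docId) {
            // Signed bytes wrap from 127 to -128; get() still reads that as 128.
            // The count is only correct up to 255, after which it wraps to 0.
            counts[docId]++;
        }

        @Override public int get(int docId) {
            // Mask to reinterpret the stored bits as an unsigned value in 0..255.
            return counts[docId] & 0xFF;
        }
    }

    // Pick the smallest counter that cannot overflow for this query.
    static HitCounter create(int numHashes, int maxDoc) {
        return numHashes <= ByteArrayHitCounter.MAX_COUNT
                ? new ByteArrayHitCounter(maxDoc)
                : new ShortArrayHitCounter(maxDoc);
    }

    public static void main(String[] args) {
        HitCounter small = create(200, 10);
        for (int i = 0; i < 200; i++) small.increment(3);
        System.out.println(small.getClass().getSimpleName() + " " + small.get(3));
        // prints "ByteArrayHitCounter 200"
        HitCounter large = create(1000, 10);
        System.out.println(large.getClass().getSimpleName());
        // prints "ShortArrayHitCounter"
    }
}
```

The & 0xFF mask is the key design choice: without it, counts from 128 to 255 would read back as negative numbers, and the usable threshold would drop to 127 hashes.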
Bard already wrote most of it for me:
Deliverables
- Implement a ByteArrayHitCounter
- Benchmark it