AmenRa/retriv

HybridRetriever raises KeyError: -1 if the collection has fewer than 1_000 documents

tshu-w opened this issue · 1 comment

tshu-w commented

The cutoff used by msearch in HybridRetriever is hardcoded to 1_000, which makes map_internal_ids_to_original_ids raise a KeyError when the collection contains fewer than 1_000 documents:

sparse_results = self.sparse_retriever.search(query, False, 1_000)
dense_results = self.dense_retriever.search(query, False, 1_000)

Thus, map_internal_ids_to_original_ids should be:

def map_internal_ids_to_original_ids(self, doc_ids: Iterable) -> List[str]:
    return [self.id_mapping[doc_id] for doc_id in doc_ids if doc_id != -1]
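
To illustrate, here is a minimal standalone sketch of the failure and the proposed filter. It does not use retriv itself: `id_mapping`, `map_unfiltered`, `map_filtered`, and `padded_ids` are hypothetical stand-ins, and it assumes that a search with a cutoff larger than the collection returns its ID list padded with -1, which is what the `KeyError: -1` suggests.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for the retriever's internal-index -> original-ID table.
id_mapping: Dict[int, str] = {0: "doc_a", 1: "doc_b", 2: "doc_c"}


def map_unfiltered(doc_ids: Iterable[int]) -> List[str]:
    # Mirrors the behaviour described in this issue: every ID is looked up.
    return [id_mapping[doc_id] for doc_id in doc_ids]


def map_filtered(doc_ids: Iterable[int]) -> List[str]:
    # Proposed fix: skip the -1 placeholders before the lookup.
    return [id_mapping[doc_id] for doc_id in doc_ids if doc_id != -1]


# Assumed result shape: 3 real hits, then -1 padding up to a cutoff of 5.
padded_ids = [2, 0, 1, -1, -1]

try:
    map_unfiltered(padded_ids)
except KeyError as err:
    print(f"KeyError: {err}")  # prints "KeyError: -1"

print(map_filtered(padded_ids))  # prints ['doc_c', 'doc_a', 'doc_b']
```

An alternative would be to cap the cutoff at the collection size before searching, but filtering -1 keeps the change local to the mapping function.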

AmenRa commented

Thanks for reporting the bug!
I'll fix it soon.