IndexError: Dimension out of range
mary-octavia opened this issue · 12 comments
I'm getting an IndexError when trying to obtain validation scores by running the dense_retriever.py script:
File "dense_retriever.py", line 545, in main
questions_tensor = retriever.generate_question_vectors(questions, query_token=qa_src.special_query_token)
File "dense_retriever.py", line 128, in generate_question_vectors
selector=self.selector,
File "dense_retriever.py", line 75, in generate_question_vectors
max_vector_len = max(q_t.size(1) for q_t in batch_tensors)
File "dense_retriever.py", line 75, in <genexpr>
max_vector_len = max(q_t.size(1) for q_t in batch_tensors)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
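The message can be reproduced without DPR. PyTorch only accepts a dimension index in [-ndim, ndim - 1], so for a 1-D question tensor the valid range is [-1, 0] and size(1) fails. The sketch below mimics that check on plain shape tuples so it runs without PyTorch. `tensor_size` is a hypothetical stand-in for torch.Tensor.size(dim), it ignores the special case of 0-dim tensors, and the (256,) shapes are assumed for illustration.

```python
def tensor_size(shape: tuple, dim: int) -> int:
    """Hypothetical stand-in for torch.Tensor.size(dim), for tensors with ndim >= 1."""
    ndim = len(shape)
    # Negative indices are allowed, so the valid range is [-ndim, ndim - 1].
    if not -ndim <= dim <= ndim - 1:
        raise IndexError(
            f"Dimension out of range (expected to be in range of "
            f"[{-ndim}, {ndim - 1}], but got {dim})"
        )
    return shape[dim]


# Assumed shapes: the text pipeline yields 1-D token-id tensors of shape (seq_len,).
batch_shapes = [(256,), (256,)]

try:
    max_vector_len = max(tensor_size(s, 1) for s in batch_shapes)
except IndexError as err:
    # Prints: Dimension out of range (expected to be in range of [-1, 0], but got 1)
    print(err)
```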
The command I used:
python dense_retriever.py model_file=/data-ssd/osulea/DPR/dpr/downloads/checkpoint/retriever/single/nq/bert-base-encoder.cp qa_dataset=nq_test ctx_datatsets=[dpr_wiki] encoded_ctx_files=[\"/data-ssd/osulea/DPR/downloads/data/retriever_results/nq/single/wikipedia_passages_*\"] out_file=retr_inf.out
I got the same error. I suspect the cause is related to this annotation, and I worked around it with "git reset --hard adbf1d9", a commit in which the code that triggers this error doesn't exist yet.
Use size(0) instead:
max_vector_len = max(q_t.size(0) for q_t in batch_tensors)
min_vector_len = min(q_t.size(0) for q_t in batch_tensors)
@StalVars That solved the same problem for me. Thanks.
Could you explain the reason it works?
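A plausible explanation, sketched on shape tuples so it runs without PyTorch: the [-1, 0] range in the error shows that the text pipeline's question tensors are 1-D, (seq_len,), so size(0) is the sequence length. Assuming the tensorizer pads every question to the same fixed length, max == min and the padding branch is simply skipped. For inputs shaped (1, seq_len), which the padding code presumably expects since it calls q.squeeze(0), size(0) is always 1, so the check could never detect a length mismatch; the last dimension, size(-1), gives the sequence length in both cases. The shapes and the `lengths` helper below are hypothetical.

```python
def lengths(shapes, dim):
    """Return shape[dim] for each shape, like [t.size(dim) for t in batch_tensors]."""
    return [shape[dim] for shape in shapes]


# Assumed shapes for illustration only.
text_batch = [(256,), (256,)]             # 1-D token ids, padded to a fixed length
wav2vec_batch = [(1, 16000), (1, 12000)]  # leading dimension of size 1

for name, batch in [("text", text_batch), ("wav2vec", wav2vec_batch)]:
    by_dim0 = lengths(batch, 0)
    by_last = lengths(batch, -1)
    # text size(0): [256, 256] size(-1): [256, 256]
    # wav2vec size(0): [1, 1] size(-1): [16000, 12000]
    print(name, "size(0):", by_dim0, "size(-1):", by_last)
```

In other words, size(0) avoids the crash for text input mostly by making the padding check a no-op.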
git reset --hard adbf1d9
That does not work for me, but at least I get a new error:
Traceback (most recent call last):
File "dense_retriever.py", line 596, in main
top_results_and_scores = retriever.get_top_docs(questions_tensor.numpy(), cfg.n_docs)
File "dense_retriever.py", line 176, in get_top_docs
results = self.index.search_knn(query_vectors, top_docs)
File "/data-ssd/osulea/DPR/dpr/indexer/faiss_indexers.py", line 110, in search_knn
db_ids = [[self.index_id_to_db_id[i] for i in query_top_idxs] for query_top_idxs in indexes]
File "/data-ssd/osulea/DPR/dpr/indexer/faiss_indexers.py", line 110, in <listcomp>
db_ids = [[self.index_id_to_db_id[i] for i in query_top_idxs] for query_top_idxs in indexes]
File "/data-ssd/osulea/DPR/dpr/indexer/faiss_indexers.py", line 110, in <listcomp>
db_ids = [[self.index_id_to_db_id[i] for i in query_top_idxs] for query_top_idxs in indexes]
IndexError: list index out of range
@mary-octavia I met the same error. It turns out I specified the wrong index files so the index was not loaded correctly.
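One way to see how wrong index files lead to this second error: search_knn maps every FAISS result id through the list index_id_to_db_id, so any returned id greater than or equal to len(index_id_to_db_id) raises "list index out of range". That happens when the loaded index holds more vectors than the id mapping has entries, i.e. when the index was not loaded consistently. The sketch below is a hypothetical pre-flight check; `ntotal` corresponds to the vector count FAISS indexes expose as `index.ntotal`, while the function names and the toy mapping are illustrative and not part of DPR.

```python
from typing import List, Sequence


def check_index_consistency(ntotal: int, index_id_to_db_id: Sequence[str]) -> None:
    """Hypothetical sanity check: every FAISS row id must have a database id."""
    if ntotal != len(index_id_to_db_id):
        raise ValueError(
            f"Index holds {ntotal} vectors but the id mapping has "
            f"{len(index_id_to_db_id)} entries; check the index/encoded files."
        )


def map_ids(
    indexes: Sequence[Sequence[int]], index_id_to_db_id: Sequence[str]
) -> List[List[str]]:
    """Same mapping as the search_knn comprehension, with an explicit error message."""
    n = len(index_id_to_db_id)
    db_ids = []
    for query_top_idxs in indexes:
        # Also flags -1, which FAISS returns when it finds fewer than k results.
        bad = [i for i in query_top_idxs if not 0 <= i < n]
        if bad:
            raise IndexError(f"FAISS returned ids {bad} outside the id mapping of size {n}")
        db_ids.append([index_id_to_db_id[i] for i in query_top_idxs])
    return db_ids


mapping = ["doc0", "doc1", "doc2"]
print(map_ids([[2, 0]], mapping))  # [['doc2', 'doc0']]
```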
I ran into the same error.
Has this bug been fixed? (Or how can it be fixed?)
Hi, I'm wondering whether you solved this problem? Thank you :)
I tried using size(0), but it didn't work for me.
@PlusRoss Hi there, could you clarify a bit what you did to solve this? Thank you!:)
Hi, I found that "max_vector_len" and "min_vector_len" are temporary variables that were added to work around tensor size mismatch errors.
This is backed up by a comment in the recent code:
" # TODO: this only works for Wav2vec pipeline but will crash the regular text pipeline "
So I tried commenting out the following code, and then it worked:
# TODO: this only works for Wav2vec pipeline but will crash the regular text pipeline
# max_vector_len = max(q_t.size(1) for q_t in batch_tensors)
# min_vector_len = min(q_t.size(1) for q_t in batch_tensors)
#
# if max_vector_len != min_vector_len:
# # TODO: _pad_to_len move to utils
# from dpr.models.reader import _pad_to_len
# batch_tensors = [_pad_to_len(q.squeeze(0), 0, max_vector_len) for q in batch_tensors]
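Instead of deleting the block, a possible middle ground is to measure the last dimension, which is the sequence length both for 1-D text tensors and for the (1, seq_len) tensors the Wav2vec path presumably produces, and to pad only when lengths actually differ. The sketch below models tensors as nested Python lists so it runs without PyTorch. `pad_batch`, `_squeeze0`, and `_right_pad` are hypothetical helpers; `_right_pad` is meant to mirror what the call _pad_to_len(q.squeeze(0), 0, max_vector_len) appears to do (drop a leading size-1 dimension, then right-pad with id 0). It has not been tested against the DPR codebase.

```python
from typing import List, Union

Row = List[int]
Tensorish = Union[Row, List[Row]]  # (seq_len,) or (1, seq_len), modeled as lists


def _squeeze0(t: Tensorish) -> Row:
    """Drop a leading dimension of size 1, like tensor.squeeze(0); 1-D input is unchanged."""
    if t and isinstance(t[0], list):
        assert len(t) == 1, "expected a leading dimension of size 1"
        return t[0]
    return t


def _right_pad(seq: Row, pad_id: int, max_len: int) -> Row:
    # max_len is the batch maximum, so no sequence ever needs truncating.
    return seq + [pad_id] * (max_len - len(seq))


def pad_batch(batch: List[Tensorish], pad_id: int = 0) -> List[Tensorish]:
    """Pad 1-D or (1, L) sequences to a common length, measured on the last dimension."""
    lengths = [len(_squeeze0(t)) for t in batch]  # equivalent of size(-1)
    max_len, min_len = max(lengths), min(lengths)
    if max_len == min_len:
        return batch  # as in the original block, equal-length batches are left untouched
    return [_right_pad(_squeeze0(t), pad_id, max_len) for t in batch]
```

For example, pad_batch([[[1, 2, 3]], [[4, 5]]]) yields [[1, 2, 3], [4, 5, 0]], while a text batch of equal-length rows comes back unchanged instead of crashing.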
@mary-octavia Hi, I got the same error. Have you found a solution? I tried all the above solutions but nothing worked for me.
Commenting out the max_vector_len/min_vector_len block worked for me!
Has this bug been fixed?