Pre-embedded corpora error: empty JSON
ddofer commented
Following the update that makes the code download pre-embedded corpora (great change, that!), I get an error when trying to run the README example:
medrag = MedRAG(llm_name=LL_NAME, rag=True,
                corpus_name="Textbooks", corpus_cache=True)
Output (error):
No sentence-transformers model found with name ncbi/MedCPT-Query-Encoder. Creating a new one with CLS pooling.
Initializing the document extracter...
0%| | 0/18 [00:00<?, ?it/s]
The error comes from an empty JSON file.
Setting HNSW=True or False, or using RRF-2, doesn't change anything.
Environment: WSL2. MedCorp was already downloaded (but used/cached only with BM25).
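To pin down which file is the empty one, you can scan the downloaded corpus directory for JSON or JSON Lines files that are empty or fail to parse. Here is a minimal Python sketch. `check_json_file` and `find_bad_json_files` are hypothetical helpers, not part of MedRAG, and the directory you pass in is an assumption that depends on where your corpus was downloaded.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return None if path parses as JSON / JSON Lines, else a short reason string."""
    path = Path(path)
    if path.stat().st_size == 0:
        return "empty file"
    with path.open(encoding="utf-8", errors="replace") as f:
        try:
            if path.suffix == ".jsonl":
                # JSON Lines: each non-blank line must be a standalone JSON value.
                seen_record = False
                for line in f:
                    if line.strip():
                        json.loads(line)
                        seen_record = True
                return None if seen_record else "no records"
            # Plain .json: the whole file must be one JSON document.
            json.load(f)
            return None
        except json.JSONDecodeError as exc:
            return f"invalid JSON ({exc.msg})"


def find_bad_json_files(root):
    """Walk root recursively; return (path, reason) for every bad .json/.jsonl file."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in {".json", ".jsonl"}:
            reason = check_json_file(path)
            if reason is not None:
                bad.append((path, reason))
    return bad
```

For example, `find_bad_json_files("corpus/textbooks")` (a hypothetical path, so adjust it to your setup) returns a list of `(path, reason)` pairs. Each chunk is streamed line by line, so large corpus files are not loaded into memory all at once.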
ddofer commented
The issue may be specific to "TextBooks": RAG runs OK with StatPearls but fails with corpus_name="MedText" or "TextBooks".
Teddy-XiongGZ commented
It looks like the issue is caused by the absence of git-lfs. Was git-lfs installed on your machine when you downloaded the chunks?
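Without git-lfs, cloning a repository that stores its data in Git LFS leaves small text "pointer" stubs in place of the real files. Every such stub starts with the line `version https://git-lfs.github.com/spec/v1`. The minimal Python sketch below flags those stubs under a corpus directory. `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, not MedRAG functions, and the 1024-byte size cutoff is a heuristic assumption.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if path looks like an un-fetched Git LFS pointer instead of real data."""
    path = Path(path)
    # Heuristic: pointer stubs are tiny text files, while real corpus chunks are much larger.
    if path.stat().st_size >= 1024:
        return False
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under root that are still LFS pointers, skipping the .git directory."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )
```

If any pointers turn up, installing git-lfs and then running `git lfs pull` inside the cloned corpus repository should replace them with the real files.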