nlmatics/llmsherpa

when trying to load multiple documents with joblib, get error cannot pickle

player1024 opened this issue · 2 comments

I am trying to parallelize the ingestion of multiple locally stored PDFs into my vectorstore.

When I try to load multiple documents with joblib, I get a pickling error:

PicklingError: Could not pickle the task to send it to the workers.

Is this because llmsherpa makes an API call to an external server for every PDF I load?
What would be a workaround? Would making this async help, and if so, how?

I think this is important for production.

Thank you.

Hi @player1024 - Please share your code. Parallelizing this should be similar to parallelizing any IO task. I think it will be better to create a separate LayoutPDFReader instance for each thread rather than reuse the same one.
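A minimal sketch of the suggestion above, assuming a thread pool from the standard library rather than joblib's default process-based workers. The helper names `make_reader`, `parse_one`, and `parse_many` are hypothetical, and `api_url` stands for whatever llmsherpa parser endpoint you already use. Each task builds its own `LayoutPDFReader`, so no reader is shared across threads, and because threads run in the same process, the task does not have to be pickled. The original code was not shared, so this may not match the exact cause of the error.

```python
from concurrent.futures import ThreadPoolExecutor


def make_reader(api_url):
    # Imported lazily so this module can be loaded without llmsherpa installed.
    from llmsherpa.readers import LayoutPDFReader
    return LayoutPDFReader(api_url)


def parse_one(path, api_url, reader_factory=make_reader):
    # Build a fresh reader for this task; nothing is shared between threads.
    reader = reader_factory(api_url)
    return reader.read_pdf(path)


def parse_many(paths, api_url, reader_factory=make_reader, max_workers=4):
    # Threads suit this workload because each call mostly waits on the network.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(parse_one, path, api_url, reader_factory)
            for path in paths
        ]
        # Collect results in the same order as the input paths.
        return [future.result() for future in futures]
```

Usage would look like `docs = parse_many(pdf_paths, api_url)`, where `pdf_paths` is your list of local files. If you prefer to stay with joblib, `Parallel(n_jobs=4, prefer="threads")` requests a thread-based backend and should behave similarly, though that is untested here.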

Closing the issue as it has been resolved.