mmz-001/doc-qa-tutorial

Frequent errors when loading a PDF or asking questions

Opened this issue · 3 comments

Hi Sasmitha,

First of all, great work! I am very impressed. I came up with a similar idea and then found your code. Later on I found chatpdf.com; they have turned the idea into a polished product. I have tried both your code and ChatPDF, and theirs is quite stable. They also have some nice features: for example, after a PDF is loaded, GPT suggests three questions about it.

The problem I had with your code is that it often raises errors: sometimes while loading the PDF, sometimes while asking questions. I tried the same file in ChatPDF with no problem. Here is an example: I have attached the PDF file, and you will see the following error message when you load it. I hope you can figure out what the problem is and fix it.

Thanks!

Leo


[Screenshot of the app]
ChatDoc - The AI Bot Answering Your Questions based on a Document
Upload a PDF file, then you can ask questions; our ChatGPT will answer them based on the document.
Uploaded file: Chris_Mack_PhD_Thesis.pdf (0.7 MB)
openai.error.RateLimitError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "/app/chatdoc/app.py", line 18, in <module>
index = embed_text(parse_pdf(uploaded_file))
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/legacy_caching/caching.py", line 627, in wrapped_func
return get_or_create_cached_value()
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/legacy_caching/caching.py", line 611, in get_or_create_cached_value
return_value = non_optional_func(*args, **kwargs)
File "/app/chatdoc/utils.py", line 32, in embed_text
index = FAISS.from_texts(texts, embeddings)
File "/home/appuser/venv/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 193, in from_texts
embeddings = embedding.embed_documents(texts)
File "/home/appuser/venv/lib/python3.9/site-packages/langchain/embeddings/openai.py", line 87, in embed_documents
responses = [
File "/home/appuser/venv/lib/python3.9/site-packages/langchain/embeddings/openai.py", line 88, in <listcomp>
self._embedding_func(text, engine=self.document_model_name)
File "/home/appuser/venv/lib/python3.9/site-packages/langchain/embeddings/openai.py", line 76, in _embedding_func
return self.client.create(input=[text], engine=engine)["data"][0]["embedding"]
File "/home/appuser/venv/lib/python3.9/site-packages/openai/api_resources/embedding.py", line 33, in create
response = super().create(*args, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "/home/appuser/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 226, in request
resp, got_stream = self._interpret_response(result, stream)
File "/home/appuser/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 619, in _interpret_response
self._interpret_response_line(
File "/home/appuser/venv/lib/python3.9/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line
raise self.handle_error_response(

Hey, thanks for reporting this issue.
My guess is that the rate limit on the free OpenAI API tier is causing the problem.
You can either use a paid API key or implement some sort of retry mechanism to work around it.
Take a look at the source code for KnowledgeGPT (a more advanced version of doc-qa) to see how you can implement it.
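The retry mechanism suggested above can be sketched as a small exponential-backoff wrapper. This is a minimal stdlib-only illustration, not code from doc-qa or KnowledgeGPT; the names `call_with_backoff` and `Flaky` are hypothetical. In the actual app you would wrap the OpenAI embedding call and pass `retry_on=(openai.error.RateLimitError,)`:

```python
import random
import time

def call_with_backoff(func, max_retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller see the error
            # Sleep base, 2*base, 4*base, ... plus jitter to spread retries out.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Stand-in for a rate-limited API call: fails twice, then succeeds.
class Flaky:
    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("rate limited")
        return "embedding"

flaky = Flaky()
print(call_with_backoff(flaky, base_delay=0.01))  # prints "embedding"
```

A paid API key raises the rate limit, but a backoff like this still helps when many users upload PDFs at once, since each upload embeds every chunk of the document in quick succession.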