Unable to process PDF files - Form Processor error - 400 Bad Request
Closed this issue · 5 comments
Bug Description:
{
"status": "FileFormRecSubmissionPDF - Error on PDF submission to FR - 400 - Bad Request",
"status_timestamp": "2024-05-08 21:45:56",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.11/threading.py\", line 1002, in _bootstrap\n self._bootstrap_inner()\n File \"/usr/local/lib/python3.11/threading.py\", line 1045, in _bootstrap_inner\n self.run()\n File \"/usr/local/lib/python3.11/threading.py\", line 982, in run\n self._target(*self._args, **self._kwargs)\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 83, in _worker\n work_item.run()\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py\", line 826, in _run_sync_func\n return ExtensionManager.get_sync_invocation_wrapper(context,\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/extension.py\", line 215, in _raw_invocation_wrapper\n result = function(**args)\n File \"/home/site/wwwroot/FileFormRecSubmissionPDF/__init__.py\", line 165, in main\n statusLog.upsert_document(\n File \"/home/site/wwwroot/shared_code/status_log.py\", line 168, in upsert_document\n new_item[\"stack_trace\"] = self.get_stack_trace()\n"
}
Steps:
- Dropped a PDF file
- It fails in the second function, FileFormRecSubmissionPDF
Cosmos DB log:
{
"id": "dXBsb2FkL2VkYXYtcHVibGljLWRvY3MvMjEtMDQ1Ni5wZGY=",
"file_path": "upload/edav-public-docs/21-0456.pdf",
"file_name": "21-0456.pdf",
"state": "Error",
"start_timestamp": "2024-05-09 13:34:33",
"state_description": "",
"state_timestamp": "2024-05-09 13:38:28",
"status_updates": [
{
"status": "File uploaded from browser to Azure Blob Storage",
"status_timestamp": "2024-05-09 13:34:33",
"status_classification": "Info"
},
{
"status": "Pipeline triggered by Blob Upload",
"status_timestamp": "2024-05-09 13:34:42",
"status_classification": "Info"
},
{
"status": "FileUploadedFunc - FileUploadedFunc function started",
"status_timestamp": "2024-05-09 13:34:42",
"status_classification": "Debug"
},
{
"status": "FileUploadedFunc - pdf file sent to submit queue. Visible in 216 seconds",
"status_timestamp": "2024-05-09 13:34:42",
"status_classification": "Debug"
},
{
"status": "FileFormRecSubmissionPDF - Received message from pdf-submit-queue ",
"status_timestamp": "2024-05-09 13:38:27",
"status_classification": "Debug"
},
{
"status": "FileFormRecSubmissionPDF - Submitting to Form Recognizer",
"status_timestamp": "2024-05-09 13:38:27",
"status_classification": "Info"
},
{
"status": "FileFormRecSubmissionPDF - SAS token generated",
"status_timestamp": "2024-05-09 13:38:27",
"status_classification": "Debug"
},
{
"status": "FileFormRecSubmissionPDF - Error on PDF submission to FR - 400 - Bad Request",
"status_timestamp": "2024-05-09 13:38:28",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.11/threading.py\", line 1002, in _bootstrap\n self._bootstrap_inner()\n File \"/usr/local/lib/python3.11/threading.py\", line 1045, in _bootstrap_inner\n self.run()\n File \"/usr/local/lib/python3.11/threading.py\", line 982, in run\n self._target(*self._args, **self._kwargs)\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 83, in _worker\n work_item.run()\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py\", line 933, in _run_sync_func\n return ExtensionManager.get_sync_invocation_wrapper(context,\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/extension.py\", line 215, in _raw_invocation_wrapper\n result = function(**args)\n File \"/home/site/wwwroot/FileFormRecSubmissionPDF/__init__.py\", line 165, in main\n statusLog.upsert_document(\n File \"/home/site/wwwroot/shared_code/status_log.py\", line 168, in upsert_document\n new_item[\"stack_trace\"] = self.get_stack_trace()\n"
}
],
"_rid": "IpRqAOcupRGoAAAAAAAAAA==",
"_self": "dbs/IpRqAA==/colls/IpRqAOcupRE=/docs/IpRqAOcupRGoAAAAAAAAAA==/",
"_etag": "\"9102409a-0000-0100-0000-663cd1d40000\"",
"_attachments": "attachments/",
"_ts": 1715261908
}
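For anyone correlating this entry with other logs: in this particular document the id appears to be the standard Base64 encoding of file_path, and _ts is Cosmos DB's last-modified time in Unix epoch seconds (1715261908 matches state_timestamp, 2024-05-09 13:38:28). The sketch below assumes that id encoding holds for every status document, which is an observation from this one entry rather than something confirmed against the repository's status_log.py; decode_status_id, encode_status_id and cosmos_ts_to_utc are hypothetical helper names.

```python
import base64
from datetime import datetime, timezone


def decode_status_id(doc_id: str) -> str:
    """Turn a status-log document id back into a blob path (assumes plain Base64 of the path)."""
    return base64.b64decode(doc_id).decode("utf-8")


def encode_status_id(file_path: str) -> str:
    """Build the id to look up in Cosmos DB for a given blob path (same assumption)."""
    return base64.b64encode(file_path.encode("utf-8")).decode("ascii")


def cosmos_ts_to_utc(ts: int) -> datetime:
    """Cosmos DB's _ts system property is seconds since the Unix epoch."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


doc_id = "dXBsb2FkL2VkYXYtcHVibGljLWRvY3MvMjEtMDQ1Ni5wZGY="
print(decode_status_id(doc_id))  # upload/edav-public-docs/21-0456.pdf
print(cosmos_ts_to_utc(1715261908).strftime("%Y-%m-%d %H:%M:%S"))  # 2024-05-09 13:38:28
```

To find the status document for another file, encode_status_id(file_path) gives the id to query, provided the encoding assumption holds.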
This is the code block where it fails with "FileFormRecSubmissionPDF - Error on PDF submission to FR - 400 - Bad Request":
# Construct and submit the message to FR
headers = {
"Content-Type": "application/json",
"Ocp-Apim-Subscription-Key": FR_key,
}
params = {"api-version": api_version}
body = {"urlSource": blob_path_plus_sas}
url = f"{endpoint}formrecognizer/documentModels/{FR_MODEL}:analyze"
logging.info(f"Submitting to FR with url: {url}")
# Send the HTTP POST request with headers, query parameters, and request body
response = requests.post(url, headers=headers, params=params, json=body)
This is blocking our testing with PDF files. Any help resolving this issue would be greatly appreciated.
- Does it work on other PDFs?
- Are you able to share the PDF with us?
- Can you go to the Azure portal, open the function app, and select the function from the list on the Overview page? From there you can select the failed run and view its log. Could you share that log with us?
- Try debugging and stepping through the function to pinpoint the error, as described here: https://github.com/microsoft/PubSec-Info-Assistant/blob/main/docs/function_debug.md
- Perhaps run a fresh make deploy in case something went wrong during deployment.
- Have you tried submitting the PDF through Document Intelligence Studio to see whether it errors? You can reach it from the Azure portal by opening the Azure AI services multi-service account and selecting Document Intelligence Studio.
This issue has been marked for closure after 2 weeks of inactivity. It will be closed in 5 days.
Closing due to inactivity.