microsoft/PubSec-Info-Assistant

Unable to process PDF files - Form Processor error - 400 Bad Request

Closed this issue · 5 comments

Bug Description:
{
  "status": "FileFormRecSubmissionPDF - Error on PDF submission to FR - 400 - Bad Request",
  "status_timestamp": "2024-05-08 21:45:56",
  "status_classification": "Error",
  "stack_trace": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.11/threading.py\", line 1002, in _bootstrap\n self._bootstrap_inner()\n File \"/usr/local/lib/python3.11/threading.py\", line 1045, in _bootstrap_inner\n self.run()\n File \"/usr/local/lib/python3.11/threading.py\", line 982, in run\n self._target(*self._args, **self._kwargs)\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 83, in _worker\n work_item.run()\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py\", line 826, in _run_sync_func\n return ExtensionManager.get_sync_invocation_wrapper(context,\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/extension.py\", line 215, in _raw_invocation_wrapper\n result = function(**args)\n File \"/home/site/wwwroot/FileFormRecSubmissionPDF/__init__.py\", line 165, in main\n statusLog.upsert_document(\n File \"/home/site/wwwroot/shared_code/status_log.py\", line 168, in upsert_document\n new_item[\"stack_trace\"] = self.get_stack_trace()\n"
}

Steps:

  1. Uploaded a PDF file through the browser.
  2. The pipeline fails in the second function, FileFormRecSubmissionPDF.

CosmosDB Log:
{
  "id": "dXBsb2FkL2VkYXYtcHVibGljLWRvY3MvMjEtMDQ1Ni5wZGY=",
  "file_path": "upload/edav-public-docs/21-0456.pdf",
  "file_name": "21-0456.pdf",
  "state": "Error",
  "start_timestamp": "2024-05-09 13:34:33",
  "state_description": "",
  "state_timestamp": "2024-05-09 13:38:28",
  "status_updates": [
    {
      "status": "File uploaded from browser to Azure Blob Storage",
      "status_timestamp": "2024-05-09 13:34:33",
      "status_classification": "Info"
    },
    {
      "status": "Pipeline triggered by Blob Upload",
      "status_timestamp": "2024-05-09 13:34:42",
      "status_classification": "Info"
    },
    {
      "status": "FileUploadedFunc - FileUploadedFunc function started",
      "status_timestamp": "2024-05-09 13:34:42",
      "status_classification": "Debug"
    },
    {
      "status": "FileUploadedFunc - pdf file sent to submit queue. Visible in 216 seconds",
      "status_timestamp": "2024-05-09 13:34:42",
      "status_classification": "Debug"
    },
    {
      "status": "FileFormRecSubmissionPDF - Received message from pdf-submit-queue ",
      "status_timestamp": "2024-05-09 13:38:27",
      "status_classification": "Debug"
    },
    {
      "status": "FileFormRecSubmissionPDF - Submitting to Form Recognizer",
      "status_timestamp": "2024-05-09 13:38:27",
      "status_classification": "Info"
    },
    {
      "status": "FileFormRecSubmissionPDF - SAS token generated",
      "status_timestamp": "2024-05-09 13:38:27",
      "status_classification": "Debug"
    },
    {
      "status": "FileFormRecSubmissionPDF - Error on PDF submission to FR - 400 - Bad Request",
      "status_timestamp": "2024-05-09 13:38:28",
      "status_classification": "Error",
      "stack_trace": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.11/threading.py\", line 1002, in _bootstrap\n self._bootstrap_inner()\n File \"/usr/local/lib/python3.11/threading.py\", line 1045, in _bootstrap_inner\n self.run()\n File \"/usr/local/lib/python3.11/threading.py\", line 982, in run\n self._target(*self._args, **self._kwargs)\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 83, in _worker\n work_item.run()\n File \"/usr/local/lib/python3.11/concurrent/futures/thread.py\", line 58, in run\n result = self.fn(*self.args, **self.kwargs)\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py\", line 933, in _run_sync_func\n return ExtensionManager.get_sync_invocation_wrapper(context,\n File \"/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/extension.py\", line 215, in _raw_invocation_wrapper\n result = function(**args)\n File \"/home/site/wwwroot/FileFormRecSubmissionPDF/__init__.py\", line 165, in main\n statusLog.upsert_document(\n File \"/home/site/wwwroot/shared_code/status_log.py\", line 168, in upsert_document\n new_item[\"stack_trace\"] = self.get_stack_trace()\n"
    }
  ],
  "_rid": "IpRqAOcupRGoAAAAAAAAAA==",
  "_self": "dbs/IpRqAA==/colls/IpRqAOcupRE=/docs/IpRqAOcupRGoAAAAAAAAAA==/",
  "_etag": "\"9102409a-0000-0100-0000-663cd1d40000\"",
  "_attachments": "attachments/",
  "_ts": 1715261908
}
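The status record above can also be inspected programmatically. Decoding its `id` gives back the `file_path` (`upload/edav-public-docs/21-0456.pdf`), so the record appears to be keyed by the Base64-encoded blob path, and the failing step is the `status_updates` entry whose `status_classification` is `Error`. A minimal sketch under those assumptions: `status_doc_id` and `first_error` are hypothetical helpers, not functions from the repository, and standard Base64 is assumed (this particular path encodes identically under the URL-safe variant, so the record cannot tell them apart).

```python
import base64
from typing import Any, Dict, Optional


def status_doc_id(file_path: str) -> str:
    # Assumption: the status record id is the standard Base64 encoding of the blob path.
    return base64.b64encode(file_path.encode("utf-8")).decode("ascii")


def first_error(status_doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    # Walk the status history in order and return the first entry marked as an error.
    for update in status_doc.get("status_updates", []):
        if update.get("status_classification") == "Error":
            return update
    return None
```

For the record above, `status_doc_id("upload/edav-public-docs/21-0456.pdf")` returns `dXBsb2FkL2VkYXYtcHVibGljLWRvY3MvMjEtMDQ1Ni5wZGY=`, matching its `id`, and `first_error` returns the "Error on PDF submission to FR - 400 - Bad Request" entry.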

This is the code block where it is failing (FileFormRecSubmissionPDF - Error on PDF submission to FR - 400 - Bad Request):

    import logging

    import requests

    # Construct and submit the message to FR.
    # FR_key, api_version, endpoint, FR_MODEL and blob_path_plus_sas are defined
    # elsewhere in the function module (not shown in this excerpt).
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": FR_key,
    }

    params = {"api-version": api_version}

    body = {"urlSource": blob_path_plus_sas}
    url = f"{endpoint}formrecognizer/documentModels/{FR_MODEL}:analyze"

    logging.info(f"Submitting to FR with url: {url}")

    # Send the HTTP POST request with headers, query parameters, and request body
    response = requests.post(url, headers=headers, params=params, json=body)
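The status log entry only records the HTTP status ("400 - Bad Request"), not the JSON error body the service sends back, which normally carries an error code and message explaining the rejection. A minimal sketch of turning that body into a single log line, assuming Azure's usual `{"error": {"code", "message", "innererror"}}` shape; `build_analyze_url` and `describe_error` are hypothetical helpers, not part of the repository. The URL helper also removes the dependence on `endpoint` ending with a slash, which the f-string above silently relies on.

```python
import json


def build_analyze_url(endpoint: str, model_id: str) -> str:
    # Join endpoint and route the same way whether or not endpoint ends with "/".
    return f"{endpoint.rstrip('/')}/formrecognizer/documentModels/{model_id}:analyze"


def describe_error(status_code: int, reason: str, body: str) -> str:
    # Assumed body shape: {"error": {"code": ..., "message": ..., "innererror": {...}}}.
    # Anything that is not JSON in that shape is logged as truncated raw text.
    try:
        payload = json.loads(body)
    except ValueError:
        return f"{status_code} {reason}: {body[:500]}"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return f"{status_code} {reason}: {body[:500]}"
    parts = [f"{status_code} {reason}", error.get("code", ""), error.get("message", "")]
    inner = error.get("innererror")
    if isinstance(inner, dict):
        parts.append(f"inner {inner.get('code', '')}: {inner.get('message', '')}")
    # Drop empty fields so missing codes/messages do not leave dangling separators.
    return " | ".join(p for p in parts if p)
```

Inside the function this could be called right after `requests.post`, for example `if response.status_code != 202: logging.error(describe_error(response.status_code, response.reason, response.text))`. A successful analyze submission returns 202 Accepted with an Operation-Location header, so any other status is worth logging in full.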

This is blocking our testing with PDF files. Any help resolving this issue would be greatly appreciated.

  • Does it work with other PDFs?
  • Are you able to share the PDF with us?
  • Can you go to the Azure portal, open the function app, and select the function from the list on the Overview page? From there you can select the failed run and see the log from that run. Can you share that with us?
  • Perhaps try debugging and stepping through the function to understand the error, using this method: https://github.com/microsoft/PubSec-Info-Assistant/blob/main/docs/function_debug.md
  • Perhaps do a fresh make deploy in case there was an issue with the deployment.
  • Have you tried submitting the PDF through Document Intelligence Studio to see whether it errors? You can reach it from the Azure portal by opening the Azure AI services multi-service account and then selecting Document Intelligence Studio.
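Before retrying with other PDFs or in Document Intelligence Studio, a few common reasons for a file to be rejected can be ruled out locally. A rough sketch, assuming the S0-tier 500 MB file size limit from the service documentation (check the current limits page) and using simple byte-level heuristics rather than a real PDF parser: a missing %PDF- header, a missing %%EOF marker (often a truncated upload), or an /Encrypt entry (an encrypted or password-protected file, which the service may refuse). `check_pdf` and `check_pdf_bytes` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import List

# Assumed limit (S0 tier) taken from the service documentation; verify before relying on it.
MAX_BYTES_S0 = 500 * 1024 * 1024


def check_pdf_bytes(data: bytes, max_bytes: int = MAX_BYTES_S0) -> List[str]:
    """Return human-readable warnings; an empty list means no obvious problem was found."""
    if not data:
        return ["file is empty"]
    problems: List[str] = []
    if len(data) > max_bytes:
        problems.append(f"file is {len(data)} bytes, above the assumed {max_bytes}-byte limit")
    # The header should start the file, but many readers tolerate leading bytes,
    # so look anywhere in the first 1 KB.
    if b"%PDF-" not in data[:1024]:
        problems.append("no %PDF- header in the first 1 KB; the file may not be a PDF")
    # The last line should hold %%EOF; its absence often means a truncated upload.
    if b"%%EOF" not in data[-2048:]:
        problems.append("no %%EOF marker near the end; the file may be truncated")
    # Heuristic only: an /Encrypt entry marks an encrypted (possibly password-protected) file.
    if b"/Encrypt" in data:
        problems.append("contains /Encrypt; the file may be encrypted or password-protected")
    return problems


def check_pdf(path: str) -> List[str]:
    # Reads the whole file into memory, which is acceptable for a one-off diagnostic.
    return check_pdf_bytes(Path(path).read_bytes())
```

For example, `check_pdf("21-0456.pdf")` returning an empty list only rules out these specific problems; it does not guarantee the service will accept the file.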

This issue has been marked for closure after 2 weeks of inactivity. It will be closed in 5 days.

closing due to inactivity