Unstructured-IO/unstructured-api

Feature request: Add structured response for parsing errors

flash1293 opened this issue · 0 comments

Currently, errors caused by the file being partitioned and errors caused by an incorrect API call are handled the same way: both return a 400 status code.

For example:

  • If a PDF file is corrupted, the error is 400 {'detail': 'File does not appear to be a valid PDF'}
  • If a docx file is corrupted, the error is 400 {'detail': 'File is not a valid docx'}
  • If the file is sent under the parameter name file instead of files, the error is 400 {'detail': 'Request parameter "files" is required.\n'}

The first two errors are caused by bad data in the pipeline, while the third is a bug in the code calling the API. It would be great if there were a way to programmatically differentiate these situations so that callers can react appropriately. A 422 status code seems like a sensible choice for corrupted files.
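
To illustrate what this would enable on the client side, here is a minimal sketch. It assumes the proposed behavior (422 for corrupted files, 400 for malformed requests), which the API does not implement today. The names ErrorKind and classify_error are hypothetical and are not part of the API or any client library.

```python
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical categories a pipeline might want to tell apart."""
    BAD_INPUT_FILE = "bad_input_file"  # corrupted or unparseable document
    BAD_REQUEST = "bad_request"        # bug in the code calling the API
    OTHER = "other"                    # anything else (5xx, auth, ...)


def classify_error(status_code: int) -> ErrorKind:
    """Classify an error response by status code.

    Assumes the proposed scheme: 422 means the file itself is bad,
    400 means the request was malformed.
    """
    if status_code == 422:
        return ErrorKind.BAD_INPUT_FILE
    if status_code == 400:
        return ErrorKind.BAD_REQUEST
    return ErrorKind.OTHER
```

With this distinction, a pipeline could skip and log a document on BAD_INPUT_FILE but fail loudly on BAD_REQUEST. With the current behavior, the only way to separate the two cases is to match on the text of the detail message, which is fragile.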