ERROR - Exception in ASGI application / Error in pixCreateHeader
Describe the bug
2023-12-23 13:47:01,827 41.250.50.106:53059 POST /general/v0/general HTTP/1.1 - 500 Internal Server Error
2023-12-23 13:47:01,827 uvicorn.error ERROR Exception in ASGI application
Traceback (most recent call last):
File "/home/notebook-user/.local/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/notebook-user/.local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
await super().__call__(scope, receive, send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/notebook-user/.local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
raise e
File "/home/notebook-user/.local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/notebook-user/.local/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
raw_response = await run_endpoint_function(
File "/home/notebook-user/.local/lib/python3.10/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/notebook-user/.local/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/notebook-user/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/notebook-user/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/notebook-user/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/notebook-user/prepline_general/api/general.py", line 811, in pipeline_1
list(response_generator(is_multipart=False))[0]
File "/home/notebook-user/prepline_general/api/general.py", line 749, in response_generator
response = pipeline_api(
File "/home/notebook-user/prepline_general/api/general.py", line 434, in pipeline_api
elements = partition(**partition_kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/auto.py", line 384, in partition
elements = _partition_pdf(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/documents/elements.py", line 503, in wrapper
elements = func(*args, **kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/file_utils/filetype.py", line 591, in wrapper
elements = func(*args, **kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/file_utils/filetype.py", line 546, in wrapper
elements = func(*args, **kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/chunking/title.py", line 241, in wrapper
elements = func(*args, **kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/pdf.py", line 172, in partition_pdf
return partition_pdf_or_image(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/pdf.py", line 279, in partition_pdf_or_image
elements = _partition_pdf_or_image_local(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/utils.py", line 214, in wrapper
return func(*args, **kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/pdf.py", line 409, in _partition_pdf_or_image_local
final_layout = process_data_with_ocr(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/ocr.py", line 82, in process_data_with_ocr
merged_layouts = process_file_with_ocr(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/ocr.py", line 168, in process_file_with_ocr
raise e
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/ocr.py", line 157, in process_file_with_ocr
merged_page_layout = supplement_page_layout_with_ocr(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/ocr.py", line 190, in supplement_page_layout_with_ocr
ocr_layout = get_ocr_layout_from_image(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/ocr.py", line 430, in get_ocr_layout_from_image
ocr_regions = get_ocr_layout_tesseract(image, ocr_languages)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured/partition/ocr.py", line 465, in get_ocr_layout_tesseract
ocr_df = unstructured_pytesseract.image_to_data(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured_pytesseract/pytesseract.py", line 591, in image_to_data
return {
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured_pytesseract/pytesseract.py", line 593, in <lambda>
Output.DATAFRAME: lambda: get_pandas_output(
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured_pytesseract/pytesseract.py", line 568, in get_pandas_output
return pd.read_csv(BytesIO(run_and_get_output(*args)), **kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured_pytesseract/pytesseract.py", line 347, in run_and_get_output
run_tesseract(**kwargs)
File "/home/notebook-user/.local/lib/python3.10/site-packages/unstructured_pytesseract/pytesseract.py", line 279, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
unstructured_pytesseract.pytesseract.TesseractError: (1, 'Error in pixCreateHeader: requested w = 34680, h = 48360, d = 32 Error in pixCreateHeader: requested bytes >= 2^31 Error in pixCreateNoInit: pixd not made Error in pixCreate: pixd not made Error in pixReadStreamPng: pix not made Error in pixReadStream: png: no pix returned Error in pixRead: pix not read Error during processing.')
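For reference, the numbers in the last line of the traceback can be checked by hand. This is a minimal sketch, assuming the uncompressed image size is simply width x height x (depth / 8) bytes and ignoring any row padding; the helper name `raw_image_bytes` is made up for illustration and is not part of Tesseract, Leptonica, or unstructured.

```python
# Rough size check for the image that the error message says was requested.
# Assumption: uncompressed size = width * height * (depth_bits / 8) bytes.

LIMIT_BYTES = 2**31  # threshold quoted in "requested bytes >= 2^31"


def raw_image_bytes(width: int, height: int, depth_bits: int) -> int:
    """Hypothetical helper: approximate uncompressed bitmap size in bytes."""
    return width * height * depth_bits // 8


# Values copied from "requested w = 34680, h = 48360, d = 32".
requested = raw_image_bytes(34680, 48360, 32)

print(requested)                 # 6708499200
print(requested >= LIMIT_BYTES)  # True
```

Under that assumption the rendered page needs about 6.7 GB, roughly three times the 2^31-byte threshold named in the message, which would be consistent with the failure happening while the PNG is read rather than during text recognition.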
To Reproduce
var myHeaders = new Headers();
var formdata = new FormData();
formdata.append("files", fileInput.files[0], "ΣΤ ΔΗΜΟΤΙΚΟΥ 3.pdf");
formdata.append("output_format", "application/json");
formdata.append("coordinates", "false");
formdata.append("encoding", "utf-8");
formdata.append("hi_res_model_name", "detectron2_onnx");
formdata.append("include_page_breaks", "false");
formdata.append("ocr_languages", "");
formdata.append("pdf_infer_table_structure", "true");
formdata.append("skip_infer_table_types", "jpg, png");
formdata.append("strategy", "hi_res");
formdata.append("xml_keep_tags", "true");
var requestOptions = {
  method: 'POST',
  headers: myHeaders,
  body: formdata,
  redirect: 'follow'
};
fetch("http://my_hosted_api:8000/general/v0/general", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.log('error', error));
Filetype
- File: ΣΤ ΔΗΜΟΤΙΚΟΥ 3.pdf
Environment:
- Self-hosting
- Postman
Hi there, this error has hopefully been fixed in the unstructured library. We're a bit behind on the unstructured version pinned in the requirements here. Can you try pip install unstructured==0.11.6 and see if this is resolved?
Hello @awalker4, I can't, because I'm using the Docker image. Is there any other way? Also, is there a reason the versions are lagging behind? Thank you.
No good reason other than our Dependabot seems to be broken 😂 Hang tight, I'll bump the versions now to get a new image out.
Thank you @awalker4 😂
@awalker4 No new docker image?
Ah, seems this job just needs to finish: https://github.com/Unstructured-IO/unstructured-api/actions/runs/7310799243
@awalker4 I did another test with the new image and unfortunately, I got the same error.
Apologies, this bug slipped off the radar. Are you still seeing this issue?