aws-samples/amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.

Jupyter Notebook · Apache-2.0

Issues

Bounding box is incorrect for text converted from Markdown.
#410 opened 2 months ago by dharnieshraja
0
Use module name for logger instead of Root Logger
#367 opened 8 months ago by michaelshum321
6
`get_text_from_layout_json` throws `'NoneType' object is not subscriptable` for a specific PDF
#411 opened a month ago by neil-sola
1
Textractor doesn't detect the INVOICE_RECEIPT_ID, but the AWS Textract Demo can
#408 opened 2 months ago by arsher-b
1
Is search_words() broken?
#371 opened 7 months ago by ttruong-gilead
2
error: Textractor.detect_document_text() got an unexpected keyword argument 's3_output_path'
#409 opened 2 months ago by elbbub
0
issue with ordering in extractions, markdown and gettext methods
#388 opened 5 months ago by red-sky17
8
KeyError: 'Text' - on documents with tables
#343 opened 2 months ago by dzmitry-kankalovich
2
KeyError: 'Relationships'
#406 opened 2 months ago by lucio-xelda
6
The invoice number won’t be detected if there is no space between the label and the value
#397 opened 2 months ago by arsher-b
1
Lambda layer builds are broken
#399 opened 3 months ago by gauravthadani
1
analyze_expense error: 'NoneType' object has no attribute 'spatial_object'
#401 opened 3 months ago by arsher-b
1
[textractprettyprinter] List contents are duplicated when generating text output using `get_text_from_layout_json`
#391 opened 5 months ago by adityachandak287
4
Trouble replicating markdown output
#384 opened 6 months ago by bvbg1
8
Incorrect order of text layouts due to compare_bounding_box() used in group_elements_horizontally()
#389 opened 5 months ago by keitaf
3
Support for `NotificationChannel` in Textract Caller's Async Methods
#390 opened 5 months ago by azucker99
0
Incorrect table cell word and line order
#369 opened 8 months ago by wessens
3
issue regarding .to_markdown() method
#380 opened 5 months ago by red-sky17
4
Detected in EXPENSE_ROW but not as ITEM
#385 opened 5 months ago by arsher-b
1
InvalidParameterException: Request has invalid parameters when using startDocumentAnalysis
#383 opened 6 months ago by arunsingh28
0
prefix and suffix for footer layout is not available
#365 opened 6 months ago by LeoHemamou
1
Exception handling is hiding the underlying issue of the error.
#364 opened 6 months ago by vdefeo-caylent
3
pdf2image is required even though save_image=False
#366 opened 6 months ago by vdefeo-caylent
1
Lambda layers for Python 3.12 PDF raising an exception on missing libpng16.so.16
#373 opened 6 months ago by Viajante80
3
Save image doesn't work with S3 path - TypeError: Invalid input type 'bytearray'
#382 opened 7 months ago by steffeng
3
Empty expense_documents on analyze_expense
#370 opened 7 months ago by arsher-b
3
Lambda layers for Python 3.12 raising an exception on missing libopenjp2.so.7
#372 opened 7 months ago by Belval
0
'NoneType' object has no attribute 'spatial_object' on Expense Analysis results
#368 opened 8 months ago by HarryTSaban
0
feature request: add query alias parameter
#361 opened 8 months ago by parad0x96
2
cell content extraction error
#355 opened 9 months ago by Larbo53
2
issue with extraction, get_text_from_layout_json function
#356 opened 9 months ago by red-sky17
1
Table cell, incorrectly, does not pick up the cell text/words. Page--> Line picks up the words as in the textract output
#358 opened 9 months ago by raidken
1
Access Non-Axis-Aligned Bounding Boxes
#359 opened 9 months ago by zkalson
2
Cryptic CLI error in SageMaker Studio (and probably other role-based environments?)
#352 opened 9 months ago by athewsey
1
[Feature Request] Simplified batch processing CLI
#353 opened 9 months ago by athewsey
1
Python Support for Column Headers
#351 opened 9 months ago by Belval
0
Exporting text+tables while maintaining layout
#347 opened 9 months ago by austinmw
1
KeyError in get_lines_string
#348 opened 9 months ago by sbui-dev
0
S3 path parsing for textractcaller is not robust enough
#345 opened 10 months ago by anjanvb
0
JPEG conversion in `analyze_document` significantly impacts table predictions
#341 opened 10 months ago by Belval
1
Proper way of getting cell content?
#336 opened 10 months ago by ttruong-gilead
5
Textractor import error
#338 opened 10 months ago by umaaaaaaaaa
1
[Q] hide_keyavlue_layout option in TextLinearizationConfig
#323 opened 10 months ago by eml-39502
4
Missing CITATION.cff file for repo
#331 opened 10 months ago by mhucka
1
Large PDF response processing is slow
#337 opened 10 months ago by Belval
0
Parsing response from a start_document_analysis()
#335 opened 10 months ago by ttruong-gilead
2
Queries ordering is not preserved after parsing
#328 opened 10 months ago by Belval
1
Error in get_layout_text_from_json in textractprettyprinter
#333 opened 10 months ago by gwynethguo
0
Query entity is not linearizable
#327 opened 10 months ago by Belval
0
Caller: allow early return when job incomplete
#326 opened 10 months ago by symroe
1