about the source file for textract

Question

Larbo53 opened this issue 2 years ago · 2 comments

Good morning,

Here is the command I use to extract data from an image. I always pass the same value, test = 'img.png', as the 'Name' parameter, and the response always contains the same result regardless of the content of the 'test' file.

How can I get the content of the new 'test' file? Do I have to change the name of the source file every time?

Thanks for your feedback.

response = textractmodule.detect_document_text(
    Document={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': test
        }
    }
)
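
For readers with the same symptom: Textract reads whatever object is stored under that key when the call is made, so the name itself should not need to change. One possible cause, not confirmed in this thread, is that the new image was never uploaded over the old object. A minimal sketch, assuming a hypothetical helper `refresh_and_detect` that takes already-created boto3 S3 and Textract clients and uploads the local file before analyzing it:

```python
def refresh_and_detect(s3_client, textract_client, local_path, bucket, key):
    """Overwrite the S3 object with the local file, then run text detection.

    Hypothetical helper, not part of boto3 or Textract. The clients are
    expected to come from boto3.client('s3') and boto3.client('textract').
    """
    # Upload first so the key points at the latest version of the image.
    s3_client.upload_file(local_path, bucket, key)

    # Textract then reads the object that is currently stored under the key.
    return textract_client.detect_document_text(
        Document={'S3Object': {'Bucket': bucket, 'Name': key}}
    )
```

It could be called as `refresh_and_detect(boto3.client('s3'), boto3.client('textract'), 'img.png', s3BucketName, 'img.png')`, reusing the same key on every run.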

Answer 1 · 2023-03-22T20:33:42.000Z

Is this a question about using Textract or the trp?

Answer 2 · 2023-04-17T12:55:30.000Z

Sorry for my late reply.

I now use the following code, which sends the image bytes directly instead of storing the document in S3, and it works.

Thank you.

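import io

import boto3
from PIL import Image

# 'path' is the directory prefix of the image, defined elsewhere.
im = Image.open(path + "image.png")

# Serialize the image to PNG bytes in memory instead of uploading it to S3.
buffered = io.BytesIO()
im.save(buffered, format='PNG')
width, height = im.size

client = boto3.client('textract')
response = client.analyze_document(
    Document={'Bytes': buffered.getvalue()},
    FeatureTypes=['TABLES']
)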