How to extract figures from a PDF?
Myfootnotsmelly opened this issue · 2 comments
Myfootnotsmelly commented
After setup, I tried:
1. doc.figures
2. json.dump
but the results showed only each figure box's position and metadata. How can I get the figure images themselves from the PDF?
kyleclo commented
Hey @Myfootnotsmelly, sorry, it looks like a bug was introduced; I'm adding a fix in this pull request: #73
kyleclo commented
Hihi, please take a look at my response to Issue #70.
Yes, figures are represented by bounding boxes.
If you want the image crop of the figures, here's how you'd do it:
# get the image of a page and its dimensions
# (page_id is the index of the page that contains the figure)
page_image = doc.images[page_id]
page_w, page_h = page_image.pilimage.size
# get the bounding box of the first figure
figure_box = doc.figures[0].boxes[0]
# convert the box to absolute pixel coordinates on that page
figure_box_xy = figure_box.to_absolute(page_width=page_w, page_height=page_h).xy_coordinates
# crop the page image using PIL and keep the result
figure_image = page_image.pilimage.crop(figure_box_xy)
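
For anyone wondering what the conversion step amounts to, here is a minimal standalone sketch. It assumes the figure box stores left, top, width and height as fractions of the page, which is what the to_absolute call above suggests. `RelativeBox` and `to_pixel_corners` are hypothetical names made up for illustration and are not part of the library. The result is a (left, upper, right, lower) tuple, the form that PIL's `Image.crop` accepts.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RelativeBox:
    """Hypothetical stand-in for a figure box.

    All four values are assumed to be fractions of the page size (0.0 to 1.0).
    """
    l: float  # left edge
    t: float  # top edge
    w: float  # width
    h: float  # height


def to_pixel_corners(
    box: RelativeBox, page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Scale a relative box to pixels and return (left, upper, right, lower)."""
    left = round(box.l * page_width)
    upper = round(box.t * page_height)
    right = round((box.l + box.w) * page_width)
    lower = round((box.t + box.h) * page_height)
    return (left, upper, right, lower)


# A figure spanning the middle half of the width of a 1000 x 2000 pixel page image.
corners = to_pixel_corners(RelativeBox(l=0.25, t=0.5, w=0.5, h=0.25), 1000, 2000)
print(corners)  # prints (250, 1000, 750, 1500)
```

With a real page image, the returned tuple could be passed straight to the PIL image's `crop` method, as in the snippet above.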