Sorting/layout issues when Y coordinates don't exactly match
Hi,
We've been using an old version of this gem (1.4.1) for a little while now and we are looking to upgrade to the latest version. That upgrade broke some of our specs, and on closer inspection it seems the logic around PageLayout changed.
It might also be bad luck, but the use of round here (for X and Y coords) creates issues when the PDF draws texts that belong on the same line at slightly different y coordinates.
Below is an example:
In this case, the texts in those boxes/rectangles are slightly lower than the labels from that form, causing some of those texts to be generated on another line:
Claim Number: PHNP1610102 Contact:
Insured: Phone:
Fairfield Boys Club
Address 1: Email:
c/o Bejo Nanni, Treasurer
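The effect can be reproduced with a small sketch. Everything in it is made up for illustration: `row_for`, the 12-point row height, the 50-row page and the two y values are assumptions, not values from pdf-reader or from our PDF. It only mirrors the shape of the `row_count - ((y - y_offset) / row_multiplier).round` expression used in the layout code.

```ruby
# Hypothetical layout parameters, chosen so the two y values straddle a
# rounding boundary. They are not taken from pdf-reader itself.
ROW_MULTIPLIER = 12.0 # assumed height of one text row, in points
ROW_COUNT      = 50   # assumed number of rows on the page
Y_OFFSET       = 0.0  # assumed vertical offset

# Maps a y coordinate to a row index the same way the layout code does:
# scale, round to the nearest row, then flip so row 0 is the top.
def row_for(y)
  ROW_COUNT - ((y - Y_OFFSET) / ROW_MULTIPLIER).round
end

label_y = 306.5 # a form label
value_y = 305.5 # the value inside its box, drawn one point lower

puts row_for(label_y) # => 24
puts row_for(value_y) # => 25
```

A one-point difference is enough to put the value on the row below its label, which matches the broken output above.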
We could monkey patch or fork the repo to make those changes, but please see below the code that we're going to be using. I can create a PR if this repo is still well maintained. Please let me know.
PageLayout
class PDF::Reader
  class PageLayout
    def to_s
      return "" if @runs.empty?
      return "" if row_count == 0
      first_run_at_new_y = nil # remembering a previous run at a new Y coordinate
      page = row_count.times.map { |i| " " * col_count }
      @runs.each do |run|
        x_pos = ((run.x - @x_offset) / col_multiplier).round
        y_ref_run = run # line added
        if first_run_at_new_y && run.similar_y_coord?(first_run_at_new_y) # line added
          y_ref_run = first_run_at_new_y # line added
        else # line added
          first_run_at_new_y = run # line added
        end # line added
        y_pos = row_count - ((y_ref_run.y - @y_offset) / row_multiplier).round # line updated
        if y_pos <= row_count && y_pos >= 0 && x_pos <= col_count && x_pos >= 0
          local_string_insert(page[y_pos-1], run.text, x_pos)
        end
      end
      interesting_rows(page).map(&:rstrip).join("\n")
    end
  end
end
TextRun
class PDF::Reader
  class TextRun
    # def <=>(other)
    #   if similar_y_coord?(other)
    #     x <=> other.x
    #   else
    #     other.y <=> y
    #   end
    # end
    def similar_y_coord?(other, threshold = nil)
      # Arbitrary logic below. The threshold could probably be safely bumped to a higher number (dividing by 2, for instance).
      threshold = threshold || [self.font_size, other.font_size].min / 3
      (self.y - other.y).abs < threshold
    end
  end
end
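To show how the threshold behaves, here is a standalone sketch. `Run` is a hypothetical stand-in for `PDF::Reader::TextRun` that carries only `y` and `font_size`, and the coordinates are invented. It divides by `3.0` rather than `3`, which is an assumption on my part to avoid integer division if both font sizes happen to be integers.

```ruby
# Hypothetical stand-in for PDF::Reader::TextRun with only the fields the
# comparison needs. Not the gem's actual class.
Run = Struct.new(:y, :font_size) do
  # Two runs count as being on the same line when their y coordinates
  # differ by less than a third of the smaller font size, unless an
  # explicit threshold is passed.
  def similar_y_coord?(other, threshold = nil)
    threshold ||= [font_size, other.font_size].min / 3.0
    (y - other.y).abs < threshold
  end
end

label   = Run.new(306.5, 9) # form label
value   = Run.new(305.5, 9) # value drawn one point lower
heading = Run.new(290.0, 9) # text on a genuinely different line

puts label.similar_y_coord?(value)      # => true  (1.0 < 3.0)
puts label.similar_y_coord?(heading)    # => false (16.5 >= 3.0)
puts label.similar_y_coord?(value, 0.5) # => false (1.0 >= 0.5)
```

With the default threshold the label and its value are treated as one line, while text a full row away is not. Passing a tighter explicit threshold falls back to the stricter behaviour.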
Thank you.
EDIT: I updated the code above to properly handle multiple texts that would otherwise have been drawn on the following line.
Thanks for the well written report.
If you have the code in a fork for your own use, I'd love a PR that I can play with, check our spec suite, etc ❤️