Why are you validating a length of 30 characters?
Closed this issue · 1 comment
estebance commented
Hi, I just want to clarify the purpose of this block:
for row in pdf:
    if len(row['text']) < 30:
        continue
    filtered_pdf.append(row)
Why is the criterion 30 characters?
I'd like to contribute to the project, but first I need to understand a bit about the implementation.
mukulpatnaik commented
Hi, sorry for the late response. The 30-character threshold is there to skip subheadings, image captions, and other tiny pieces of text that may not be relevant.
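For anyone reading later, here is a minimal sketch of the same filter with the threshold pulled out into a named parameter. It assumes each row is a dict with a 'text' key, as in the snippet quoted in the question. The names MIN_TEXT_LENGTH, filter_short_rows, and sample_rows are hypothetical and not part of the project.

```python
# Threshold taken from the thread; the constant name is an assumption.
MIN_TEXT_LENGTH = 30


def filter_short_rows(rows, min_length=MIN_TEXT_LENGTH):
    """Keep only rows whose 'text' has at least min_length characters.

    Short rows are treated as likely subheadings or captions and dropped.
    """
    return [row for row in rows if len(row["text"]) >= min_length]


# Made-up rows for illustration only.
sample_rows = [
    {"text": "Figure 1: Overview"},  # caption, shorter than 30 characters
    {"text": "2.1 Methods"},         # subheading, shorter than 30 characters
    {"text": "This paragraph is long enough to carry real content for the reader."},
]

filtered_pdf = filter_short_rows(sample_rows)
```

With the sample rows above, only the last row is kept. A character count is a rough heuristic, so a short but meaningful sentence would also be dropped. Making the threshold a parameter lets it be tuned per document.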