mukulpatnaik/researchgpt

Why are you validating a length of 30 characters?

Closed this issue · 1 comment

Hi, I just want to clarify the purpose of this block:

    for row in pdf:
        if len(row['text']) < 30:
            continue
        filtered_pdf.append(row)

Why is the criterion 30 characters?

I'd like to contribute to the project, but first I need to understand a little about the implementation.

Hi, sorry for the late response. The 30-character threshold is there to ignore subheadings, image captions, and other tiny pieces of text that may not be relevant.
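For anyone reading along, here is a minimal sketch of the same filter with the threshold pulled out into a named constant, so it is easier to see what it does and to tune it. The names `MIN_TEXT_LENGTH` and `filter_short_blocks` and the sample rows are made up for illustration and are not part of the project; the only thing taken from the original snippet is that each row is a dict with a `'text'` key and that rows shorter than 30 characters are skipped.

```python
# Hypothetical constant: 30 is the threshold used in the original snippet.
MIN_TEXT_LENGTH = 30


def filter_short_blocks(rows, min_length=MIN_TEXT_LENGTH):
    """Return only the rows whose 'text' has at least min_length characters.

    Equivalent to the original loop: rows with len(text) < min_length are dropped.
    """
    return [row for row in rows if len(row["text"]) >= min_length]


# Made-up sample rows standing in for text blocks extracted from a PDF.
sample = [
    {"text": "Figure 1: Overview"},  # short caption, dropped
    {"text": "2. Related Work"},     # short subheading, dropped
    {"text": "This paragraph is long enough to survive the length filter."},
]

filtered_pdf = filter_short_blocks(sample)
```

Note that a fixed character count is only a rough heuristic: a long caption would still pass, and a short but meaningful sentence would be dropped, so the value may need adjusting for a given set of documents.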