Evaluate whether to replace or refactor text truncation function
Closed this issue · 3 comments
telpirion commented
The utils.truncate_complete_text function uses a heuristic to extract the abstract and conclusion from an OCR result. This approach has multiple issues: it cannot handle corner cases, and it doesn't reliably capture all of, or only, the abstract and conclusion.
I recommend replacing or refactoring this module in one of the following ways:
- Replace the heuristic string manipulation code with a call to an in-memory NLP model or LLM.
- Investigate improvements to Document AI templating to get better results from OCR.
- Use regex (ugh) to better isolate the abstract and conclusion.
- Other?
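For the regex option, a rough sketch of what isolating the two sections might look like is below. This is not the current implementation of utils.truncate_complete_text. The function names, the heading patterns, and the list of "next section" headings are all assumptions made for illustration, and real OCR output will almost certainly need more variants.

```python
import re

# Assumed heading shapes: the section name alone on a line, optionally
# preceded by a section number and followed by a colon.
ABSTRACT_RE = re.compile(r"^[ \t]*abstract[ \t]*:?[ \t]*$", re.I | re.M)
CONCLUSION_RE = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?conclusions?[ \t]*:?[ \t]*$", re.I | re.M
)
# Assumed list of headings that end a section; extend as needed.
NEXT_HEADING_RE = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?"
    r"(?:introduction|keywords|references|acknowledge?ments?|appendix)\b.*$",
    re.I | re.M,
)


def extract_section(text, start_re):
    """Return the text between a heading and the next known heading, or None."""
    start = start_re.search(text)
    if start is None:
        return None
    end = NEXT_HEADING_RE.search(text, start.end())
    body = text[start.end(): end.start() if end else len(text)]
    return body.strip() or None


def extract_abstract_and_conclusion(text):
    """Hypothetical replacement: returns (abstract, conclusion), either may be None."""
    return (
        extract_section(text, ABSTRACT_RE),
        extract_section(text, CONCLUSION_RE),
    )


sample = (
    "Title\n\nAbstract\nWe study X.\n\n1. Introduction\nBackground.\n\n"
    "5 Conclusion\nX works.\n\nReferences\n[1] Foo\n"
)
abstract, conclusion = extract_abstract_and_conclusion(sample)
```

Returning None when a heading is missing lets the caller fall back to another strategy instead of silently truncating. One known weakness: any body line that happens to start with one of the "next section" words would cut the section short, which is the kind of corner case that makes the regex route fragile.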
github-actions commented
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days
rajveer43 commented
I would like to work on this!
github-actions commented
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days