Evaluate whether to replace or refactor text truncation function

Question

Closed this issue 9 months ago · 3 comments

The utils.truncate_complete_text function uses a heuristic to extract the abstract and conclusion from an OCR result. This approach has multiple issues: it cannot handle corner cases, and it does not reliably capture all of, or only, the abstract and conclusion.

I recommend replacing or refactoring this module in one of the following ways:

Replace the heuristic string manipulation code with a call to an in-memory NLP or LLM model.
Investigate improvements to Document AI templating to get better results from OCR.
Use regex (ugh) to better isolate the abstract and conclusion.
Other?
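
For the regex option, here is a rough sketch of what heading-based extraction could look like. It is not the current utils.truncate_complete_text implementation. The function name extract_sections and the heading lists (introduction, keywords, references, and so on) are assumptions for illustration, and real OCR output will likely need different patterns.

```python
import re

# Assumed heading layout: the section title sits on its own line, optionally
# preceded by a section number such as "5." or "5". A heading that shares a
# line with its text (e.g. "Abstract: We study ...") is NOT handled here.
ABSTRACT_RE = re.compile(
    r"^[ \t]*abstract[ \t]*[:.]?[ \t]*\n"          # the "Abstract" heading line
    r"(.*?)"                                        # section body (lazy)
    r"(?=^[ \t]*(?:\d+\.?[ \t]+)?(?:introduction|keywords)\b|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)
CONCLUSION_RE = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?conclusions?[ \t]*[:.]?[ \t]*\n"
    r"(.*?)"
    r"(?=^[ \t]*(?:\d+\.?[ \t]+)?"
    r"(?:references|acknowledge?ments?|bibliography)\b|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_sections(text):
    """Return (abstract, conclusion); each is None when no heading matches."""

    def first_match(pattern):
        match = pattern.search(text)
        return match.group(1).strip() if match else None

    return first_match(ABSTRACT_RE), first_match(CONCLUSION_RE)


sample = (
    "Title\nAbstract\nWe study X.\n1 Introduction\nBody text.\n"
    "5. Conclusion\nX works.\nReferences\n[1] Foo"
)
abstract, conclusion = extract_sections(sample)
```

Returning None when a heading is missing lets the caller decide on a fallback (for example, keeping the existing heuristic) rather than silently truncating the wrong span.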

Answer 1 · 2023-09-29T23:04:29.000Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

Answer 2 · 2023-10-07T16:02:45.000Z

Would like to work on this!

Answer 3 · 2023-12-23T23:04:26.000Z

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days