[FEA]: Support arbitrary Python functions to determine document split points
randerzander commented
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
Currently preventing usage
Please provide a clear description of problem this feature solves
In an ideal world, we would have cleanly extracted metadata for document section headers, including their locations.
We could then split documents at those locations.
However, some separators are arbitrary text content that an out-of-the-box model is unlikely to ever identify. As a result, users would like to be able to supply an arbitrary Python function that returns split locations.
Describe the feature, and optionally a solution or implementation and any alternatives
See above
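To make the request concrete, here is a minimal sketch of what a user-supplied split function could look like. Everything in it is an assumption for illustration: the `SplitFn` type, the `split_on_offsets` helper, and the choice of character offsets as the "split locations" are hypothetical and not part of any existing API.

```python
import re
from typing import Callable, List

# Hypothetical contract: a split function receives the full document text
# and returns the character offsets at which a new chunk should begin.
SplitFn = Callable[[str], List[int]]


def split_on_offsets(text: str, find_splits: SplitFn) -> List[str]:
    """Split `text` at the offsets returned by the user-supplied function."""
    # Deduplicate, sort, and ignore offsets that fall outside the text body.
    offsets = sorted({o for o in find_splits(text) if 0 < o < len(text)})
    bounds = [0, *offsets, len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


def exhibit_markers(text: str) -> List[int]:
    """Example user function: start a new chunk at each 'EXHIBIT <letter>' line."""
    pattern = re.compile(r"^EXHIBIT [A-Z]\b", flags=re.MULTILINE)
    return [match.start() for match in pattern.finditer(text)]


doc = "Preamble text.\nEXHIBIT A\nFirst exhibit.\nEXHIBIT B\nSecond exhibit.\n"
chunks = split_on_offsets(doc, exhibit_markers)
for chunk in chunks:
    print(repr(chunk))
```

In this sketch the user only writes `exhibit_markers`. The splitter treats it as an opaque callable, so any logic (regexes, keyword lists, custom parsing) could determine the split points.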
Additional context
No response