Enable use of different chunking strategies
jacopo-chevallard opened this issue · 2 comments
jacopo-chevallard commented
Currently, we use a single chunking strategy for all documents. We should make it easy to configure and use different chunking strategies, including:
- late chunking
- (potentially contextual chunking)
- regex chunking
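One way to make strategies pluggable is a small name-to-function registry. The sketch below is only an illustration of that idea, not the project's actual API: `Chunker`, `CHUNKERS`, `register_chunker`, and `chunk` are hypothetical names, and only a regex strategy is implemented (late and contextual chunking would be registered the same way).

```python
import re
from typing import Callable, Dict, List

# Hypothetical type alias: a chunker maps raw text to a list of chunks.
Chunker = Callable[[str], List[str]]

# Hypothetical registry mapping strategy names to chunker functions.
CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(name: str) -> Callable[[Chunker], Chunker]:
    """Decorator that registers a chunker under a strategy name."""
    def decorator(fn: Chunker) -> Chunker:
        CHUNKERS[name] = fn
        return fn
    return decorator


@register_chunker("regex")
def regex_chunker(text: str) -> List[str]:
    # Split on runs of two or more newlines and drop empty pieces.
    return [part.strip() for part in re.split(r"\n{2,}", text) if part.strip()]


def chunk(text: str, strategy: str = "regex") -> List[str]:
    """Dispatch to the chunker registered for `strategy`."""
    try:
        chunker = CHUNKERS[strategy]
    except KeyError:
        raise ValueError(f"Unknown chunking strategy: {strategy!r}") from None
    return chunker(text)
```

With this shape, adding a strategy means registering one more function, and callers only ever pass a strategy name.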
vivek-official-tech commented
chunking_strategy:
  default: "regex"
  document_types:
    - type: "technical_report"
      strategy: "contextual"
    - type: "customer_feedback"
      strategy: "late"
  regex_patterns:
    - pattern: '\n{2,}'  # Split on two or more consecutive newlines
    - pattern: '(?:\.|\?|!)\s+'  # Split on sentence-ending punctuation
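A minimal sketch of how such a configuration might be consumed, assuming it has already been parsed into a Python dict (it is written inline here to avoid a YAML dependency). `resolve_strategy` and `regex_chunks` are hypothetical helper names, and the sentence pattern is written with `.` and `?` escaped so that it compiles as a regex.

```python
import re
from typing import List

# The configuration as a plain dict; in practice it would come from a YAML loader.
CONFIG = {
    "chunking_strategy": {
        "default": "regex",
        "document_types": [
            {"type": "technical_report", "strategy": "contextual"},
            {"type": "customer_feedback", "strategy": "late"},
        ],
        "regex_patterns": [
            {"pattern": r"\n{2,}"},
            {"pattern": r"(?:\.|\?|!)\s+"},
        ],
    }
}


def resolve_strategy(config: dict, document_type: str) -> str:
    """Return the strategy for a document type, falling back to the default."""
    section = config["chunking_strategy"]
    for entry in section.get("document_types", []):
        if entry["type"] == document_type:
            return entry["strategy"]
    return section["default"]


def regex_chunks(config: dict, text: str) -> List[str]:
    """Split text wherever any configured pattern matches."""
    patterns = [p["pattern"] for p in config["chunking_strategy"]["regex_patterns"]]
    # Wrap each pattern in a non-capturing group so alternation is unambiguous.
    combined = "|".join(f"(?:{p})" for p in patterns)
    return [c.strip() for c in re.split(combined, text) if c.strip()]
```

Note that `re.split` consumes the matched text, so punctuation followed by whitespace is dropped from the resulting chunks; keeping it would need a lookbehind or a different splitting approach.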
"An example"