QuivrHQ/quivr

Enable use of different chunking strategies

jacopo-chevallard opened this issue · 2 comments

Currently, we adopt a single chunking strategy for all documents. We should make it simple to configure and use different chunking strategies, for example through a configuration such as:

chunking_strategy:
  default: "regex"
  document_types:
    - type: "technical_report"
      strategy: "contextual"
    - type: "customer_feedback"
      strategy: "late"
  regex_patterns:
    - pattern: '\n{2,}'  # Split on two or more newlines
    - pattern: '(?:\.|\?|!)\s+'  # Split on sentence-ending punctuation

"An example"
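
To make the proposal concrete, here is a minimal Python sketch of how such a configuration might be consumed. It is not Quivr's implementation: the `CONFIG` dictionary, `select_strategy`, and `regex_chunk` are hypothetical names chosen for illustration, the document-type list is flattened into a mapping for brevity, and only the "regex" strategy is implemented (the "contextual" and "late" strategies are just names here).

```python
import re

# Hypothetical in-memory form of the configuration above.
# Assumption: document types are looked up by name, so the list is
# represented as a mapping from type to strategy.
CONFIG = {
    "default": "regex",
    "document_types": {
        "technical_report": "contextual",
        "customer_feedback": "late",
    },
    "regex_patterns": [r"\n{2,}", r"(?:\.|\?|!)\s+"],
}


def select_strategy(doc_type, config=CONFIG):
    """Return the strategy configured for doc_type, or the default."""
    return config["document_types"].get(doc_type, config["default"])


def regex_chunk(text, patterns):
    """Split text on each pattern in turn and drop empty chunks.

    Note: the matched separators (including sentence-ending
    punctuation) are consumed by the split and not kept.
    """
    chunks = [text]
    for pattern in patterns:
        splitter = re.compile(pattern)
        chunks = [piece for chunk in chunks for piece in splitter.split(chunk)]
    return [chunk.strip() for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "First point. Second point!\n\nNew paragraph"
    print(select_strategy("technical_report"))
    print(select_strategy("invoice"))
    print(regex_chunk(sample, CONFIG["regex_patterns"]))
```

Because the second pattern consumes the punctuation it matches, chunks lose their trailing `.`, `?`, or `!` unless the pattern is changed to a lookbehind such as `(?<=[.?!])\s+`.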