/cape-splitter

Split documents into chunks

Primary language: Python. License: Apache License 2.0.

cape-splitter

Functionality

Cape splitter provides the following functionality:

  • Split documents into groups of whole sentences, extracting the overlapping text before and after each group.
  • Return the groups in batches, grouped by number of words.
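
The two features listed above can be illustrated with a minimal sketch. This is not the cape-splitter API: the names `Chunk`, `split_sentences`, `make_chunks`, and `batch_by_words`, the regex-based sentence splitter, and the word-count parameters are all assumptions made for illustration only.

```python
import re
from typing import Iterator, List, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical container: a group of sentences plus its surrounding text."""
    overlap_before: str
    text: str
    overlap_after: str


def split_sentences(document: str) -> List[str]:
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]


def make_chunks(document: str, words_per_chunk: int = 50,
                overlap_sentences: int = 1) -> List[Chunk]:
    sentences = split_sentences(document)

    # Collect sentence indices into groups without ever cutting a sentence.
    groups: List[List[int]] = []
    current: List[int] = []
    count = 0
    for i, sentence in enumerate(sentences):
        n_words = len(sentence.split())
        if current and count + n_words > words_per_chunk:
            groups.append(current)
            current, count = [], 0
        current.append(i)
        count += n_words
    if current:
        groups.append(current)

    # Attach the sentences just before and just after each group as overlap.
    chunks = []
    for group in groups:
        start, end = group[0], group[-1] + 1
        before = " ".join(sentences[max(0, start - overlap_sentences):start])
        after = " ".join(sentences[end:end + overlap_sentences])
        chunks.append(Chunk(before, " ".join(sentences[start:end]), after))
    return chunks


def batch_by_words(chunks: List[Chunk],
                   max_words_per_batch: int = 200) -> Iterator[List[Chunk]]:
    # Yield lists of chunks whose combined word count stays within the budget
    # (a single oversized chunk still forms its own batch).
    batch: List[Chunk] = []
    count = 0
    for chunk in chunks:
        n_words = len(chunk.text.split())
        if batch and count + n_words > max_words_per_batch:
            yield batch
            batch, count = [], 0
        batch.append(chunk)
        count += n_words
    if batch:
        yield batch
```

For example, `make_chunks(text, words_per_chunk=6)` on a text of four three-word sentences produces two chunks of two sentences each, where the first chunk carries the third sentence as `overlap_after` and the second carries the second sentence as `overlap_before`. A real tokenizer would handle abbreviations and punctuation far better than the regex used here.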

Performance

Tokenization and splitting of SQuAD take 3.7 seconds on a MacBook Pro (mid-2015, 2.2 GHz Intel Core i7).