herongrove/collab

Pre-processing text

Closed this issue · 0 comments

  • ‘ and ’ => '
  • “ and ” => "
  • remove obvious page headers (e.g. "page 5 of 6")
  • remove line breaks (\r?\n)
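The checklist above could be sketched as a single Python function. This is a minimal illustration under stated assumptions, not the project's actual code: the name `preprocess`, the constants `QUOTE_MAP` and `HEADER_RE`, and the assumption that an "obvious header" is a line consisting only of "page N of M" are all hypothetical. It also assumes that removing `\r?\n` should join lines with a space rather than deleting the breaks outright, so words on adjacent lines do not run together.

```python
import re

# Hypothetical mapping of typographic quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

# Assumption: an "obvious header" is a whole line like "page 5 of 6".
HEADER_RE = re.compile(r"^\s*page\s+\d+\s+of\s+\d+\s*$", re.IGNORECASE)


def preprocess(text: str) -> str:
    """Normalize quotes, drop page-header lines, then join the remaining lines."""
    text = text.translate(QUOTE_MAP)
    # Split on \r?\n so both Unix and Windows line endings are handled.
    lines = re.split(r"\r?\n", text)
    # Drop header lines while line boundaries still exist.
    kept = [line for line in lines if not HEADER_RE.match(line)]
    # Join non-empty lines with a single space (assumed replacement for the
    # removed line breaks); blank lines are discarded.
    return " ".join(line.strip() for line in kept if line.strip())


if __name__ == "__main__":
    sample = "\u201cHello,\u201d she said.\r\nPage 5 of 6\nIt\u2019s done."
    print(preprocess(sample))  # prints: "Hello," she said. It's done.
```

Header removal runs before the line breaks are removed because, once the lines are joined, a header like "page 5 of 6" can no longer be recognized as a standalone line.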