sap218/ocimido

Text Splitting


  • split the data by thread, using all comments in each thread (not by user)
  • sort each thread into a length bin: short/medium/long
  • extract 20% from each bin as the test data set: store it separately and do not touch it
  • training data is the remaining 80% of each bin
  • extract synonyms for the ontology
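
The binning and 80/20 steps above could be sketched roughly as follows. This is only an illustration, not the code used in this repository: the function names (`length_bin`, `split_threads`), the word-count thresholds, and the input layout (a dict mapping a thread ID to its list of comment strings) are all assumptions, since the issue does not say how thread length is measured or where the bin boundaries lie.

```python
import random
from collections import defaultdict

# Hypothetical bin boundaries in words; the issue does not define them.
SHORT_MAX = 100
MEDIUM_MAX = 500


def length_bin(comments):
    """Assign a thread to 'short', 'medium', or 'long' by total word count."""
    n_words = sum(len(comment.split()) for comment in comments)
    if n_words <= SHORT_MAX:
        return "short"
    if n_words <= MEDIUM_MAX:
        return "medium"
    return "long"


def split_threads(threads, test_fraction=0.2, seed=0):
    """Split thread IDs into train/test, taking test_fraction from each bin.

    `threads` is assumed to map a thread ID to a list of comment strings.
    Whole threads are assigned to one side, so comments from the same
    thread never appear in both sets.
    """
    rng = random.Random(seed)  # fixed seed keeps the held-out set reproducible

    # Group thread IDs by length bin.
    bins = defaultdict(list)
    for thread_id, comments in threads.items():
        bins[length_bin(comments)].append(thread_id)

    train, test = [], []
    for name in ("short", "medium", "long"):
        ids = sorted(bins[name])  # sort first so the shuffle is deterministic
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test
```

Sampling the same fraction from every bin keeps the length distribution of the test set close to that of the training set. Writing the test IDs to a file once and reusing that file would be one way to keep the test set untouched.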