langcog/childesr

consider providing training/test split info

Closed this issue · 0 comments

if we want to incentivize appropriate validation steps, we could consider adding a get_train_test_split method that takes:

  • a type argument giving the unit to split on: "corpus", "child", or "token"
  • a proportion argument for the share of units held out for test (e.g., 10% by default)

and returns random filters for the training and test splits that can be passed to the various other get_ methods.
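
to make the proposal concrete, here is a minimal sketch of the splitting logic. it is written in Python purely for illustration and is not the childesr API: the function body, the `seed` argument, the `ID_FIELDS` mapping, and the field names in it are all assumptions. the point it shows is that the split happens over whole units (corpora, children, or tokens), so that all data from one unit lands on the same side.

```python
import random

# Hypothetical mapping from the proposed `type` values to ID fields.
# These field names are assumptions made for this sketch.
ID_FIELDS = {"corpus": "corpus_id", "child": "target_child_id", "token": "id"}


def get_train_test_split(records, type="child", proportion=0.1, seed=None):
    """Randomly partition unit IDs into train and test sets.

    `type` mirrors the argument name in the proposal (it shadows the
    Python builtin inside this function only).
    """
    if type not in ID_FIELDS:
        raise ValueError(f"type must be one of {sorted(ID_FIELDS)}")
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")

    field = ID_FIELDS[type]
    # Split over unique units, not rows, so a unit never straddles the split.
    units = sorted({record[field] for record in records})

    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(units)

    n_test = round(len(units) * proportion)
    return {"train": set(units[n_test:]), "test": set(units[:n_test])}


def apply_filter(records, type, keep_ids):
    """Keep only the records whose unit ID is in `keep_ids`."""
    field = ID_FIELDS[type]
    return [record for record in records if record[field] in keep_ids]
```

in this sketch the returned sets play the role of the "filter": something like `apply_filter(records, "child", split["train"])` stands in for passing the filter to a get_ method. splitting on "child" or "corpus" rather than "token" keeps correlated data from leaking between train and test.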

can't tell if this is a good idea, but it might make it easier to do safe exploration + cross-validation...