langcog/childesr

make a function to get utterances surrounding a token ("context")

This feature was originally requested by MH, and Dan proposed a manual solution. I'm adding it here in case we want to build this functionality into the API.

His note: "Suppose I have a list of tokens or utterances, and I want to read the preceding context (like the last N utterances). Is there a way to do this in childesr?"

Dan's solution:

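library(childesr)
library(dplyr)
library(stringr)
library(purrr)

# All utterances for Shem at 30 months
Shem_utts <- get_utterances(child = "Shem", age = 30)

# Positions of the utterances whose gloss contains "dog"
dog_utts_inds <- filter(Shem_utts, str_detect(gloss, "dog")) %>%
  pull(order)

# For each match, keep the utterances just before it. Both bounds are strict,
# so this returns index - 4 through index - 1 (four utterances of context).
pre_dog_utts <- map(dog_utts_inds,
    function(index) filter(Shem_utts, order > (index - 5) & order < index)) %>%
  bind_rows()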
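
For discussion, here is a rough sketch of how Dan's approach might be wrapped into a reusable function. Everything named here is hypothetical and not part of childesr: `get_contexts_sketch`, its `n_before` / `n_after` arguments, and the `match_order` column are made up for illustration, and `toy_utts` is a hand-built stand-in that only mimics the `order` and `gloss` columns used above.

```r
library(dplyr)
library(stringr)
library(purrr)

# Toy stand-in for the data frame returned by get_utterances();
# only the `order` and `gloss` columns are mimicked (an assumption).
toy_utts <- data.frame(
  order = 1:8,
  gloss = c("look", "a dog", "yes", "big dog", "where", "ball", "the dog", "bye"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each utterance whose gloss matches `pattern`,
# return the `n_before` preceding and `n_after` following utterances,
# tagged with the order of the match they belong to.
get_contexts_sketch <- function(utts, pattern, n_before = 2, n_after = 0) {
  match_inds <- utts %>%
    filter(str_detect(gloss, pattern)) %>%
    pull(order)

  map(match_inds, function(index) {
    utts %>%
      filter(order >= index - n_before,
             order <= index + n_after,
             order != index) %>%          # drop the matching utterance itself
      mutate(match_order = index)
  }) %>%
    bind_rows()
}

contexts <- get_contexts_sketch(toy_utts, "dog", n_before = 2)
```

Tagging each row with `match_order` keeps contexts distinguishable after `bind_rows()`. Note that the context for the match at order 4 contains the earlier "dog" utterance at order 2, so nearby matches show up inside each other's contexts.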

An idea for a bonus feature: have an option to merge contexts when the token appears again within the context. I suspect this will be of general interest because parents often repeat back to their kids what they said. For instance, a search for "big" returns a lot of back-and-forth uses ("is that a big one?" "it's a big one."). The naive manual solution will extract a bunch of overlapping, repeated contexts, which may not be what you want.
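
One possible way to do the merging, as a sketch only: collect every position covered by any match's window, keep each utterance once, and start a new context wherever there is a gap in `order`. The function name `merge_contexts_sketch`, the `n_before` argument, and the `context_id` / `is_match` columns are hypothetical, and `toy_utts` is invented data that only mimics the `order` and `gloss` columns.

```r
library(dplyr)
library(stringr)

# Invented example data with a repeated-back "big" (orders 2 and 3).
toy_utts <- data.frame(
  order = 1:10,
  gloss = c("look there", "is that a big one", "it's a big one", "yes",
            "more juice", "all gone", "uhoh", "where is it",
            "a big truck", "vroom"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: windows are [index - n_before, index] for each match.
# Overlapping or adjacent windows collapse into one context.
merge_contexts_sketch <- function(utts, pattern, n_before = 2) {
  match_inds <- utts$order[str_detect(utts$gloss, pattern)]
  if (length(match_inds) == 0) return(utts[0, ])  # no matches, no contexts

  # Every position covered by at least one window
  covered <- unlist(lapply(match_inds, function(i) (i - n_before):i))

  utts %>%
    filter(order %in% covered) %>%       # each utterance kept at most once
    arrange(order) %>%
    mutate(
      # a gap in `order` marks the start of a new merged context
      context_id = cumsum(c(TRUE, diff(order) > 1)),
      is_match = order %in% match_inds
    )
}

merged <- merge_contexts_sketch(toy_utts, "big", n_before = 2)
```

Here the two back-to-back "big" utterances end up in a single context rather than two overlapping ones. This sketch assumes `order` is contiguous within the data passed in; if it is not, the gap test would split contexts in the wrong places.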

Consensus:

  • this is useful; Dan's solution above is good.
  • the backend could use the utterance IDs to make this efficient

@amsan7 could you please add an index on the utterance_order column of the utterance table so that this is efficient?

done!