ContextLab/quail

ENH: add onset times to speech decoding function

andrewheusser opened this issue · 4 comments

The most recent version of the Google speech API gives onset times for each word. Add this to the speech decoding function.

Google added a boolean enable_word_time_offsets flag to the API, so that's what I was going to use for quail as well, to keep it simple (although they don't have to be the same if we want to pick a different argument name). By default, this would be set to True, since I think most people using this package will want that info.

To use it:

results = quail.decode_speech('file.wav')

where results is a list of (word, onset, offset) tuples. If save=True, a parsed text file and the raw response object will be saved out. The format I was thinking of for the parsed file would be:

WORD1, ONSET1, OFFSET1
WORD2, ONSET2, OFFSET2
...
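A minimal sketch of how the parsed results and the saved file could be produced. The helper below is hypothetical (it isn't part of quail); it assumes the word timings have already been pulled out of the API response as dicts with 'word', 'start_time', and 'end_time' keys, a simplified stand-in for the word info Google returns when enable_word_time_offsets=True:

```python
import csv

def parse_word_timings(response_words, save_path=None):
    """Convert word-timing records into (word, onset, offset) tuples,
    optionally writing the proposed WORD, ONSET, OFFSET parsed file.

    ``response_words``: iterable of dicts with 'word', 'start_time',
    and 'end_time' keys (a simplified stand-in for the API response).
    """
    results = [(w['word'], w['start_time'], w['end_time'])
               for w in response_words]
    if save_path is not None:
        # one comma-separated WORD, ONSET, OFFSET row per word
        with open(save_path, 'w', newline='') as f:
            csv.writer(f).writerows(results)
    return results
```

So a call like results = parse_word_timings(words) would hand back the same list of tuples that decode_speech is meant to return.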

Does this sound good to you? Anything I might be missing that would make it easier?

@jeremymanning @KirstensGitHub @paxtonfitzpatrick @campbellfield

This looks great to me! I was just asking @jeremymanning why nobody uses offset times in this kind of research... cool that the Google API provides that info.

This looks good to me, too. A few questions/comments:

  • Does the flag control all timing information, or just offsets? We may want to allow user control over onsets and offsets independently (with both set to True by default). For full control, we could include the following three flags (all defaulting to True):
    • return_text: return the transcribed words
    • return_onsets: return the onsets of the words, in ms relative to the start of the .wav file
    • return_offsets: return the offsets of the words, in ms relative to the start of the .wav file
  • If save=True, the text, onsets, and offsets should all be written out to the file (but only the selected info should be returned by the function).
  • If the to-be-saved file already exists, don't run it through the speech-to-text engine again; instead, parse the saved file and return the appropriate info
  • We'll need a parser to convert the output "autoparser" files (maybe .aann for auto-annotated, to match the .ann files from Penn TotalRecall?) to eggs
  • We'll also need to add onset and offset fields to the recalls (and possibly the presentation times; e.g., then we could more easily support linking experimental events with other timing-specific things like brain signals)
  • We also need a parser to convert manually annotated files (.ann files) to eggs. These won't include the offset times
  • It might be neat to have the lag-CRL analysis (optionally) take offset times into account as a more precise way of estimating inter-response intervals; e.g., accounting for offset times might clean up the analysis
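Putting the flag and caching ideas together, here's one hypothetical shape the function could take. The engine argument is a placeholder for the actual speech-to-text call (not a real quail or Google API), the .aann extension follows the naming suggested above, and the behavior follows the proposal: always save all three columns, only return the selected ones, and reuse a previously saved parse instead of re-querying the engine:

```python
import csv
import os

def decode_speech_sketch(path, engine, return_text=True,
                         return_onsets=True, return_offsets=True,
                         save=True):
    """Sketch of the proposed flag/caching behavior.

    ``engine`` stands in for the speech-to-text call and should return
    a list of (word, onset, offset) tuples for the given audio file.
    """
    parsed_path = path + '.aann'  # proposed auto-annotation extension
    if os.path.exists(parsed_path):
        # reuse the previously saved parse rather than re-running the engine
        with open(parsed_path) as f:
            rows = [(word, float(onset), float(offset))
                    for word, onset, offset in csv.reader(f)]
    else:
        rows = engine(path)
        if save:
            # always write out all three columns, regardless of the flags
            with open(parsed_path, 'w', newline='') as f:
                csv.writer(f).writerows(rows)
    # only return the columns the caller asked for
    keep = [i for i, flag in enumerate(
        (return_text, return_onsets, return_offsets)) if flag]
    return [tuple(row[i] for i in keep) for row in rows]
```

On a second call with, say, return_text=False and return_offsets=False, the cached .aann file would be parsed and only the onsets returned.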
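For the manually annotated files, a parser sketch might look like the following. This assumes the usual tab-delimited Penn TotalRecall .ann layout (onset in ms, item number, word, with '#' comment lines); if the actual files differ, the split below would need adjusting. As noted above, .ann files carry onsets only, so no offsets are returned:

```python
def parse_ann(path):
    """Parse a (assumed) Penn TotalRecall .ann file into
    (word, onset_ms) pairs, skipping blank and '#' comment lines."""
    recalls = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            onset, _item_num, word = line.split('\t')
            recalls.append((word, float(onset)))
    return recalls
```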

Looping back to this issue: I think this is where we most recently discussed incorporating onset and offset times into the eggs. If we didn't analyze the onset times this way, how did we do it?

Also noting for @andrewheusser: for the naturalistic extensions you’ve been working on, don’t we also need timing information?