Word Marker isn't always in sync
andrinmeier opened this issue · 2 comments
Sometimes the word marking is not in sync with the audio output. See #2 (comment) for more details.
Instead of using a regex to figure out word boundaries, couldn't we just match the speech mark values against the text on the page? E.g. if the first speech mark contains "20.09.2020", we'd match it with the first "20.09.2020" found on the page. It wouldn't work for times, because there's no speech mark for "Uhr" in "17.10 Uhr", but at least it would mark "17.10". Or is that too simplistic?
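A minimal sketch of what that matching could look like, assuming AWS Polly-style word speech marks (the `SpeechMark` shape and the function name here are illustrative, not the project's actual types):

```ts
// Assumed shape of a Polly word speech mark.
interface SpeechMark {
  time: number;  // playback offset in ms
  type: string;  // "word" for word marks
  start: number; // offset in the synthesized text
  end: number;
  value: string; // the spoken word, e.g. "20.09.2020"
}

// Walk the page text left to right, matching each speech mark's value at or
// after the previous match so repeated words resolve to the right occurrence.
function matchSpeechMarksToText(pageText: string, marks: SpeechMark[]) {
  let cursor = 0;
  return marks.map((mark) => {
    const index = pageText.indexOf(mark.value, cursor);
    if (index === -1) {
      // No literal match, e.g. when the engine normalizes the text
      // differently from how it appears on the page.
      return { mark, start: -1, end: -1 };
    }
    cursor = index + mark.value.length;
    return { mark, start: index, end: cursor };
  });
}
```

Words like "Uhr" that never get a speech mark simply never enter the loop, so they'd stay unmarked, as described above.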
While developing the current experimental version of gatsby-mdx-tts I was not really happy that there are currently two different approaches to isolating words: one for generating the mp3 files and speech marks at build time, and one for the word highlighting at runtime. So I think finding a way to reuse code here would not only be beneficial in terms of developer experience but would also ensure there is no misalignment between the two --> the word marker would always be in sync.
I think that by not parsing the MDX AST directly but transforming it to JSX first (as suggested in #14), we would already get a step closer to this!
What I could also imagine is generating an ID for every word extracted from the MDX AST (or the generated JSX). This ID would then be attached to every JSX component at build time and also stored together with the speech marks in the generated JSON file. That way, we could simply look up the ID of the currently played word in the React DOM and highlight the related component. This would ensure the highlighting is always in sync with the audio playback.
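A rough sketch of the runtime side of that idea, assuming a hypothetical `data-word-id` attribute added at build time and a JSON file pairing each speech mark with that same ID (all names here are made up for illustration):

```ts
import { useEffect } from "react";

// Hypothetical shape: each entry pairs a build-time word ID with the
// playback offset (in ms) taken from the generated speech marks JSON.
interface MarkedWord {
  wordId: string;
  time: number;
}

// Highlights the word whose speech mark is the latest one at or before the
// current playback position, assuming every word was wrapped at build time
// in an element carrying a matching data-word-id attribute.
function useWordHighlight(words: MarkedWord[], currentTimeMs: number) {
  useEffect(() => {
    // The currently spoken word is the last one that started before "now"
    // (words are assumed to be sorted by time).
    const active = [...words].reverse().find((w) => w.time <= currentTimeMs);
    if (!active) return;
    const el = document.querySelector(`[data-word-id="${active.wordId}"]`);
    if (!el) return;
    el.classList.add("word-highlight");
    // Remove the highlight again when the active word changes.
    return () => el.classList.remove("word-highlight");
  }, [words, currentTimeMs]);
}
```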
The idea is not yet fully thought out and there might be better approaches than manipulating the React DOM this much, but it might be a direction worth exploring?