festvox/flite

Some questions about resource usage

Kasslim opened this issue · 0 comments

I'm trying to find more information about the library, and I hope this is the right place to ask. I'm still very much a beginner when it comes to flite, so any help from someone who knows about this would be greatly appreciated.

I'm attempting to get this library running on a resource-constrained platform: a 32-bit microcontroller with ~500 kB of available RAM, 512 kB of ROM reserved for TTS, and plenty of flash storage. The plan is to output the resulting speech audio over I2S in real time.

Questions

About the following statement in the readme: "For standard diphone voices, maximum run time memory requirements are approximately less than twice the memory requirement for the waveform generated."

  • Does this mean splitting text into sentences, or even words, can reduce the RAM requirement because the "waveform" will be shorter?
  • If so, would feeding individual words impact speech quality with the default US English lexicon?
  • Is this the same "runtime" spec listed at <1M in the readme's memory comparison table? (Or are there other metrics that heavily affect RAM usage?)
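
To make the question concrete, here is my back-of-the-envelope reading of that statement as a small C sketch. It assumes mono 16-bit PCM output and treats the README's "twice the waveform" figure as a hard upper bound. Both are my assumptions, and the function names are just illustrative, not part of flite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mono, 16-bit signed PCM output (2 bytes per sample). */
#define BYTES_PER_SAMPLE 2u

/* Bytes needed to hold `duration_ms` of audio at `sample_rate_hz`. */
size_t waveform_bytes(uint32_t sample_rate_hz, uint32_t duration_ms)
{
    uint64_t samples = (uint64_t)sample_rate_hz * duration_ms / 1000u;
    return (size_t)(samples * BYTES_PER_SAMPLE);
}

/* README rule of thumb, read as an upper bound: 2x the waveform size. */
size_t peak_ram_estimate(uint32_t sample_rate_hz, uint32_t duration_ms)
{
    return 2u * waveform_bytes(sample_rate_hz, duration_ms);
}

/* Longest utterance (in ms) whose estimated peak fits in the RAM budget. */
uint32_t max_utterance_ms(size_t ram_budget_bytes, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0); /* guard the division below */
    uint64_t samples = (uint64_t)(ram_budget_bytes / 2u) / BYTES_PER_SAMPLE;
    return (uint32_t)(samples * 1000u / sample_rate_hz);
}
```

Under these assumptions, a 5 s sentence at 16 kHz would be 160000 bytes of samples and roughly 320000 bytes at peak, while a 1 s word would peak at about 64000 bytes. If that reading is right, shorter utterances would indeed lower the peak, which is what the first question is really asking.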

About the other memory requirements: as I understand it, core (60k) + USEnglish (100k) + lexicon (600k) + diphone (1800k) can all potentially be stored in ROM instead of RAM.

  • Is this correct?
  • Is there any hope of moving at least the diphone and lexicon to NAND flash instead of RAM/ROM?
  • If so, how do I approach this?
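
For context on why I'm asking about flash, here is the budget arithmetic as a small C sketch. It assumes the README's "k" figures are kB and that they are directly comparable to my 512 kB ROM budget. The names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Component sizes as listed in the README table, assumed to be in kB. */
enum {
    CORE_KB      = 60,
    USENGLISH_KB = 100,
    LEXICON_KB   = 600,
    DIPHONE_KB   = 1800
};

/* Compile-time check that the listed sizes add up to the total used below. */
static_assert(CORE_KB + USENGLISH_KB + LEXICON_KB + DIPHONE_KB == 2560,
              "README component sizes should total 2560 kB");

/* Total static footprint of all four components, in kB. */
unsigned total_static_kb(void)
{
    return CORE_KB + USENGLISH_KB + LEXICON_KB + DIPHONE_KB;
}

/* True if `needed_kb` fits within `rom_budget_kb`. */
bool fits_in_rom(unsigned rom_budget_kb, unsigned needed_kb)
{
    return needed_kb <= rom_budget_kb;
}

/* How many kB would have to live somewhere other than ROM (0 if all fits). */
unsigned shortfall_kb(unsigned rom_budget_kb)
{
    unsigned total = total_static_kb();
    return total > rom_budget_kb ? total - rom_budget_kb : 0u;
}
```

By this count the four components total 2560 kB, so about 2048 kB would not fit in the 512 kB of ROM. Core plus USEnglish (160 kB) would fit comfortably, which is why the lexicon and diphone data are the parts I'd like to move to NAND flash.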

Any pointers in the right direction are welcome, including suggestions for how I might find some of these answers myself.