brailcom/speechd

module request: localai

Opened this issue · 3 comments

speechd is a 'common high-level interface to speech synthesis', providing a bridge between TTS engines and applications that consume TTS functionality (such as screen readers). Arguably, though, it could benefit from more natural-sounding TTS engines.

LocalAI is a common interface over various machine learning models, covering tasks including TTS; it also helps install these models.
It would seem nice if speechd offered a module for LocalAI, so as to offer more natural-sounding voices: speechd wouldn't need to worry about individual ML models, while LocalAI wouldn't need to worry about individual screen reader applications.

/cc @mudler

Interesting! This definitely piques my interest. One thing that probably makes this a bit more convoluted is that speechd's LICENSE seems quite confusing to me. It seems part of the code is under the GPL; would that apply as well to something higher-level consuming speechd?

I see that the C API, for instance, is LGPL, so that would be just fine, but does the C API consume GPL code behind the scenes? That might just not fit for LocalAI, as it is licensed under MIT.

@mudler as per the linked issue, we might have had a miscommunication. I'm not so sure speechd is of much interest to LocalAI as a TTS engine; I believe LocalAI's present TTS engines already sound more natural.
The synergy I see is the other way around: speechd consuming LocalAI, which I believe is less likely to involve licensing issues.

The use case I see for this configuration: one could use a screen reader application, e.g. GNOME's Orca, and through speechd have it consume TTS models in LocalAI. The point here would be to make quality speech synthesis readily available for daily use by casual-ish Linux users.
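For the curious, a pipeline like this might not even require a dedicated module: speechd ships a "generic" output module that runs an arbitrary shell command per utterance. A rough sketch of what that could look like, assuming LocalAI is listening on localhost:8080, that its /tts endpoint accepts JSON and returns playable audio, and with a hypothetical model name `voice-en-us` (all of these are assumptions, not tested):

```
# Hypothetical /etc/speech-dispatcher/modules/localai-generic.conf
# speechd's generic module substitutes $DATA with the text to speak.
# Endpoint, payload shape, and model name below are assumptions;
# adjust to your actual LocalAI setup.
GenericExecuteSynth \
"curl -s http://localhost:8080/tts \
  -H 'Content-Type: application/json' \
  -d '{\"model\": \"voice-en-us\", \"input\": \"$DATA\"}' \
  -o - | aplay -q"
```

A native module would of course be nicer (proper stop/pause handling, rate and pitch mapping), but a generic-module sketch like this could serve as a quick proof of concept.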


I see, makes sense - I'll be around in case this gets picked up and needs any help!