igniterealtime/openfire-ofmeet-plugin

Enable voice to text transcription with web speech api

deleolajide opened this issue · 13 comments

In Pade, I have used the Web Speech API to provide a voice-to-text transcription feature for meetings. The transcribed text is injected into the group chat and also displayed on screen as a caption/subtitle.

It would be nice to support this from the ofmeet web client, along with other Pade features. In the meantime, I would like to add two new ofmeet properties to the config UI settings:
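For readers following along, here is a minimal TypeScript sketch of the flow described above. It is not the Pade code. It assumes the browser exposes `SpeechRecognition` (or the prefixed `webkitSpeechRecognition` in Chrome), and the `sendToChat` and `showCaption` hooks are hypothetical names invented for illustration. The small interfaces only mirror the parts of the Web Speech API the sketch touches, so it does not depend on DOM typings.

```typescript
// Minimal structural types covering only the Web Speech API members used below.
interface RecognitionAlternative { transcript: string; }
interface RecognitionResult {
  isFinal: boolean;
  length: number;
  [index: number]: RecognitionAlternative;
}
interface RecognitionEvent {
  resultIndex: number;
  results: { length: number; [index: number]: RecognitionResult };
}
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
}

// Hypothetical hooks: how text reaches the group chat and the caption overlay.
interface TranscriptSinks {
  sendToChat(text: string): void;
  showCaption(text: string): void;
}

// Wire a recognizer so that every result updates the caption,
// and only final results are posted to the chat.
function attachTranscription(
  recognizer: Recognizer,
  sinks: TranscriptSinks,
  lang: string = "en-US", // assumed default language
): void {
  recognizer.continuous = true;     // keep listening across pauses
  recognizer.interimResults = true; // receive partial results for live captions
  recognizer.lang = lang;
  recognizer.onresult = (event: RecognitionEvent): void => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const best = result ? result[0] : undefined;
      if (!result || !best) continue;
      const text = best.transcript.trim();
      if (text.length === 0) continue;
      sinks.showCaption(text);
      if (result.isFinal) sinks.sendToChat(text);
    }
  };
  recognizer.start();
}

// Browser-only lookup; returns null where the API is unavailable.
function createBrowserRecognizer(): Recognizer | null {
  const g = globalThis as any;
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  return Ctor ? (new Ctor() as Recognizer) : null;
}
```

In a browser, one would call `createBrowserRecognizer()` and, if it returns a recognizer, pass it to `attachTranscription` together with the real chat and caption hooks.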

  • ofmeet.enable.voicetotext.transcription
  • ofmeet.enable.message.captions

image

Hi dele,

Great feature! It works nicely on a PC/Mac with Chrome, but on Android it loops and beeps when it is enabled.
"To reproduce: enable voice-to-text transcription, restart, and try out ofmeet."

Thank you for reminding me. I did experience the beeping, but I did not know what caused it and had to disable all features one by one until I found it. I suspect it is Jitsi Meet beeping when a new message arrives. I think I will disable it when transcription is in use.

Cool :)

Thanks for checking this. It works like a charm. GREAT GREAT add-on feature for Jitsi!!!
Congrats Code Master :)

The feature is interesting, but only if the processing happens within the OF server or locally on the client (browser). It seems that transcription in Chrome is done by sending your voice to an unspecified central server, probably a Google one. They would then have your voice on record, to be compared with other samples taken from different sources, with things you said and thought were private, and so forth. Could anyone please confirm whether that is the case?
ref: https://wiki.mozilla.org/Web_Speech_API_-_Speech_Recognition

Web speech, like web push, is a web standard implemented through a service provided by the browser vendor. Yes, if your browser is Chrome, then the audio is going to Google data centers. I am not sure what happens with other Chromium-based browsers and Firefox.

According to the article I linked, Firefox uses a Google server...
Dele, the feature is very interesting, but I'd say that's a severe privacy breach... I suggest either dropping it or marking clearly that your voice pattern and contents will be sent to Google. Imagine you are talking about a sensitive subject or something that involves intellectual property or some financial strategy... we know Google is in bed with governments and some other "central" corporations and it sells your data left and right. Just expressing an opinion for the common good.
If there's some other alternative that is sure to happen within OF, I don't see a problem.
That's why I created an initiative to improve and/or make clearly known the privacy level of OF and OFMeet, as one of the main points of the projects is to be an alternative to centralized services such as Skype, Zoom... please participate in that one.

Another possibility is to have an option in OF for Meetings and other modules to disable the feature, with clear information about what it is and what it entails...

another possibility is to have an option in OF for Meetings and other modules to disable the feature, with clear infos about what it is and what it entails...

Disable server-side for all

image

Disable client-side by user

image
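As a rough illustration of how the two switches above could combine, here is a TypeScript sketch. Only the two property names come from this issue. The `resolveFlags` and `parseFlag` helpers, the `UserPrefs` shape, and the defaults (off when the server property is unset, on for the user unless they opt out) are assumptions for the example, not the plugin's actual behaviour.

```typescript
// Property names proposed in this issue.
const TRANSCRIPTION_PROP = "ofmeet.enable.voicetotext.transcription";
const CAPTIONS_PROP = "ofmeet.enable.message.captions";

// Hypothetical per-user overrides, e.g. read from the client's local settings.
interface UserPrefs {
  transcription?: boolean;
  captions?: boolean;
}

interface FeatureFlags {
  transcription: boolean;
  captions: boolean;
}

// Parse a "true"/"false" property string; use the fallback when it is unset.
function parseFlag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  return value.trim().toLowerCase() === "true";
}

// A feature is active only if the server enables it AND the user has not opted out.
// Assumption: an unset server property means "disabled".
function resolveFlags(
  serverProps: Record<string, string | undefined>,
  prefs: UserPrefs = {},
): FeatureFlags {
  const serverTranscription = parseFlag(serverProps[TRANSCRIPTION_PROP], false);
  const serverCaptions = parseFlag(serverProps[CAPTIONS_PROP], false);
  return {
    transcription: serverTranscription && (prefs.transcription ?? true),
    captions: serverCaptions && (prefs.captions ?? true),
  };
}
```

With this shape, a user preference can only switch a feature off. It can never re-enable something the administrator has disabled for everyone.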

Hi Dele, that's good. Please consider marking the privacy concerns clearly as per my last 2 posts in that option. I still think it's too much of a risk for a person, a family, a company or a government to send the contents of a conversation, with voice patterns, to Google. But choice is good if well informed.

please consider marking the privacy concerns clearly as per my last 2 posts in that option.

This is where you submit a pull request :-) You can do it easily from GitHub web pages directly

You're suggesting I create the markings myself... ok, I'll try it. What would be the best place to have conceptual discussions (I can see the smile...), so it's possible to discuss concepts, directions, concerns, etc. before getting to the level of implementation? Wouldn't it be strange to have an option and a 2-page-long warning from someone trying to convince you not to use it? ;) I'm kind of joking, but you get the gist of it.

There are just a few of us working on this. Every little bit helps. Security and privacy are not my strengths.

An issue discussion thread like this one is a good starting place to discuss concepts, strategy and design. A pull request (PR), with its own discussion thread, is where the implementation cycle delivers an accepted solution that is finally pulled and merged into the project.

A PR may not always be accepted, but it is definitely welcome.

@deleolajide

Let me give you my input / feedback regarding this feature.
Congrats on implementing this in such a way. I have deployed it on https://swisschat.free-solutions.org and am testing it.
Regarding security, I fully agree with the risks raised in this thread, but your implementation is correct, as we can disable the feature at the console level or at the user level. That's exactly what is needed. Of course, if you use the feature your voice is sent to Google, but it is a big advantage for the ease of use of the system. For privacy/security reasons, it should certainly not be enabled on sensitive servers.
I am fully satisfied with the implementation you did, as I can cover both private secured systems and public systems that need convenience and ease of use, so Google Voice to go to room is a must-have.
The same remarks apply to voice-to-text transcription: that's fine, we have the same logic and options, and it can be disabled. I am also happy that this old free-sol feature could help at some point and be available to the whole OF community.

Congrats !

++