HEAR is an iOS application that provides visual aid during spoken conversations for individuals with hearing impairments. It combines speech-to-text, augmented reality, and facial recognition to display subtitles underneath each speaker throughout a conversation.
Inspired by a recent conference talk on accessibility as well as family members who are hearing impaired, we wanted to create a hack that targets the pain points that individuals who are hard of hearing deal with every day.
ARKit 2 is used to track objects in the 3D scene and attach subtitle nodes to them, allowing subtitles to follow speakers as they move. Subtitle text size scales with each speaker's distance from the camera, which would not be possible without ARKit; a sketch of this anchoring and scaling is shown below.
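The following is a minimal sketch, not HEAR's actual code, of the ARKit side: given a world position for a detected speaker, it attaches an SCNText node and sizes it by the speaker's distance from the camera. The function name, scale factors, and styling are illustrative assumptions.

```swift
import ARKit
import SceneKit

// Illustrative sketch: attach a subtitle text node at a speaker's world
// position and scale it with distance so it stays readable.
func addSubtitleNode(for text: String,
                     at worldPosition: SCNVector3,
                     in sceneView: ARSCNView) {

    let textGeometry = SCNText(string: text, extrusionDepth: 0.5)
    textGeometry.font = UIFont.systemFont(ofSize: 10)
    textGeometry.firstMaterial?.diffuse.contents = UIColor.white

    let node = SCNNode(geometry: textGeometry)
    node.position = worldPosition

    // Scale the subtitle with distance from the camera (factor is illustrative).
    if let camera = sceneView.session.currentFrame?.camera {
        let cam = camera.transform.columns.3
        let dx = worldPosition.x - cam.x
        let dy = worldPosition.y - cam.y
        let dz = worldPosition.z - cam.z
        let distance = sqrt(dx * dx + dy * dy + dz * dz)
        let scale = 0.002 * max(distance, 0.5)
        node.scale = SCNVector3(scale, scale, scale)
    }

    // Keep the text facing the viewer as the camera moves around the speaker.
    node.constraints = [SCNBillboardConstraint()]
    sceneView.scene.rootNode.addChildNode(node)
}
```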
CoreML 2 is used primarily for computer vision: HEAR relies on facial recognition to detect potential speakers and to place subtitles correctly, which is done efficiently through the Vision API.
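Below is a minimal sketch, under the assumption that Vision's face detection runs on each ARKit camera frame, of how potential speakers could be located. The function name, completion handler, and dispatch queue are illustrative, not HEAR's exact pipeline.

```swift
import Vision
import ARKit

// Illustrative sketch: detect faces in the current ARKit frame so that a
// subtitle node can be positioned beneath each potential speaker.
func detectFaces(in frame: ARFrame,
                 completion: @escaping ([VNFaceObservation]) -> Void) {

    let request = VNDetectFaceRectanglesRequest { request, error in
        guard error == nil,
              let faces = request.results as? [VNFaceObservation] else {
            completion([])
            return
        }
        // Each observation's boundingBox (normalized image coordinates) can be
        // projected into the AR scene to place the corresponding subtitle.
        completion(faces)
    }

    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .right,
                                        options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try handler.perform([request])
        } catch {
            completion([])
        }
    }
}
```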
Speech-to-text is HEAR's most important feature, and for that reason SiriKit was chosen to transcribe each speaker's speech into subtitles. Having Siri perform some of the computation and natural language processing locally speeds up transcription, which leads to a better user experience.
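As a sketch of live transcription, the example below uses Apple's Speech framework (SFSpeechRecognizer), which exposes the speech-recognition engine behind Siri to apps; whether HEAR used exactly this API is an assumption, and authorization prompts and error handling are omitted for brevity.

```swift
import Speech
import AVFoundation

// Illustrative sketch: stream microphone audio into SFSpeechRecognizer and
// report partial transcripts, which can be pushed into the AR subtitle nodes.
final class LiveTranscriber {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var task: SFSpeechRecognitionTask?

    func start(onTranscript: @escaping (String) -> Void) throws {
        // Prefer on-device recognition when available to reduce latency.
        if recognizer?.supportsOnDeviceRecognition == true {
            request.requiresOnDeviceRecognition = true
        }
        request.shouldReportPartialResults = true

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            self.request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()

        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let text = result?.bestTranscription.formattedString {
                onTranscript(text)
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request.endAudio()
        task?.cancel()
    }
}
```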
HEAR uses SpriteKit to overlay the subtitles on the 3D environment. SpriteKit also allows text styling that keeps subtitles clear and legible against varying backgrounds.
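The sketch below, again illustrative rather than HEAR's actual code, shows one way to style such an overlay: an SKLabelNode on a semi-transparent rounded backing plate so the text stays readable on busy scenes. With an ARSKView, returning this node for an anchor keeps it pinned to the speaker; fonts, sizes, and colors are assumptions.

```swift
import SpriteKit
import ARKit

// Illustrative sketch: a subtitle label with a contrast-boosting background.
func makeSubtitleNode(text: String) -> SKNode {
    let label = SKLabelNode(text: text)
    label.fontName = "HelveticaNeue-Bold"
    label.fontSize = 24
    label.fontColor = .white
    label.horizontalAlignmentMode = .center
    label.verticalAlignmentMode = .center

    // Semi-transparent backing plate improves legibility on varying backgrounds.
    let padding: CGFloat = 12
    let size = CGSize(width: label.frame.width + padding * 2,
                      height: label.frame.height + padding * 2)
    let background = SKShapeNode(rectOf: size, cornerRadius: 8)
    background.fillColor = UIColor.black.withAlphaComponent(0.6)
    background.strokeColor = .clear

    background.addChild(label)
    return background
}

// Example use from an ARSKViewDelegate, so the subtitle is rendered at the
// anchor placed under a detected speaker:
// func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
//     return makeSubtitleNode(text: "Hello!")
// }
```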
- Simultaneous speakers and subtitles
- More accurate speaker tracking
- Higher accuracy in noisy environments
- Syncing conversations to the cloud for later review
- Integration into augmented reality lenses
- Real-time translation of subtitles
Benjamin Barault, Francesco Valela, Jacob Gagné, Tobi Décary-Larocque