Accessibility-tools is aimed at assisting deaf and hard-of-hearing individuals in learning to read lips, either in an isolated setting or in conjunction with other assistive modalities (haptic devices, cochlear implants, hearing aids).
git clone https://github.com/arikanev/accessibility-tools.git
On macOS:

brew install ffmpeg portaudio
pip3 install pyaudio

On Debian/Ubuntu:

sudo apt-get install python-pyaudio python3-pyaudio
sudo apt install ffmpeg

On either platform:

pip3 install SpeechRecognition gdown pyspellchecker
cd accessibility-tools/
bash main.sh
bash record.sh
The user is presented with a prompt to speak and is recorded saying words of their choosing (preferably from a list of confusable minimal pairs).
The user should wait a few seconds until the camera is activated for recording (indicated by the green camera light on a Mac), then follow the prompt in the terminal.
When the prompt is finished, press `q` in the terminal to stop the video capture.
Segmentation and captioning are then done automatically, and playback is handled through the `play_some_segs.py` CLI tool; usage is detailed below.
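Purely as an illustration of what an automatic segmentation-and-captioning step like this might look like, here is a minimal sketch assuming ffmpeg on the PATH and the SpeechRecognition package from the setup above; the file names, chunk length, and function names are assumptions, not the repo's actual code:

```python
# Hypothetical sketch only: split a recording into short segments and caption each one.
# Assumes ffmpeg is on PATH and the SpeechRecognition package from the setup above;
# file names, chunk length, and function names are illustrative, not the repo's code.
import glob
import subprocess

import speech_recognition as sr


def segment_video(src="recording.mp4", out_pattern="seg_%03d.mp4"):
    # Cut the recording into fixed-length chunks; the real tool may instead
    # split on silences or on the prompt boundaries.
    subprocess.run([
        "ffmpeg", "-y", "-i", src, "-f", "segment",
        "-segment_time", "2", "-reset_timestamps", "1",
        "-c", "copy", out_pattern,
    ], check=True)
    return sorted(glob.glob("seg_*.mp4"))


def caption_segment(seg_path):
    # Extract mono 16 kHz audio, then transcribe it with Google's speech recognizer.
    wav_path = seg_path.replace(".mp4", ".wav")
    subprocess.run(["ffmpeg", "-y", "-i", seg_path,
                    "-ac", "1", "-ar", "16000", wav_path], check=True)
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # nothing intelligible in this chunk


if __name__ == "__main__":
    for seg in segment_video():
        print(seg, "->", caption_segment(seg))
```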
The GIF below shows the recording process and the second playback method from example 3.
python3 play_some_segs.py -i 1 26 13 -tr -t -v 1

- `-i 1 26 13` specifies segment indices 1, 26, and 13
- `-tr` enables training mode
- `-t` enables testing mode
- `-v 1` selects video 1 as the source video
python3 play_some_segs.py -tr -t -vi v1s2s4s35s42v4s1s3s38s29v7s0

- `-vi v1s2s4s35s42v4s1s3s38s29v7s0` specifies:
  - video 1, segments 2, 4, 35, 42
  - video 4, segments 1, 3, 38, 29
  - video 7, segment 0
- `-tr` and `-t` enable training and testing mode, as above
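The `-vi` key string packs segment indices under each video index. As a minimal sketch, a string in this format could be parsed like this (an assumption based on the example above, not necessarily the project's actual parser):

```python
# Hypothetical parser for a -vi key string such as "v1s2s4s35s42v4s1s3s38s29v7s0".
import re


def parse_vidxs(key):
    """Return {video_index: [segment_indices]} parsed from a -vi key string."""
    result = {}
    # Each "v<N>" introduces a video; the "s<N>" tokens that follow are its segments.
    for video_idx, seg_block in re.findall(r"v(\d+)((?:s\d+)+)", key):
        result[int(video_idx)] = [int(s) for s in re.findall(r"s(\d+)", seg_block)]
    return result


print(parse_vidxs("v1s2s4s35s42v4s1s3s38s29v7s0"))
# {1: [2, 4, 35, 42], 4: [1, 3, 38, 29], 7: [0]}
```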
The following command will run training through all video segments of all available videos:
python3 play_some_segs.py -tr
The following command will run training through all video segments of the specified video (in this case, 3):
python3 play_some_segs.py -v 3 -tr
Training mode (-tr)
The user is presented with each selected video segment, with no "quiz" between videos to assess the user's ability to understand what the speaker has just said.
This mode is purely for the user to get familiar with the corpus.
Modified training mode (-tr -trm)
The user is presented with each selected video segment twice in succession (first without captions, then with captions), with no "quiz" between videos to assess the user's ability to understand what the speaker has just said.
This mode is purely for the user to gauge their own skill/ability.
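As a hedged illustration of this two-pass playback, something like the following could play one segment first without and then with captions via mpv (the segment and caption file names are assumptions, not the repo's actual layout):

```python
# Hypothetical sketch of the "twice in succession" playback with mpv,
# assuming each segment has a matching .srt caption file (names are assumptions).
import subprocess


def play_twice(segment_path, caption_path):
    # First pass: subtitles disabled.
    subprocess.run(["mpv", "--really-quiet", "--sid=no", segment_path], check=True)
    # Second pass: load the caption file so the word is shown on screen.
    subprocess.run(["mpv", "--really-quiet",
                    f"--sub-file={caption_path}", segment_path], check=True)


play_twice("seg_001.mp4", "seg_001.srt")
```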
Modified training with warm-up (-tr -trm -wc)
The user is presented with video segments in the same style as modified training, but the command line will optionally take user input on whether or not they correctly determined the word spoken in the current video segment.
A score is tabulated at the end.
This mode enables some extra pressure, a sort of warm-up for testing.
Testing mode (-t)
The user is presented with each selected video segment as many times as specified by --numreps NUMREPS (default 1), without captions. After each segment, a dialog box asks the user to type in their best guess as to what word was just spoken.
Results are calculated at the end of testing.
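A minimal sketch of how those end-of-test results might be tallied, assuming each guess is compared against the caption for its segment (the normalization and data structures here are assumptions, not the script's actual logic):

```python
# Hypothetical tally of testing-mode results: each typed guess is compared to the
# caption word for its segment, ignoring case and surrounding whitespace.
def score_answers(answers):
    """answers: list of (caption_word, user_guess) pairs."""
    correct = sum(1 for caption, guess in answers
                  if guess.strip().lower() == caption.strip().lower())
    return correct, len(answers)


results = [("bat", "bat"), ("pat", "bat"), ("mat", "mat")]
correct, total = score_answers(results)
print(f"Score: {correct}/{total} ({100 * correct / total:.0f}%)")
```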
We would like to add further testing modes that reduce the scope, difficulty, and number of potential answers, specifically a multiple-choice mode and a confusable-pair mode, so that the tool is better suited to a controlled experimental environment for testing the efficacy of alternative modalities for deaf and hard-of-hearing users.
--range RANGE RANGE, -r RANGE RANGE
start and end integer range of video segments to select.
--train, -tr
enables training mode.
--trainm, -trm
enables modified training mode.
--test, -t
enables testing mode.
--numreps NUMREPS, -nr NUMREPS
number of times a video segment is repeated during training.
--shuffle, -s
shuffle selected segments randomly.
--vname VNAME, -v VNAME
name of video file to pull segments from.
--fname FNAME, -f FNAME
name of results file.
--idxs IDXS [IDXS ...], -i IDXS [IDXS ...]
indices of specific video segments to select. If used, overrides range argument.
--wcor, -wc
modified training mode will now include some testing features (warm up).
--vidxs VIDXS, -vi VIDXS
indices of specific video segments to select across all video files specified. If used, overrides idxs argument. (Key-generator).
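For orientation only, the options above correspond roughly to an argparse parser like the sketch below; it is reconstructed from the help text in this README, so option types and defaults (apart from --numreps, documented as defaulting to 1) are assumptions rather than the script's actual definitions.

```python
# Sketch of an argparse parser matching the documented options; reconstructed from
# the help text above, so types and defaults (other than --numreps = 1) are guesses.
import argparse

parser = argparse.ArgumentParser(
    description="Play, train on, and test lip-reading video segments.")
parser.add_argument("--range", "-r", nargs=2, type=int,
                    help="start and end integer range of video segments to select")
parser.add_argument("--train", "-tr", action="store_true",
                    help="enable training mode")
parser.add_argument("--trainm", "-trm", action="store_true",
                    help="enable modified training mode")
parser.add_argument("--test", "-t", action="store_true",
                    help="enable testing mode")
parser.add_argument("--numreps", "-nr", type=int, default=1,
                    help="number of times a video segment is repeated")
parser.add_argument("--shuffle", "-s", action="store_true",
                    help="shuffle selected segments randomly")
parser.add_argument("--vname", "-v",
                    help="name of video file to pull segments from")
parser.add_argument("--fname", "-f",
                    help="name of results file")
parser.add_argument("--idxs", "-i", nargs="+", type=int,
                    help="indices of specific video segments to select (overrides --range)")
parser.add_argument("--wcor", "-wc", action="store_true",
                    help="include testing features (warm-up) in modified training mode")
parser.add_argument("--vidxs", "-vi",
                    help="key string selecting segments across videos (overrides --idxs)")

args = parser.parse_args()
```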
Huge credit for a lot of the core of this project goes to ffmpeg, mpv, and the Google Cloud Speech API.