“If you talk to a man in a language he understands, that goes to his head. If you talk to him in his own language, that goes to his heart.” — Nelson Mandela
Language has the power to unify people across cultures. This year at HackMIT, our team, passionate about education and the power of technology to increase educational equity, decided to tackle language learning. We knew that media such as movies and videos make learning conversational and fun, but we also knew how hopeless a student can feel when they cannot keep up with the speech. Our hack is therefore a smart tool that helps students learn at their own pace and comfort level.
Our project tackles education, social good, and connectivity while using cloud and machine learning-based tools to achieve our goals.
SpeechShifter is a Chrome Extension that automatically adjusts the playback speed of a video according to the language difficulty of each sentence. With it, English-as-a-Second-Language students can watch everyday, popular English videos at a comfortable pace without feeling overwhelmed or hopeless because of the speed and complexity of the content.
Steps:
- The Chrome Extension retrieves the URL of the YouTube video.
- This URL is sent to a web server which downloads the video and converts it into an audio file.
- This audio file is then stored on the Google Cloud Platform and converted to text.
- The text is then plugged into a readability-score function which returns the difficulty of each sentence.
- Using this difficulty score, the video is slowed down when the language gets “difficult”.
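The last step can be sketched as a simple mapping from difficulty to playback rate. This is a minimal illustration, not the project's actual code: it assumes difficulty is normalized to the range 0 to 1, that each sentence comes with start and end timestamps, and that the slowest acceptable speed is 0.5x. The names `playback_rate` and `build_speed_schedule` are hypothetical.

```python
def playback_rate(difficulty, min_rate=0.5, max_rate=1.0):
    """Map a difficulty in [0, 1] to a playback rate (assumed linear mapping).

    Easy sentences (0.0) play at max_rate; the hardest (1.0) play at min_rate.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range scores
    return max_rate - d * (max_rate - min_rate)


def build_speed_schedule(segments, min_rate=0.5, max_rate=1.0):
    """Turn (start_sec, end_sec, difficulty) tuples into (start, end, rate)."""
    return [
        (start, end, playback_rate(difficulty, min_rate, max_rate))
        for start, end, difficulty in segments
    ]


if __name__ == "__main__":
    # Hypothetical sentence timings and scores, for illustration only.
    for start, end, rate in build_speed_schedule([(0.0, 2.5, 0.0), (2.5, 6.0, 1.0)]):
        print(f"{start:>5.1f}s to {end:>5.1f}s: play at {rate:.2f}x")
```

A schedule like this could be returned to the extension, which would then set the video's speed as playback enters each segment.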
How we built it:
- The front-end Chrome Extension is written in JavaScript and retrieves the URL of the YouTube video.
- This URL is then sent to a web server written in Flask. The server wraps the command-line tool youtube-dl, a downloader that also converts video files to audio files.
- This audio file is then stored in a Google Cloud Platform ‘bucket.’ Using the Google Cloud Platform’s Cloud Video Intelligence and Speech-to-Text APIs, the audio file is converted to punctuated text.
- The text is sent to the readability-score function, which was built with spaCy, an open-source Natural Language Processing library for Python. The underlying model draws on linguistics research into phrase and sentence difficulty.
- Then the text file is run through the model to assess the “difficulty” of each sentence.
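To give a feel for per-sentence scoring, here is a rough stand-in that uses only the Python standard library. It is not the spaCy-based model described above, whose features are not detailed here. It assumes that longer sentences and longer words are harder, and the function names, weights, and caps are all illustrative choices.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_difficulty(sentence, max_words=30, max_word_len=8.0):
    """Return a score in [0, 1] from word count and average word length.

    Both caps and the equal weighting are assumptions, not tuned values.
    """
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    length_score = min(len(words) / max_words, 1.0)
    avg_word_len = sum(len(w) for w in words) / len(words)
    word_score = min(avg_word_len / max_word_len, 1.0)
    return round(0.5 * length_score + 0.5 * word_score, 3)


def score_transcript(text):
    """Pair each sentence of a punctuated transcript with its difficulty."""
    return [(s, sentence_difficulty(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    transcript = (
        "I like cats. Notwithstanding considerable institutional opposition, "
        "the committee unanimously ratified the controversial amendment."
    )
    for sentence, score in score_transcript(transcript):
        print(score, sentence)
```

The punctuated output of the speech-to-text step matters here: without sentence-ending punctuation, a splitter like this one has nothing to break on.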
There were multiple roadblocks on the journey to SpeechShifter:
- None of our team members had worked with Chrome Extensions before, so we had to learn that from scratch.
- Initially, the web server for downloading and converting video files and the code to query the Google Cloud Platform APIs were written in JavaScript. However, we had trouble integrating them with the rest of our Flask backend, so we decided to scrap the JS script. In the end, we used Flask for everything except the front-end of the Chrome Extension.
- There were multiple bugs when creating the initial web server that downloaded the videos.
Accomplishments we are proud of:
- Creating a tool that has the potential to positively impact the educational and social wellbeing of people around the world.
- Completing features in tools we haven’t used before.
- Being alive at 5 am.
What we learned:
- Basic JS scripting.
- Creating a Chrome Extension.
- Natural Language Processing analysis.
What's next for SpeechShifter:
- Support for multiple languages
- Live machine translation for captions to follow along with native and foreign language
- Smooth transitions between phrases/sentences if they are at differing speeds
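As one possible approach to the smooth-transition idea, the speed could be stepped gradually from one sentence's rate to the next instead of jumping. The sketch below is only an assumption about how that might look: `ramp_rates` is a hypothetical helper and the linear ramp and step count are arbitrary choices.

```python
def ramp_rates(start_rate, end_rate, steps=5):
    """Return `steps` evenly spaced rates moving from start_rate to end_rate.

    The final value is exactly end_rate; start_rate itself is not repeated.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    delta = end_rate - start_rate
    return [start_rate + delta * i / steps for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Ease from normal speed into a slowed-down sentence.
    print([round(r, 2) for r in ramp_rates(1.0, 0.5)])
```

Applying each intermediate rate for a fraction of a second at a sentence boundary would soften the change in pace.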
Thank you to the HackMIT staff, mentors, and sponsors for the amazing event!
- James Packard