A text-based audio search engine developed in Python for UofTHacks VII. Submission available on Devpost.
Searching large quantities of information can be a grueling, repetitive task. While there are tools that streamline this process for text-based data (such as Ctrl+F), there is no comparable solution for audio- and video-based data. Our goal is to help close this gap by reducing the time it takes to search through audio files.
Syft is a web-based tool for searching phrases within audio recordings. It can extract both the exact timestamps at which a phrase was uttered and the sentence containing it.
Syft is powered by Google Cloud Speech-to-Text. The audio recordings are transcribed and then searched; finally, the matches are cross-referenced with the transcription to determine their timestamps.
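The cross-referencing step can be sketched as follows. This is a minimal illustration, assuming the transcript arrives as word-level time offsets; the `(word, start, end)` tuple shape and the `find_phrase` helper are illustrative, not Syft's actual code:

```python
import re

def find_phrase(words, phrase):
    """Search a word-level transcript for a phrase.

    `words` is a list of (word, start_sec, end_sec) tuples, roughly the
    shape of the word-level time offsets a speech-to-text API can return.
    Returns a list of (start, end) timestamp pairs, one per match.
    """
    # Normalize words so punctuation and casing don't block matches.
    normalize = lambda w: re.sub(r"[^a-z0-9']", "", w.lower())
    target = [normalize(t) for t in phrase.split()]
    matches = []
    for i in range(len(words) - len(target) + 1):
        window = words[i:i + len(target)]
        if [normalize(w) for w, _, _ in window] == target:
            # Match spans from the first word's start to the last word's end.
            matches.append((window[0][1], window[-1][2]))
    return matches

transcript = [("Hello,", 0.0, 0.4), ("and", 0.5, 0.6), ("welcome", 0.6, 1.0),
              ("to", 1.0, 1.1), ("the", 1.1, 1.2), ("show", 1.2, 1.6)]
find_phrase(transcript, "welcome to the show")  # -> [(0.6, 1.6)]
```

Matching against normalized words rather than the raw transcript string is one way to keep punctuation and casing from hiding hits.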
- Resolving search-query matches regardless of punctuation, contractions, and other natural-language variation.
- Optimizing the backend, most notably the audio transcription pipeline, to serve search results quickly.
- Deploying the backend API server to Google App Engine.
- A clean and elegant frontend UI.
- Relatively fast search queries.
- Creating and deploying a Google App Engine project, including flexible environments with custom runtimes via a Dockerfile.
- Providing an option to download search results as audio/video clips (either individually or as a supercut).
- Developing a custom speech-to-text model, eliminating the network latency of Google's API and thus most likely speeding up the application.
- Expanding the search engine to graphics as well (i.e. searching for text within the frames of a video).
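The clip-download idea above could be served with `ffmpeg`. Here is a hedged sketch, assuming `ffmpeg` is installed on the server; the `clip_command` helper and file names are hypothetical, and `-c copy` cuts on stream boundaries rather than re-encoding, which keeps clipping fast at a small cost in cut precision:

```python
import subprocess

def clip_command(source, start, end, out_path):
    """Build an ffmpeg command that extracts the [start, end] window
    (in seconds) from `source` without re-encoding the stream."""
    return ["ffmpeg", "-y", "-i", source,
            "-ss", str(start), "-to", str(end),
            "-c", "copy", out_path]

# Example: save a match found at 0.6-1.6 s as its own clip.
# subprocess.run(clip_command("lecture.mp3", 0.6, 1.6, "clip.mp3"), check=True)
```

A supercut could then be produced by concatenating the individual clips, e.g. with ffmpeg's concat demuxer.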