/TalkSee-streamlit

🗣 ⇢ TalkSee ⇢ 👀 is a speech-to-text application that allows users to transcribe audio files or microphone input using the WhisperAI ASR models.


🗣 ⇢ TalkSee ⇢ 👀

Streamlit App Copyright MIT License

Software Design Document (SDD)






GUI

The graphical user interface is powered by Streamlit.

Model Selection

Provides a GUI control to select a WhisperAI ASR model.

Audio Input

Supports two modes of audio input: microphone input and file upload.
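For the file upload mode, a check like the following could reject unsupported files before transcription. This is a minimal sketch, not the app's actual code: it assumes the accepted formats are exactly the .wav, .mp3 and .m4a extensions named in this README, and `is_supported_audio` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions named in this README; treating them as the complete allow-list is an assumption.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file name ends in one of the supported audio extensions."""
    # Lower-case the suffix so "clip.MP3" is treated the same as "clip.mp3".
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check only looks at the file name, so it does not guarantee that the file contents are valid audio.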

Speech Recognition/Transcription

Employs a WhisperAI ASR model to transcribe the user audio input into text.
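The transcription step could look roughly like the sketch below. It assumes the openai-whisper package is installed and uses its `load_model` and `transcribe` calls; the function name `transcribe_file`, its parameters, and the default values are hypothetical and not taken from the app's source.

```python
from pathlib import Path

# Model names as listed in this README's model table.
MODEL_NAMES = ("tiny", "base", "small", "medium", "large")


def transcribe_file(audio_path: str, model_name: str = "base",
                    models_path: str = "models") -> str:
    """Transcribe one audio file with a Whisper model and return the text."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_name!r}")
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily so the checks above run even when the package is missing.
    import whisper  # provided by the openai-whisper package

    # download_root is the directory where the model weights are stored.
    model = whisper.load_model(model_name, download_root=models_path)
    result = model.transcribe(audio_path)
    return result["text"].strip()
```

Loading a model is slow, so a real app would probably cache the loaded model between requests rather than reload it on every call.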

Text Output

Displays the transcribed text to the user.


Dependencies

The 🗣 ⇢ TalkSee ⇢ 👀 web app relies on the following external libraries and resources:

  • Python 3.x

  • os: Provides an interface to the operating system.

  • time: Provides time-related functions.

  • io: Provides tools for working with input/output streams.



Installation

  1. Clone the repository:
     gh repo clone PedroZappa/TalkSee
  2. Change the current directory to the cloned repository:
     cd TalkSee
  3. Install the required packages from the requirements.txt file:
     pip install -r requirements.txt
  4. Create a .streamlit/secrets.toml file and set the MODELS_PATH variable to the desired path:
     mkdir -p .streamlit && echo 'MODELS_PATH="models"' >> .streamlit/secrets.toml
  5. Run the Streamlit application:
     streamlit run main.py
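On systems without a POSIX shell, the secrets file from the installation steps can also be created from Python. The sketch below is one way to do it under stated assumptions: `write_secrets` is a hypothetical helper, and "models" is just the example value used in this README.

```python
from pathlib import Path


def write_secrets(project_dir: str, models_path: str = "models") -> Path:
    """Create .streamlit/secrets.toml under project_dir with a MODELS_PATH entry."""
    secrets_dir = Path(project_dir) / ".streamlit"
    secrets_dir.mkdir(parents=True, exist_ok=True)
    secrets_file = secrets_dir / "secrets.toml"
    # Note: this overwrites an existing file instead of appending to it.
    secrets_file.write_text(f'MODELS_PATH = "{models_path}"\n', encoding="utf-8")
    return secrets_file
```

Inside a running Streamlit app, a value stored this way can be read with `st.secrets["MODELS_PATH"]`.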

Features

  • Streamlit-based user interface for easy interaction.

  • Selection of a WhisperAI ASR model from the list of available models:


      Size     Parameters   Multilingual model   Required VRAM   Relative speed
      tiny     39 M         tiny                 ~1 GB           ~32x
      base     74 M         base                 ~1 GB           ~16x
      small    244 M        small                ~2 GB           ~6x
      medium   769 M        medium               ~5 GB           ~2x
      large    1550 M       large                ~10 GB          1x

  • Automatic check for CUDA: the model runs on the GPU when CUDA is available and on the CPU otherwise.

  • Support for both microphone input and audio file upload.

  • Display of the transcribed text to the user.
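The CUDA check and the VRAM figures from the model table can be combined into a small helper, sketched below. It assumes PyTorch is the backend, using its `torch.cuda.is_available()` call; `select_device` and `largest_model_for` are hypothetical names, and the VRAM numbers are the approximate values from the table.

```python
# Approximate VRAM needs in GB, copied from the model table in this README.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def select_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def largest_model_for(vram_gb: float) -> str:
    """Pick the largest model whose listed VRAM need fits in vram_gb."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # The dict is ordered from smallest to largest model, so the last match is the largest.
    return fitting[-1] if fitting else "tiny"
```

For example, `largest_model_for(4)` returns `"small"`, since the medium model is listed at about 5 GB.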


Usage

  1. Select a WhisperAI ASR model from the available options.
  2. Choose an input mode (Mic or File).
    • If using Mic, click the "microphone-icon" button to start recording audio. The recording stops automatically after 2 seconds of silence.
    • If using File, upload an audio file in .wav, .mp3 or .m4a format.
  3. Click the Transcribe button to transcribe the audio.
  4. Read the transcribed text in the "Transcription" section.
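The automatic stop after 2 seconds of silence can be illustrated with a simple timer over audio chunk levels. This is only a sketch of the general idea: how the app's recorder component detects silence is not described here, and the `stop_index` name, the per-chunk level input, and the 0.01 threshold are all assumptions.

```python
from typing import Optional, Sequence


def stop_index(levels: Sequence[float], chunk_seconds: float,
               threshold: float = 0.01,
               silence_seconds: float = 2.0) -> Optional[int]:
    """Return the index of the chunk that completes the silent stretch, or None."""
    silent_time = 0.0
    for i, level in enumerate(levels):
        if level < threshold:
            # Quiet chunk: extend the current run of silence.
            silent_time += chunk_seconds
            if silent_time >= silence_seconds:
                return i
        else:
            # Any sound resets the silence timer.
            silent_time = 0.0
    return None
```

With half-second chunks, four consecutive quiet chunks are needed before recording would stop; a single loud chunk in between restarts the count.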

Some possible future enhancements for 🗣 ⇢ TalkSee ⇢ 👀 include:

  • Support for mobile devices.

  • Support for additional speech recognition models.

  • Real-time transcription of live audio input.

  • Integration with cloud storage services for seamless file upload and storage.

  • Improved error handling and user feedback.

  • Image generation using the transcribed text as a prompt.


