A Flask-based web application that transcribes speech to text and enhances it using AI. The application supports both OpenAI's Whisper API and local Whisper model for transcription, with special optimizations for macOS.
- Real-time audio recording through web browser
- Dual transcription options:
- OpenAI Whisper API (requires API key, faster)
- Local Whisper model (offline capable, no API costs)
- Multiple Whisper model options (base, small, medium, large, turbo)
- Text enhancement options:
- OpenAI GPT-4 (requires API key)
- Local LLM via Ollama (offline capable, no API costs)
- Support for multiple output formats:
- WhatsApp messages
- AI prompts
- General text
- Clean and responsive web interface
- Detailed logging system
- macOS (tested on Sonoma 14.0+)
- Python 3.10
- Homebrew (for installing ffmpeg)
- OpenAI API key (optional, for OpenAI services)
- Ollama (optional, for local LLM capabilities)
-
Install Homebrew (if not already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
-
Install ffmpeg (required for local Whisper model):
brew install ffmpeg
-
Install Ollama (optional, for local LLM capabilities):
brew install ollama
-
Download and run the Llama2 model (if using local LLM):
ollama pull llama2
-
Clone the repository:
git clone https://github.com/yourusername/stgt.git cd stgt
-
Create and activate a Python virtual environment:
python3 -m venv .venv source .venv/bin/activate
-
Install dependencies:
pip install -r requirements.txt
-
Set up environment variables:
cp .env.example .env
Then edit
.env
and add your OpenAI API key:OPENAI_API_KEY=your_api_key_here
-
Generate SSL certificate (required for microphone access):
openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365
When prompted for "Common Name", enter
localhost
-
Start the Flask server:
python app.py
-
Access the application:
- Open your browser and go to
https://localhost:5001
- Accept the self-signed certificate warning (this is safe for local development)
- Open your browser and go to
-
Choose your transcription settings:
- Language: Select target language (default is Italian)
- Output Format: Choose between email, WhatsApp, AI prompt, or general text
- Transcription Model:
- Remote (OpenAI API) - Faster, requires internet
- Local (Whisper) - Works offline, runs on your machine
- Text Enhancement:
- Remote (OpenAI GPT) - More powerful, requires API key
- Local (Llama2) - Works offline, runs on your machine
-
Record your speech:
- Click "Start Recording"
- Speak into your microphone
- Click "Stop Recording" when done
-
View Results:
- Original transcription will appear
- Enhanced/translated text will follow
- Use the copy button to copy the enhanced text
When using the local Whisper model, you can choose between different model sizes:
base
: Fastest, lowest accuracysmall
: Good balance for general usemedium
: Better accuracy, slowerlarge
: Best accuracy, requires more memoryturbo
: Latest model, optimized for speed
To change the model, modify model_size
in src/services/transcription.py
:
self.model = whisper.load_model("base") # Change "base" to your preferred model
The application supports two modes for text enhancement:
- Requires API key
- More powerful and accurate
- Supports all languages
- Faster processing
- Completely offline operation
- No API costs
- Uses Llama2 model by default
- Can be customized with different models
- Processing speed depends on your hardware
To use a different Ollama model, modify the model name in src/services/text_enhancement.py
:
self.model_name = "llama2" # Change to your preferred model
Available models can be listed using:
ollama list
The application provides two levels of logging:
- Console: Shows main steps and timing in color-coded format
- File: Detailed debug information in
app.log
Log files rotate automatically when they reach 1MB, keeping up to 10 backup files.
-
Microphone Access Issues:
- Ensure you've granted browser permission for microphone
- Check that you're using HTTPS (required for microphone access)
- Try restarting your browser if permissions don't appear
-
Certificate Warnings:
- The warning about self-signed certificate is normal in development
- Click "Advanced" and "Proceed to localhost" in your browser
-
Local Whisper Model Issues:
- Verify ffmpeg is installed:
brew list ffmpeg
- Check Python environment is activated
- Ensure enough disk space for model downloads
- Verify ffmpeg is installed:
-
OpenAI API Issues:
- Verify your API key in
.env
- Check your OpenAI account has available credits
- Ensure internet connectivity
- Verify your API key in
project_root/
├── src/ # Source code directory
│ ├── config/ # Configuration modules
│ ├── services/ # Business logic
│ ├── routes/ # API routes
│ └── utils/ # Utility functions
├── static/ # Static files (CSS, JS)
├── templates/ # HTML templates
└── app.py # Main application entry point
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
MIT License
The application supports two audio input sources:
- Microphone: Direct microphone input for voice recording
- System Audio: Capture system audio output using BlackHole
To capture system audio, you'll need to install and configure BlackHole:
-
Install BlackHole:
brew install blackhole-2ch
-
Configure System Audio:
- Open System Settings > Sound
- Set the output device to "BlackHole 2ch"
- Any audio played on your system will now be captured by the application
-
Using System Audio Capture:
- Select "System Audio (BlackHole)" from the audio source dropdown
- Start recording to capture system audio
- Switch back to your regular output device to hear the audio
Note: When using system audio capture, make sure your system's audio is playing and BlackHole is properly configured as the output device.