# YAWT

YAWT is a Python-based audio transcription tool that uses OpenAI's Whisper model to convert audio files to text. It integrates speaker diarization via PyAnnote and supports multiple languages for transcription, making it well suited to multilingual audio content. It also supports various output formats and upload services.
## Table of Contents

- Features
- Installation
- Configuration
- Usage
- Command-line Arguments
- Testing
- Contributing
- License
- Acknowledgments
- Building Executables
## Features

- Multilingual Transcription: Supports both primary and secondary languages for transcription, providing flexibility for multilingual audio files.
- Speaker Diarization: Identifies and separates speakers using PyAnnote for clear, speaker-specific transcription.
- Output Formats: Supports `text`, `stj`, and `srt` formats.
- STJ Output Format: Exports transcriptions in Standard Transcription JSON (STJ) format for structured data interchange.
- Upload Services: Allows uploading audio files to services such as `0x0.st` and `file.io`.
- Customizable Timeouts & Cost Estimation: Adjustable timeout settings and cost estimates based on usage.
- Logging: Provides configurable logging levels and log rotation for monitoring and debugging.
- Dry-Run Mode: Offers cost estimation without executing the transcription.
- Retry Mechanism: Automatically retries failed transcription segments.
- Flexible API Key Management: OpenAI and PyAnnote API keys can be specified via command-line arguments or environment variables.
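The key-resolution order described above (a command-line argument takes precedence over the environment variable) can be sketched as follows. This is an illustrative sketch, not YAWT's actual internals; the function name is hypothetical:

```python
import os

def resolve_api_key(cli_value, env_var):
    """Return the key passed on the command line if given, otherwise
    fall back to the named environment variable (None if neither is set)."""
    if cli_value:
        return cli_value
    return os.environ.get(env_var)

# Example: the environment provides a key, and a CLI argument overrides it.
os.environ["OPENAI_KEY"] = "env-key"
print(resolve_api_key(None, "OPENAI_KEY"))       # falls back to env-key
print(resolve_api_key("cli-key", "OPENAI_KEY"))  # CLI argument wins
```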
## Installation

Prerequisites:

- Python 3.10
- Poetry for managing dependencies
1. Clone the repository:

   ```bash
   git clone https://github.com/yaniv-golan/YAWT.git
   cd YAWT
   ```

2. Install dependencies using Poetry:

   ```bash
   poetry install
   ```

3. Set up environment variables (optional if you pass the API keys as command-line arguments): Create a `.env` file in the project root with the following keys:

   ```env
   PYANNOTE_TOKEN=your_pyannote_api_token
   OPENAI_KEY=your_openai_api_key
   ```
## Configuration

YAWT uses a YAML-based configuration file to control various options. A sample configuration is available in the repository at `config/example_config.yaml`.

To use this sample configuration:

```bash
cp config/example_config.yaml config/config.yaml
```

Edit `config.yaml` to suit your requirements.
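As a rough illustration of the shape such a file takes, a YAML configuration of this kind might look like the sketch below. These keys are hypothetical — consult `config/example_config.yaml` for the actual option names YAWT supports:

```yaml
# Hypothetical illustration only -- see config/example_config.yaml
# for the real option names supported by YAWT.
api:
  openai_key: your_openai_api_key      # may also come from OPENAI_KEY
  pyannote_token: your_pyannote_token  # may also come from PYANNOTE_TOKEN
transcription:
  model: openai/whisper-large-v3
  output_format: text
logging:
  level: INFO
```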
## Usage

YAWT can be executed directly from the command line as `yawt`, without a `python` or `poetry` prefix, thanks to the entry point defined in `pyproject.toml`.

Transcribe a local file to plain text:

```bash
yawt --input-file path/to/audio.wav --output-format text --main-language en
```

Transcribe from a URL to SRT, with a fixed number of speakers:

```bash
yawt --audio-url https://example.com/audio.mp3 --output-format srt --main-language en --num-speakers 3
```

Export in STJ format:

```bash
yawt --input-file path/to/audio.wav --output-format stj --main-language en
```

The transcription will be saved in a file with the `.stj.json` extension.

Estimate costs without transcribing:

```bash
yawt --dry-run --input-file path/to/audio.wav
```
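Because STJ output is plain JSON, it can be post-processed with any JSON tooling. The sketch below assumes a `transcript.segments` layout with `start`, `end`, `speaker`, and `text` fields; the authoritative field names are defined by the STJ Specification, not by this sketch:

```python
import json

# Hypothetical STJ-like document -- field names are assumptions here;
# the STJ Specification defines the real schema.
stj = {
    "transcript": {
        "segments": [
            {"start": 0.0, "end": 2.5, "speaker": "S1", "text": "Hello there."},
            {"start": 2.5, "end": 5.0, "speaker": "S2", "text": "Hi!"},
        ]
    }
}

def plain_text(doc):
    """Concatenate segment texts from an STJ-like document."""
    return " ".join(seg["text"] for seg in doc["transcript"]["segments"])

print(plain_text(stj))  # Hello there. Hi!
```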
## Command-line Arguments

- `--audio-url`: Publicly accessible URL of the audio file to transcribe (required if no input file is provided).
- `--input-file`: Path to the local audio file to transcribe (required if no URL is provided).
- `--config`: Path to the configuration file.
- `--context-prompt`: Context prompt to guide transcription.
- `--main-language`: Main language of the audio (required).
- `--secondary-language`: Secondary language of the audio, used for retry attempts.
- `--num-speakers`: Number of speakers to detect (auto-detected if not specified).
- `--dry-run`: Estimate costs without processing.
- `--debug`: Enable debug logging.
- `--verbose`: Enable verbose output.
- `--pyannote-token`: PyAnnote API token (overrides the environment variable).
- `--openai-key`: OpenAI API key (overrides the environment variable).
- `--model`: OpenAI Whisper model to use (default: `openai/whisper-large-v3`; choices: `openai/whisper-large-v3` or `openai/whisper-large-v3-turbo`).
- `--output-format`: Desired output format(s): `text`, `stj`, `srt` (default: `text`).
- `-o`, `--output`: Base path for output files (without extension).

Note: Either `--audio-url` or `--input-file` must be provided, but not both.

For more information about the STJ format, see the STJ Specification.
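The either-or constraint on `--audio-url` and `--input-file` is the kind of rule `argparse` expresses with a mutually exclusive group. The sketch below illustrates the idea; it is not YAWT's actual parser:

```python
import argparse

def build_parser():
    """Minimal parser sketch: exactly one audio source must be given."""
    parser = argparse.ArgumentParser(prog="yawt-sketch")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--audio-url", help="URL of the audio to transcribe")
    source.add_argument("--input-file", help="path to a local audio file")
    parser.add_argument("--main-language", required=True)
    return parser

args = build_parser().parse_args(
    ["--input-file", "audio.wav", "--main-language", "en"]
)
print(args.input_file, args.main_language)  # audio.wav en
```

Passing both source arguments (or neither) makes `argparse` exit with a usage error, which matches the note above.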
## Testing

YAWT uses `pytest` for testing. Run the test suite with:

```bash
pytest
```

Ensure that all tests pass before submitting changes.
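As an illustration of the style of unit test `pytest` picks up, here is a self-contained example for a hypothetical SRT timestamp formatter — the helper below is not part of YAWT, just the kind of small pure function worth testing:

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a duration in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def test_to_srt_timestamp():
    # pytest collects functions named test_* automatically.
    assert to_srt_timestamp(0) == "00:00:00,000"
    assert to_srt_timestamp(65.25) == "00:01:05,250"

test_to_srt_timestamp()
```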
## Contributing

Contributions are welcome. To contribute:

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/YourFeature`).
3. Commit your changes (`git commit -m 'Add feature'`).
4. Push to the branch (`git push origin feature/YourFeature`).
5. Open a Pull Request.

Please ensure that your code follows PEP 8 guidelines, and add or update tests as necessary.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

Special thanks to Lior Atias for allowing the inclusion of our podcast conversation as a test file in this project. The episode is in Hebrew with intermixed English terms, making it a good test case for `--main-language he --secondary-language en`:

- Local test file: `samples/multi-lang/podcast - Lior Atias - Executive Summary/Lior Atias - Executive Summary - Oct 22, 2024 - with Yaniv Golan.mp3`
- Original episode: Listen on Spotify
## Building Executables

Build for the current platform:

```bash
python freeze.py build
```

The executable will be created in the `build/` directory with a platform-specific name:

- macOS ARM: `yawt-macos-arm64`
- macOS Intel: `yawt-macos-x86_64`
- Linux: `yawt-linux-x86_64`
- Windows: `yawt-win-x86_64.exe`

Note: For best compatibility, build on each target platform separately.