This project uses OpenAI's Whisper model to transcribe speech in audio files to text. It is built on the whisper-large-v3 checkpoint from Hugging Face: https://huggingface.co/openai/whisper-large-v3
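For orientation, here is a minimal sketch of how the whisper-large-v3 checkpoint can be called through the transformers pipeline API. It is an assumption, not this project's main.py. The function name transcribe is hypothetical, and the sketch assumes torch and transformers are installed and that ffmpeg is available to decode audio files.

```python
def transcribe(audio_path: str, model_id: str = "openai/whisper-large-v3") -> str:
    """Transcribe one audio file with a Hugging Face ASR pipeline (hypothetical helper)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from transformers import pipeline

    # Use the GPU in half precision when CUDA is available; otherwise CPU in float32.
    use_cuda = torch.cuda.is_available()
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16 if use_cuda else torch.float32,
        device="cuda:0" if use_cuda else "cpu",
    )
    # Passing a file path makes the pipeline decode the audio (requires ffmpeg).
    result = asr(audio_path)
    return result["text"]
```

Whisper processes audio in 30-second windows; for longer recordings, the transformers ASR pipeline accepts chunk_length_s=30 as one way to enable chunked long-form transcription.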
Make sure Python is installed on your system. The project was set up following the Hugging Face documentation and tested with Python 3.10.
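If you want to confirm that your interpreter matches the tested release, a small optional check like the following can help. The helper name python_matches_tested is hypothetical and not part of the project; it only compares the major and minor version numbers.

```python
import sys

# Python version this README says the project was tested with.
TESTED_VERSION = (3, 10)


def python_matches_tested(version_info=sys.version_info) -> bool:
    """Return True if major.minor of the given version equals the tested version."""
    return tuple(version_info[:2]) == TESTED_VERSION


if not python_matches_tested():
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        f"this project was tested with {TESTED_VERSION[0]}.{TESTED_VERSION[1]}."
    )
```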
- Clone the repository to your local machine.
- Navigate to the project directory.
- Install the required dependencies using the following command:
pip install -r requirements.txt
Note: The Hugging Face model page recommends installing Flash Attention with the following command:
pip install flash-attn --no-build-isolation
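Flash Attention is optional, so code that loads the model may want to fall back gracefully when it is not installed. The sketch below is an assumption about how that choice could be made, not how this project does it; the helper name pick_attn_implementation is hypothetical. It checks whether the flash_attn package is importable and otherwise falls back to PyTorch's scaled-dot-product attention ("sdpa").

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Pick an attention backend name for transformers' attn_implementation argument."""
    # The flash-attn distribution installs an importable package named "flash_attn".
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # Otherwise fall back to PyTorch's built-in scaled-dot-product attention.
    return "sdpa"


print(pick_attn_implementation())
```

The returned name can be passed as attn_implementation= to from_pretrained in transformers. An installed package is not a guarantee that the backend works: FlashAttention-2 also needs a compatible NVIDIA GPU and half-precision weights.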
Note for macOS users: During installation, you may encounter issues with NVIDIA CUDA-related packages and bitsandbytes. These packages are not supported on macOS and should be commented out in the requirements.txt file before proceeding with the installation.
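Commenting these lines out by hand works fine; as an alternative, here is a minimal sketch that does it programmatically. The prefixes are an assumption: pip packages that ship CUDA libraries are typically named nvidia-..., and bitsandbytes is the package the note mentions. The helper name comment_out_unsupported and the sample input are hypothetical.

```python
# Assumed package-name prefixes that are unsupported on macOS.
MACOS_UNSUPPORTED_PREFIXES = ("nvidia-", "bitsandbytes")


def comment_out_unsupported(requirements: str, prefixes=MACOS_UNSUPPORTED_PREFIXES) -> str:
    """Return requirements text with lines for unsupported packages commented out."""
    out = []
    for line in requirements.splitlines():
        # Match on the normalized package line; already-commented lines start with "#".
        if line.strip().lower().startswith(prefixes):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample input for illustration.
sample = "torch\nnvidia-cublas-cu12\nbitsandbytes\ntransformers\n"
print(comment_out_unsupported(sample))
```

To apply it, read requirements.txt, pass its contents through the function, and write the result back before running pip install.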
All settings can be customized in the config.py file. This includes changing the model by updating model_id in the model_params section.
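The README names only config.py, model_params, and model_id, so the exact layout of the file is not shown here. The sketch below assumes model_params is a plain dictionary; that is a guess, and the real file may use a different structure or contain more settings.

```python
# config.py (hypothetical layout; only model_params and model_id are named in the README)
model_params = {
    # Any Whisper checkpoint on the Hugging Face Hub can go here,
    # e.g. "openai/whisper-small" for a smaller, faster model.
    "model_id": "openai/whisper-large-v3",
}
```

Under this assumption, code elsewhere would read the value with model_params["model_id"].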
To transcribe an audio file, use the following command:
python main.py --audio-path /path/to/file
Replace /path/to/file with the actual path to the audio file you want to transcribe.
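The --audio-path option above is the only command-line flag this README documents. As an illustration of how such a flag is typically parsed, here is a sketch using the standard argparse module; it is a hypothetical reimplementation, not the project's main.py, and parse_args is an assumed name.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the --audio-path option (hypothetical reimplementation of main.py's CLI)."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file with Whisper.")
    parser.add_argument(
        "--audio-path",  # argparse exposes this as args.audio_path
        type=Path,
        required=True,
        help="Path to the audio file to transcribe.",
    )
    return parser.parse_args(argv)


args = parse_args(["--audio-path", "recording.wav"])
print(args.audio_path)
```

Converting the value to a Path makes it easy to check args.audio_path.is_file() before loading the model, so a mistyped path fails quickly with a clear message.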
For more detailed information about configurations and usage, refer here.
We welcome contributions! Please feel free to submit pull requests or open issues to improve the project or fix problems.