CAT (Conversation Analytics Toolkit) is a versatile tool for analyzing and summarizing audio conversations. It utilizes Whisper-X
, to transcribe and diarize spoken content. After that, it performs analytic tasks on this transcript, using a large language model (currently GPT-only). With CAT, you can quickly extract key information, action items, and even analyze the mood of the conversation.
-
Audio Transcription: CAT can transcribe audio files, making them accessible and searchable as text.
-
Diarization: The tool can identify and label speakers in a conversation, providing insights into who said what.
-
Information Extraction: CAT extracts important information, including summaries, key points, and action items, from the transcribed content.
-
Emotion Analysis: It also offers sentiment and mood analysis, helping you gauge the overall tone of the conversation.
-
Web Interface: CAT provides a user-friendly web interface for easy interaction and analysis.
-
REST API: CAT includes a REST endpoint that allows you to programmatically upload audio files for automatic analysis. This feature is particularly useful when integrated with VoIP systems like Asterisk or FreeSWITCH for automatic call analytics.
- OpenAI API Key
- HuggingFace API Token
- Accepted EULA for:
To run CAT locally, follow these steps:
-
Clone this repository:
git clone https://github.com/yourusername/CAT.git cd CAT
-
Set up the environment variables by creating a
.env
file in the root directory:OPENAI_API_KEY=your_openai_api_key GPT_MODEL=gpt-3.5-turbo-16k DEVICE=cpu # Change to "cuda" if you have a GPU BATCH_SIZE=16 COMPUTE_TYPE=int8 # Change to "float16" if needed HF_TOKEN=your_huggingface_api_token WHISPERX_MODEL=large-v2 LANGUAGE_CODE=de # Change to your desired language code NUM_THREADS=4
Replace
your_openai_api_key
andyour_huggingface_api_token
with your actual API keys. -
Install the required Python packages:
pip install -r requirements.txt
-
Start the 🐈 server:
python app.py
-
Access the web interface at
http://localhost:8080
in your web browser.
This repo also provides Docker containers for both CPU and GPU environments. Here's how to run 🐈 within a container:
To run CAT in a CPU-only Docker container, use the following steps:
-
Build the Docker image:
docker build -t cat-cpu -f Dockerfile .
-
Run the Docker container:
docker run -d --env-file=.env -p 8080:8080 -v /path/to/your/audio/files:/data cat-cpu
To run CAT in a GPU-accelerated Docker container, use the following steps:
-
Make sure, NVIDIA Container Toolkit is installed on the host.
-
Build the Docker image:
docker build -t cat-gpu -f Dockerfile.gpu .
-
Run the Docker container with GPU support:
docker run --gpus all -d --env-file=.env -p 8080:8080 -v /path/to/your/audio/files:/data cat-gpu
Replace /path/to/your/audio/files
with the directory where your audio files are located.
-
Web Interface: You can use the web interface to manually upload and analyze audio recordings. This is the easiest way to get started with 🐈.
-
REST API: CAT offers a REST API endpoint that allows you to programmatically upload audio files for automatic analysis. To use this feature, you can send a
POST
orPUT
request to the/
endpoint. The uploaded audio will be processed automatically, and the analysis results will be made available via the web interface. This is particularly useful when integrating CAT with VoIP systems for automatic call analytics
Here's an example of how to use the REST API endpoint:
```shell
# Replace <filename> with your desired filename
curl -X POST -F "file=@/path/to/your/audio/file.wav" http://localhost:8080/<filename>
```
.
-
Monitor Progress: CAT will process the uploaded audio in an async manner and display its progress on the web interface. You can check the status and see how many analytic threads are running.
-
View Results: Once processing is complete, you can view the transcribed text, diarized conversation, and extracted information. CAT also provides a summary of the conversation's key points and mood analysis.
-
Download: You can download the transcribed text and summary for your records.
You can customize the analysis and output by modifying the relevant functions in the app.py
file, as explained in the README.
CAT relies on the following technologies and libraries:
-
OpenAI GPT: For text generation and summarization.
-
Whisper-X: For audio transcription and diarization.
-
Flask: For the web application framework.
-
Other Python libraries as listed in
requirements.txt
.
Feel free to contribute to 🐈 by creating issues or submitting pull requests. Your contributions are welcome and appreciated.
🐈 is open-source software licensed under the MIT License. See the LICENSE file for details.