classifAI-engine: A Python repository from ClassifAI

ClassifAI Engine

ClassifAI engine is a RESTful API that provides the heavy lifting for classifAI through audio transcription, question categorization, and insights.

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact
Acknowledgments

About The Project

ClassifAI engine provides the heavy lifting for classifAI. It is a RESTful API that provides the following services:

Transcription of video and audio into text
Categorization of questions
Engagement insights
Turning reports into PDFs or .docx files

Built With

Getting Started

To get a local copy up and running, follow these steps.

For more instructions, please see the documentation.

Prerequisites

Python 3.10 (3.9+ will probably work but is untested)
Redis
Hugging Face API token

Installation

Clone the repo

git clone https://github.com/TCU-ClassifAI/classifAI-engine.git
cd classifAI-engine

Install and run Redis

sudo apt-get install redis-server
redis-server

Install Python packages (it is recommended that you use a venv)

pip install -r src/requirements.txt -r src/requirements-dev.txt

Add your Hugging Face API key to your environment variables or to a .env file in the root directory
```
HF_TOKEN=your_key_here
```
You must also accept the Hugging Face terms and conditions:
- visit hf.co/pyannote/speaker-diarization-3.1 and accept user conditions
- visit hf.co/pyannote/segmentation-3.0 and accept user conditions
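
Before launching the API, it can help to confirm that `HF_TOKEN` is actually visible. The sketch below is not how the engine itself loads the token; it is a standalone check that looks in the environment first and then in a `.env` file. The helper names `read_env_file` and `find_hf_token` are hypothetical, and the parser only handles plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file; return {} if the file is absent."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_hf_token(env_file=".env"):
    """Return HF_TOKEN from the environment, else from the .env file, else None."""
    return os.environ.get("HF_TOKEN") or read_env_file(env_file).get("HF_TOKEN")


if __name__ == "__main__":
    print("HF_TOKEN found" if find_hf_token() else "HF_TOKEN is missing")
```

Run it from the repository root so that the relative `.env` path resolves to the file described above.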
Launch the API
```
 python src/app.py
```
Launch your worker (for asynchronous tasks)
```
 rq worker -c config.worker_config
```
(More information on RQ)
Set your preferred summarization/categorization model in config.py (optional)
```
 SUMMARIZATION_MODEL = "gpt4"
 CATEGORIZATION_MODEL = "gemma"
```
You must launch your own model separately and include the API endpoint in your .env file
```
 LLAMA_API=your_model_endpoint
```

For Llama, please see the repository here

Testing

curl http://localhost:5000/healthcheck should return OK
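
The check above can also be scripted, for example in a start-up script that waits for the API. This is a minimal sketch using only the standard library; the function name `is_healthy` is hypothetical, and it only assumes that `/healthcheck` answers with HTTP 200 when the engine is up.

```python
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:5000", timeout=5.0):
    """Return True if GET <base_url>/healthcheck answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```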

Usage

Analyze an Audio File

analyze

URL: /analyze
Method: POST
Data Params:
- file (file)
- url (string)

Example:

curl -X POST -H "Content-Type: application/json" -d '{"url": "https://www.youtube.com/watch?v=t4yWEt0OSpg"}' http://localhost:5000/analyze
curl -X POST -F "file=@<path_to_your_audio_file>" http://localhost:5000/analyze

Success Response: 200

{
  "job_id": "0bc133cb-f519-40a1-96c6-46d2cfe9e4ad",
  "message": "Analysis started"
}
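
The URL variant of the request can be reproduced from Python with the standard library. The following is a sketch rather than an official client: the function names `build_analyze_request` and `submit_analysis` are hypothetical, the base URL is the local default used in the curl examples, and file uploads (multipart form data) are not covered.

```python
import json
import urllib.request


def build_analyze_request(media_url, base_url="http://localhost:5000"):
    """Build the POST /analyze request for a remote media URL."""
    body = json.dumps({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_analysis(media_url, base_url="http://localhost:5000", timeout=30.0):
    """Send the request and return the job_id from the JSON response."""
    request = build_analyze_request(media_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["job_id"]
```

The returned `job_id` is the value to pass to the status endpoint.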

Get Analysis Status

URL: /analyze/<job_id>
Method: GET

Example (preferred):

curl http://localhost:5000/analyze/0bc133cb-f519-40a1-96c6-46d2cfe9e4ad

Alternative Example (legacy support):

curl http://localhost:5000/analyze/?job_id=0bc133cb-f519-40a1-96c6-46d2cfe9e4ad

Success Response: 200
Example Content:

{
  "meta": {
    "job_id": "0bc133cb-f519-40a1-96c6-46d2cfe9e4ad",
    "job_type": "analyze",
    "message": "Analysis finished",
    "progress": "finished",
    "title": "General Relativity Explained in 7 Levels of Difficulty"
  },
  "result": {
    "transcript": [
      {
        "end_time": 11149,
        "speaker": "Speaker 0",
        "start_time": 7740,
        "text": "General relativity is a physics theory invented by Albert Einstein. "
      }
    ],
    "questions": [
      {
        "question": "What is general relativity?",
        "level": 1
      }
    ],
    "summary": "General relativity is a physics theory invented by Albert Einstein. It describes how gravity works in the universe. "
  }
}
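
Because analysis runs asynchronously, a client usually polls the status endpoint until the job is done. The sketch below assumes that `meta.progress` equals `"finished"` on completion, as in the example content above; other progress values and failure states are not documented here, so they are treated simply as "not finished yet". The names `fetch_status` and `wait_for_result` are hypothetical.

```python
import json
import time
import urllib.request


def fetch_status(job_id, base_url="http://localhost:5000", timeout=30.0):
    """GET /analyze/<job_id> and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/analyze/{job_id}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_for_result(job_id, fetch=fetch_status, interval=5.0, max_attempts=120):
    """Poll until meta.progress is "finished", then return the "result" object."""
    for _ in range(max_attempts):
        status = fetch(job_id)
        if status.get("meta", {}).get("progress") == "finished":
            return status.get("result", {})
        # Assumption: any other progress value means the job is still running.
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Passing `fetch` as a parameter keeps the polling loop independent of the HTTP call, which makes it easy to exercise without a running engine.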

For more examples, please refer to the Documentation

Summarization

summarize

URL: /summarize
Method: POST
Data Params:
- text (string)

Example:

curl -X POST -H "Content-Type: application/json" -d '{"text": "This is the transcript that I want to have summarized."}' http://localhost:5000/summarize/

Success Response: 200 OK

Content:

"This is the summary of the text that was passed in."

Alternatively, you can pass in a transcript like so:

[
  {
    "end_time": 2301,
    "speaker": "Speaker 0",
    "start_time": 1260,
    "text": "Why did you bring me here? "
  },
  {
    "end_time": 4263,
    "speaker": "Main Speaker",
    "start_time": 3242,
    "text": "I dont like going out. "
  }
]

So the request would look like this:

curl -X POST -H "Content-Type: application/json" -d '{"transcript": [{"end_time": 2301,"speaker": "Speaker 0","start_time": 1260,"text": "Why did you bring me here? "},{"end_time": 4263,"speaker": "Main Speaker","start_time": 3242,"text": "I dont like going out. "}]}' http://localhost:5000/summarize/
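
Building the transcript payload by hand in a shell string is error-prone, so the same two request shapes can be sketched in Python. The function name `build_summarize_request` is hypothetical, and requiring exactly one of `text` or `transcript` is an assumption of this sketch, not a documented rule of the API.

```python
import json
import urllib.request


def build_summarize_request(text=None, transcript=None, base_url="http://localhost:5000"):
    """Build a POST /summarize/ request from plain text or transcript segments."""
    # Assumption: send either "text" or "transcript", never both.
    if (text is None) == (transcript is None):
        raise ValueError("pass exactly one of text or transcript")
    payload = {"text": text} if text is not None else {"transcript": transcript}
    return urllib.request.Request(
        base_url.rstrip("/") + "/summarize/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Segments use the same fields as the transcript example above.
segments = [
    {"start_time": 1260, "end_time": 2301, "speaker": "Speaker 0",
     "text": "Why did you bring me here? "},
    {"start_time": 3242, "end_time": 4263, "speaker": "Main Speaker",
     "text": "I dont like going out. "},
]
request = build_summarize_request(transcript=segments)
# urllib.request.urlopen(request) would send it to a running engine.
```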

Roadmap

See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Instructions for Contribution

License

This project is licensed under the GNU GPLv3 License - see the LICENSE file for details. The GNU GPLv3 License is a free, copyleft license for software and other kinds of works.

Note that this license only applies to the engine. Please see the classifAI portal for more information on the license for the portal.

Contact

Learn About the Team

Project Link: https://github.com/TCU-ClassifAI/classifAI

View the Portal: https://classifai.tcu.edu/

Acknowledgments

TCU Computer Science Department, for funding this project
Our Clients, for providing us with the opportunity to work on this project and continued support
Dr. Bingyang Wei, for being our faculty advisor
