OpenThot API

Description

OpenThot API is a Python FastAPI application that provides an interview transcription tool. It stands on the shoulders of existing open-source ASR engines that also provide diarization (currently whisperX and wordcab-transcribe; feel free to contribute yours 😉).

It adds a stateful layer on top of these engines so you can compute, store, view and modify the results in a unified way.

It can be combined with a frontend (such as OpenThot frontend).

Setup

Copy the default .env and secrets.env files:

cp .env.example .env
cp secrets.env.example secrets.env

Then modify them with your own credentials if needed (e.g. the HuggingFace token if you plan to use whisperX as the ASR engine).

Docker commands

First, load the .env file (the build command needs the ASR__ENGINE variable):

source .env
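
If ASR__ENGINE is missing or empty, the image tag and build argument below silently end up blank. As a minimal sketch (load_env is a hypothetical helper, not part of OpenThot, and it assumes .env uses plain shell-compatible KEY=value lines), you can load the file and fail early:

```shell
# Hypothetical helper, not shipped with OpenThot: load an env file
# and stop early if ASR__ENGINE is missing or empty.
load_env() {
    env_file="${1:-.env}"
    # Prefix bare file names with ./ so that `.` does not search PATH.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    if [ ! -f "$env_file" ]; then
        echo "missing $env_file" >&2
        return 1
    fi
    set -a              # export every variable assigned while sourcing
    . "$env_file"
    set +a
    if [ -z "${ASR__ENGINE:-}" ]; then
        echo "ASR__ENGINE is not set in $env_file" >&2
        return 1
    fi
    echo "ASR engine: ${ASR__ENGINE}"
}
```

Calling load_env .env before the build step prints the selected engine, or returns a non-zero status with a message on stderr when the variable is absent.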

Then build the image:

docker build --build-arg ASR__ENGINE=${ASR__ENGINE} -t openthot_api:${ASR__ENGINE} .

Run the API container:

docker run -d --name openthot_api \
    -p 8000:8000 \
    --env-file .env \
    --env-file secrets.env \
    -v ./data:/usr/src/openthot/data \
    openthot_api:${ASR__ENGINE}

Run the worker container:

docker run -d --name openthot_worker \
    --env-file .env \
    --env-file secrets.env \
    -v ./data:/usr/src/openthot/data \
    openthot_api:${ASR__ENGINE} \
    celery --app openthot.tasks.tasks.celery worker
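
Because both containers start detached (-d), a failed start is easy to miss. The sketch below is one way to check them, assuming the container names used above; check_containers is a hypothetical helper, not part of OpenThot, and it relies only on the standard docker inspect command:

```shell
# Hypothetical helper, not shipped with OpenThot: report whether the
# API and worker containers started above are currently running.
check_containers() {
    status=0
    for name in openthot_api openthot_worker; do
        # docker inspect prints "true" when the container is running;
        # it fails if the container does not exist.
        running=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null) || running=""
        if [ "$running" = "true" ]; then
            echo "$name: running"
        else
            echo "$name: not running (try: docker logs $name)"
            status=1
        fi
    done
    return "$status"
}
```

The function returns a non-zero status if either container is not running, so it can gate a follow-up step in a script.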

Run locally / contribute

1. Requirements:

  • poetry
  • python 3.11 (you can use pyenv to handle python versions)
  • direnv (optional)

2. Setup

Virtual environment

Go to the project folder, then:

  • If direnv is installed: direnv allow.
  • If not:
    poetry shell
    source .env

direnv takes care of loading/unloading the virtual env and the .env file whenever you enter/leave the project folder. If you don't use direnv, remember to run poetry shell and source .env each time you want to install/run the project.

Installation

poetry install --only main,cli,${ASR__ENGINE} --no-cache --sync

For contributors 🚀

pre-commit install

# note the additional `dev` group
poetry install --only main,cli,dev,${ASR__ENGINE} --no-cache --sync

pytest -m "not slow"  # discard the slowest tests
pytest  # run all tests

3. Run

openthot --help
openthot standalone