We recommend using Docker to reduce compatibility issues.
- Install Docker Engine and make sure to follow the post-install instructions. Otherwise, install Docker Desktop.
- If you have a GPU and want to use it to accelerate compute:
  - Install NVIDIA CUDA Toolkit.
  - Install NVIDIA Container Toolkit.
- Run the latest image:
export PSIFX_VERSION="0.0.2"
export DATA_PATH="/path/to/data"
docker run \
  --user $(id -u):$(id -g) \
  --gpus all \
  --mount type=bind,source=$DATA_PATH,target=$DATA_PATH \
  --interactive \
  --tty \
  guillaumerochette/psifx:$PSIFX_VERSION
- Check out the available psifx commands:
psifx --all-help
- Install the following system-wide:
sudo apt install ffmpeg ubuntu-restricted-extras
- Create a dedicated conda environment by running the following commands in this order:
conda create -y -n psifx-env python=3.9 pip
conda activate psifx-env
- Now install psifx:
pip install 'git+https://github.com/GuillaumeRochette/psifx.git'
- We provide an API endpoint to use OpenFace, usable only if you comply with their license agreement, e.g. for academic, research or non-commercial purposes.
- Install the following system-wide:
sudo apt install \
  build-essential \
  cmake \
  wget \
  libopenblas-dev \
  libopencv-dev \
  libdlib-dev \
  libboost-all-dev \
  libsqlite3-dev
- Install OpenFace using our fork.
wget https://raw.githubusercontent.com/GuillaumeRochette/OpenFace/master/install.py && \
  python install.py
psifx is a Python package that can be used both as a library,
from psifx.audio.diarization.pyannote.tool import PyannoteDiarizationTool
# Parameterize a tool w/ specific settings, such as choosing the underlying neural network, etc.
tool = PyannoteDiarizationTool(...)
# Run the inference method on a given data, here it will be an audio track for example.
tool.inference(...)
It also comes with its own CLI, which can be run directly in a terminal,
psifx audio diarization pyannote inference --audio /path/to/audio.wav --diarization /path/to/diarization.rttm
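For scripted workflows, the CLI call above can also be assembled from Python. The sketch below only builds and prints the argument list, using flags that appear in this README. The helper name `diarization_command` is hypothetical and not part of psifx, and actually executing the command assumes `psifx` is installed and on the PATH.

```python
import shlex


def diarization_command(audio, diarization, num_speakers=None, device=None):
    """Build the argv for `psifx audio diarization pyannote inference`.

    Hypothetical helper for illustration; it is not part of psifx.
    """
    argv = [
        "psifx", "audio", "diarization", "pyannote", "inference",
        "--audio", str(audio),
        "--diarization", str(diarization),
    ]
    # Optional flags are only appended when a value is given.
    if num_speakers is not None:
        argv += ["--num_speakers", str(num_speakers)]
    if device is not None:
        argv += ["--device", device]
    return argv


argv = diarization_command(
    "/path/to/audio.wav", "/path/to/diarization.rttm", num_speakers=2
)
print(shlex.join(argv))
# To actually run it (requires psifx to be installed):
# import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.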
psifx audio manipulation extraction [-h] --video VIDEO --audio AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for extracting the audio track from a video.
optional arguments:
-h, --help show this help message and exit
--video VIDEO path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--audio AUDIO path to the output audio file, such as `/path/to/audio.wav`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx audio manipulation conversion [-h] --audio AUDIO --mono_audio
MONO_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for converting any audio track to a mono audio track at 16kHz sample rate.
optional arguments:
-h, --help show this help message and exit
--audio AUDIO path to the input audio file, such as `/path/to/audio.wav` (or .mp3, etc.)
--mono_audio MONO_AUDIO
path to the output audio file, such as `/path/to/mono-audio.wav`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx audio manipulation mixdown [-h] --mono_audios MONO_AUDIOS
[MONO_AUDIOS ...] --mixed_audio
MIXED_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for mixing multiple mono audio tracks.
optional arguments:
-h, --help show this help message and exit
--mono_audios MONO_AUDIOS [MONO_AUDIOS ...]
paths to the input mono audio files, such as `/path/to/mono-audio-1.wav /path/to/mono-audio-2.wav`
--mixed_audio MIXED_AUDIO
path to the output mixed audio file, such as `/path/to/mixed-audio.wav`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
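To make the mixdown step concrete, here is a minimal sketch of what mixing mono tracks could look like on raw samples. It assumes the mix is a sample-wise average, with shorter tracks padded with silence; this is an illustrative assumption and not necessarily what psifx does internally. `mix_mono` is a hypothetical name.

```python
from itertools import zip_longest


def mix_mono(tracks):
    """Average several mono tracks sample by sample.

    `tracks` is a list of lists of float samples in [-1.0, 1.0].
    Shorter tracks are padded with silence (0.0).
    """
    if not tracks:
        return []
    n = len(tracks)
    return [sum(samples) / n for samples in zip_longest(*tracks, fillvalue=0.0)]


# Two toy tracks of different lengths (made-up values).
left = [0.5, 0.5, -0.5]
right = [0.5, -0.5]
mixed = mix_mono([left, right])
print(mixed)
```

Averaging rather than summing keeps the result within the original amplitude range, at the cost of a quieter mix.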
psifx audio manipulation normalization [-h] --audio AUDIO
--normalized_audio
NORMALIZED_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for normalizing an audio track.
optional arguments:
-h, --help show this help message and exit
--audio AUDIO path to the input audio file, such as `/path/to/audio.wav`
--normalized_audio NORMALIZED_AUDIO
path to the output normalized audio file, such as `/path/to/normalized-audio.wav`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
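As an illustration of what normalizing a track can mean, the sketch below applies peak normalization: every sample is scaled by the same gain so that the loudest one reaches a target peak. This is one common approach, chosen here as an assumption; the normalization performed by psifx may differ (for example, loudness-based). `peak_normalize` is a hypothetical name.

```python
def peak_normalize(samples, peak=1.0):
    """Scale samples so that the largest absolute value equals `peak`."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        # Silent or empty input: nothing to scale.
        return list(samples)
    gain = peak / max_abs
    return [s * gain for s in samples]


# Made-up samples; the loudest one (-0.25) is scaled to -1.0.
normalized = peak_normalize([0.1, -0.25, 0.2])
print(normalized)
```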
psifx audio diarization pyannote inference [-h] --audio AUDIO
--diarization DIARIZATION
[--num_speakers NUM_SPEAKERS]
[--model_name MODEL_NAME]
[--api_token API_TOKEN]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for diarizing an audio track with pyannote.
optional arguments:
-h, --help show this help message and exit
--audio AUDIO path to the input audio file, such as `/path/to/audio.wav`
--diarization DIARIZATION
path to the output diarization file, such as `/path/to/diarization.rttm`
--num_speakers NUM_SPEAKERS
number of speaking participants, if ignored the model will try to guess it, it is advised to specify it
--model_name MODEL_NAME
version number of the pyannote/speaker-diarization model, c.f. https://huggingface.co/pyannote/speaker-diarization/tree/main/reproducible_research
--api_token API_TOKEN
API token for downloading the models from HuggingFace
--device DEVICE device on which to run the inference, either 'cpu' or 'cuda'
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
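The diarization output is an `.rttm` file. Assuming it follows the standard RTTM layout (space-separated `SPEAKER` lines where the fourth and fifth fields are the turn onset and duration in seconds and the eighth field is the speaker name), the speaker turns can be read back with a few lines of Python. The sample content below is made up for illustration, and `parse_rttm` is a hypothetical helper, not part of psifx.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds


def parse_rttm(text):
    """Parse standard RTTM `SPEAKER` lines into a list of turns."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any non-SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        turns.append(Turn(speaker=fields[7], start=onset, end=onset + duration))
    return turns


# Made-up sample content, for illustration only.
sample = """\
SPEAKER Mixed 1 0.50 2.25 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER Mixed 1 3.00 1.50 <NA> <NA> SPEAKER_01 <NA> <NA>
"""
turns = parse_rttm(sample)
for turn in turns:
    print(turn)
```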
psifx audio diarization pyannote visualization [-h] --diarization
DIARIZATION
--visualization
VISUALIZATION
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for visualizing the diarization of a track.
optional arguments:
-h, --help show this help message and exit
--diarization DIARIZATION
path to the input diarization file, such as `/path/to/diarization.rttm`
--visualization VISUALIZATION
path to the output visualization file, such as `/path/to/visualization.png`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx audio identification pyannote inference [-h] --mixed_audio
MIXED_AUDIO --diarization
DIARIZATION --mono_audios
MONO_AUDIOS
[MONO_AUDIOS ...]
--identification
IDENTIFICATION
[--model_names MODEL_NAMES [MODEL_NAMES ...]]
[--api_token API_TOKEN]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for identifying speakers from an audio track with pyannote.
optional arguments:
-h, --help show this help message and exit
--mixed_audio MIXED_AUDIO
path to the input mixed audio file, such as `/path/to/mixed-audio.wav`
--diarization DIARIZATION
path to the input diarization file, such as `/path/to/diarization.rttm`
--mono_audios MONO_AUDIOS [MONO_AUDIOS ...]
paths to the input mono audio files, such as `/path/to/mono-audio-1.wav /path/to/mono-audio-2.wav`
--identification IDENTIFICATION
path to the output identification file, such as `/path/to/identification.json`
--model_names MODEL_NAMES [MODEL_NAMES ...]
names of the embedding models
--api_token API_TOKEN
API token for downloading the models from HuggingFace
--device DEVICE device on which to run the inference, either 'cpu' or 'cuda'
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx audio transcription whisper inference [-h] --audio AUDIO
--transcription
TRANSCRIPTION
[--language LANGUAGE]
[--model_name MODEL_NAME]
[--translate_to_english | --no-translate_to_english]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for transcribing an audio track with Whisper.
optional arguments:
-h, --help show this help message and exit
--audio AUDIO path to the input audio file, such as `/path/to/audio.wav`
--transcription TRANSCRIPTION
path to the output transcription file, such as `/path/to/transcription.vtt`
--language LANGUAGE language of the audio, if ignored, the model will try to guess it, it is advised to specify it
--model_name MODEL_NAME
name of the model, check https://github.com/openai/whisper#available-models-and-languages
--translate_to_english, --no-translate_to_english
whether to transcribe the audio in its original language or to translate it to english (default: False)
--device DEVICE device on which to run the inference, either 'cpu' or 'cuda'
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
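The transcription is written as a `.vtt` file. Assuming it is a regular WebVTT file (cue timing lines of the form `hh:mm:ss.ttt --> hh:mm:ss.ttt`, followed by the cue text and a blank line), a minimal reader could look like the sketch below. The sample content is made up, `parse_vtt` is a hypothetical helper, and cue identifiers, settings and styling are deliberately ignored.

```python
import re

# Hours are optional in WebVTT timestamps.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Return a list of (start, end, text) tuples, with times in seconds."""
    cues = []
    lines = iter(text.splitlines())
    for line in lines:
        match = CUE_TIMING.match(line.strip())
        if not match:
            continue
        groups = match.groups()
        start, end = _seconds(*groups[:4]), _seconds(*groups[4:])
        # Collect the cue text until the next blank line.
        payload = []
        for text_line in lines:
            if not text_line.strip():
                break
            payload.append(text_line.strip())
        cues.append((start, end, " ".join(payload)))
    return cues


# Made-up sample content, for illustration only.
sample = """\
WEBVTT

00:00:01.000 --> 00:00:04.500
Hello.

00:05.000 --> 00:06.250
How are you?
"""
cues = parse_vtt(sample)
for cue in cues:
    print(cue)
```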
psifx audio transcription whisper enhance [-h] --transcription
TRANSCRIPTION --diarization
DIARIZATION --identification
IDENTIFICATION
--enhanced_transcription
ENHANCED_TRANSCRIPTION
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for enhancing a transcription with diarization and identification.
optional arguments:
-h, --help show this help message and exit
--transcription TRANSCRIPTION
path to the input transcription file, such as `/path/to/transcription.vtt`
--diarization DIARIZATION
path to the input diarization file, such as `/path/to/diarization.rttm`
--identification IDENTIFICATION
path to the input identification file, such as `/path/to/identification.json`
--enhanced_transcription ENHANCED_TRANSCRIPTION
path to the output transcription file, such as `/path/to/enhanced-transcription.vtt`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx audio speech opensmile inference [-h] --audio AUDIO --diarization
DIARIZATION --features FEATURES
[--feature_set FEATURE_SET]
[--feature_level FEATURE_LEVEL]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for extracting non-verbal speech features from an audio track with OpenSmile.
optional arguments:
-h, --help show this help message and exit
--audio AUDIO path to the input audio file, such as `/path/to/audio.wav`
--diarization DIARIZATION
path to the input diarization file, such as `/path/to/diarization.rttm`
--features FEATURES path to the output feature archive, such as `/path/to/opensmile.tar.gz`
--feature_set FEATURE_SET
available sets: ['ComParE_2016', 'GeMAPSv01a', 'GeMAPSv01b', 'eGeMAPSv01a', 'eGeMAPSv01b', 'eGeMAPSv02', 'emobase']
--feature_level FEATURE_LEVEL
available levels: ['lld', 'lld_de', 'func']
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx video manipulation process [-h] --in_video IN_VIDEO --out_video
OUT_VIDEO [--start START] [--end END]
[--x_min X_MIN] [--y_min Y_MIN]
[--x_max X_MAX] [--y_max Y_MAX]
[--width WIDTH] [--height HEIGHT]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for processing videos.
The trimming, cropping and resizing can be performed all at once, and in that order.
optional arguments:
-h, --help show this help message and exit
--in_video IN_VIDEO path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--out_video OUT_VIDEO
path to the output video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--start START trim: timestamp in seconds of the start of the selection
--end END trim: timestamp in seconds of the end of the selection
--x_min X_MIN crop: x-axis coordinate of the top-left corner in pixels
--y_min Y_MIN crop: y-axis coordinate of the top-left corner in pixels
--x_max X_MAX crop: x-axis coordinate of the bottom-right corner in pixels
--y_max Y_MAX crop: y-axis coordinate of the bottom-right corner in pixels
--width WIDTH resize: width of the resized output
--height HEIGHT resize: height of the resized output
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
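Because trimming, cropping and resizing are applied in that order, the duration and frame size of the output can be worked out ahead of time. The sketch below assumes that the crop size is `x_max - x_min` by `y_max - y_min` and that a resize simply replaces the cropped size; these are assumptions about the tool, and both `output_geometry` and the source video properties used in the example are hypothetical.

```python
def output_geometry(src_width, src_height, src_duration,
                    start=None, end=None, crop=None, size=None):
    """Return (duration, width, height) after trim, then crop, then resize.

    `crop` is (x_min, y_min, x_max, y_max) in pixels, `size` is (width, height).
    """
    # 1. Trim: keep the [start, end] selection, in seconds.
    t0 = 0 if start is None else start
    t1 = src_duration if end is None else end
    duration = t1 - t0

    # 2. Crop: box given by its top-left and bottom-right corners.
    width, height = src_width, src_height
    if crop is not None:
        x_min, y_min, x_max, y_max = crop
        width, height = x_max - x_min, y_max - y_min

    # 3. Resize: applied last, on the cropped frame.
    if size is not None:
        width, height = size

    return duration, width, height


# Hypothetical 3840x2160 source lasting 240 s, with trim and crop values
# like those passed to `psifx video manipulation process`.
geometry = output_geometry(3840, 2160, 240, start=18, end=210,
                           crop=(1347, 459, 2553, 1898))
print(geometry)
```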
psifx video pose mediapipe inference [-h] --video VIDEO --poses POSES
[--masks MASKS]
[--mask_threshold MASK_THRESHOLD]
[--model_complexity MODEL_COMPLEXITY]
[--smooth | --no-smooth]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for inferring human pose with MediaPipe Holistic.
optional arguments:
-h, --help show this help message and exit
--video VIDEO path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--poses POSES path to the output pose archive, such as `/path/to/poses.tar.gz`
--masks MASKS path to the output segmentation mask video file, such as `/path/to/masks.mp4` (or .avi, .mkv, etc.)
--mask_threshold MASK_THRESHOLD
threshold for the binarization of the segmentation mask
--model_complexity MODEL_COMPLEXITY
complexity of the model: {0, 1, 2}, higher means more FLOPs, but also more accurate results
--smooth, --no-smooth
temporally smooth the inference results to reduce the jitter (default: True)
--device DEVICE device on which to run the inference, either 'cpu' or 'cuda'
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx video pose mediapipe visualization [-h] --video VIDEO --poses
POSES --visualization
VISUALIZATION
[--confidence_threshold CONFIDENCE_THRESHOLD]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for visualizing the poses over the video.
optional arguments:
-h, --help show this help message and exit
--video VIDEO path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--poses POSES path to the input pose archive, such as `/path/to/poses.tar.gz`
--visualization VISUALIZATION
path to the output visualization video file, such as `/path/to/visualization.mp4` (or .avi, .mkv, etc.)
--confidence_threshold CONFIDENCE_THRESHOLD
threshold for not displaying low confidence keypoints
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx video face openface inference [-h] --video VIDEO --features
FEATURES
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for inferring face features from videos with OpenFace.
optional arguments:
-h, --help show this help message and exit
--video VIDEO path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--features FEATURES path to the output feature archive, such as `/path/to/openface.tar.gz`
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
psifx video face openface visualization [-h] --video VIDEO --features
FEATURES --visualization
VISUALIZATION [--depth DEPTH]
[--f_x F_X] [--f_y F_Y]
[--c_x C_X] [--c_y C_Y]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Tool for visualizing face features from videos with OpenFace.
optional arguments:
-h, --help show this help message and exit
--video VIDEO path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
--features FEATURES path to the input feature archive, such as `/path/to/openface.tar.gz`
--visualization VISUALIZATION
path to the output video file, such as `/path/to/visualization.mp4` (or .avi, .mkv, etc.)
--depth DEPTH projection: assumed static depth of the subject in meters
--f_x F_X projection: x-axis of the focal length
--f_y F_Y projection: y-axis of the focal length
--c_x C_X projection: x-axis of the principal point
--c_y C_Y projection: y-axis of the principal point
--overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
--verbose, --no-verbose
verbosity of the script (default: True)
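The projection parameters correspond to the usual pinhole camera model, in which a 3D point `(X, Y, Z)` in camera coordinates maps to the pixel `(f_x * X / Z + c_x, f_y * Y / Z + c_y)`. The sketch below illustrates that formula only; how psifx uses `--depth` and the intrinsics internally is not shown here, and the numeric values are hypothetical.

```python
def project(point, f_x, f_y, c_x, c_y):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the camera (z > 0)")
    u = f_x * x / z + c_x
    v = f_y * y / z + c_y
    return u, v


# Hypothetical intrinsics, with a point 2 m in front of the camera
# (z plays the role of the assumed static depth).
u, v = project((0.1, -0.05, 2.0), f_x=1600.0, f_y=1600.0, c_x=960.0, c_y=540.0)
print(u, v)
```

A point on the optical axis (`X = Y = 0`) always lands on the principal point `(c_x, c_y)`, which is a quick sanity check for a set of intrinsics.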
psifx video manipulation process --in_video Videos/Left.mp4 --out_video Videos/Left.processed.mp4 --start 18 --end 210 --x_min 1347 --y_min 459 --x_max 2553 --y_max 1898 --overwrite
psifx video manipulation process --in_video Videos/Right.mp4 --out_video Videos/Right.processed.mp4 --start 18 --end 210 --x_min 1358 --y_min 435 --x_max 2690 --y_max 2049 --overwrite
psifx audio manipulation extraction --video Videos/Left.mp4 --audio Audios/Left.wav
psifx audio manipulation extraction --video Videos/Right.mp4 --audio Audios/Right.wav
psifx audio manipulation mixdown --mono_audios Audios/Right.wav Audios/Left.wav --mixed_audio Audios/Mixed.wav
psifx audio manipulation normalization --audio Audios/Right.wav --normalized_audio Audios/Right.normalized.wav
psifx audio manipulation normalization --audio Audios/Left.wav --normalized_audio Audios/Left.normalized.wav
psifx audio manipulation normalization --audio Audios/Mixed.wav --normalized_audio Audios/Mixed.normalized.wav
psifx audio diarization pyannote inference --audio Audios/Mixed.normalized.wav --diarization Diarizations/Mixed.rttm --num_speakers 2 --device cuda
psifx audio identification pyannote inference --mixed_audio Audios/Mixed.normalized.wav --diarization Diarizations/Mixed.rttm --mono_audios Audios/Left.normalized.wav Audios/Right.normalized.wav --identification Identifications/Mixed.json --device cuda
psifx audio transcription whisper inference --audio Audios/Mixed.normalized.wav --transcription Transcriptions/Mixed.vtt --model_name large --language fr --device cuda
psifx audio transcription whisper enhance --transcription Transcriptions/Mixed.vtt --diarization Diarizations/Mixed.rttm --identification Identifications/Mixed.json --enhanced_transcription Transcriptions/Mixed.enhanced.vtt
psifx audio diarization pyannote visualization --diarization Diarizations/Mixed.rttm --visualization Visualizations/Mixed.png
psifx video pose mediapipe inference --video Videos/Right.mp4 --poses Poses/Right.tar.xz --masks Masks/Right.mp4
psifx video pose mediapipe inference --video Videos/Left.mp4 --poses Poses/Left.tar.xz --masks Masks/Left.mp4
psifx video face openface inference --video Videos/Right.mp4 --features Faces/Right.tar.xz
psifx video face openface inference --video Videos/Left.mp4 --features Faces/Left.tar.xz
psifx video pose mediapipe visualization --video Videos/Right.mp4 --poses Poses/Right.tar.xz --visualization Visualizations/Right.mediapipe.mp4
psifx video pose mediapipe visualization --video Videos/Left.mp4 --poses Poses/Left.tar.xz --visualization Visualizations/Left.mediapipe.mp4
psifx video face openface visualization --video Videos/Right.mp4 --features Faces/Right.tar.xz --visualization Visualizations/Right.openface.mp4
psifx video face openface visualization --video Videos/Left.mp4 --features Faces/Left.tar.xz --visualization Visualizations/Left.openface.mp4
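Most of the commands above are repeated once per camera. As a sketch, the per-side video inference commands can be generated in a loop rather than written out by hand. The directory layout mirrors the example above, `per_side_commands` is a hypothetical helper, and the script only prints the commands instead of running them.

```python
import shlex

SIDES = ["Left", "Right"]


def per_side_commands(side):
    """Return the pose and face inference commands for one camera."""
    video = f"Videos/{side}.mp4"
    return [
        ["psifx", "video", "pose", "mediapipe", "inference",
         "--video", video,
         "--poses", f"Poses/{side}.tar.xz",
         "--masks", f"Masks/{side}.mp4"],
        ["psifx", "video", "face", "openface", "inference",
         "--video", video,
         "--features", f"Faces/{side}.tar.xz"],
    ]


commands = [argv for side in SIDES for argv in per_side_commands(side)]
for argv in commands:
    print(shlex.join(argv))
# To execute them (requires psifx to be installed):
# import subprocess
# for argv in commands:
#     subprocess.run(argv, check=True)
```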
export PSIFX_VERSION="0.0.2"
export HF_TOKEN="write-your-hf-token-here"
docker buildx build \
--build-arg PSIFX_VERSION=$PSIFX_VERSION \
--build-arg HF_TOKEN=$HF_TOKEN \
--tag "psifx:$PSIFX_VERSION" \
--push .