aliceebaird's Stars
ml-explore/mlx
MLX: An array framework for Apple silicon
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
SkalskiP/courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
wookayin/gpustat
📊 A simple command-line utility for querying and monitoring GPU status
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
DmitryUlyanov/Multicore-TSNE
Parallel t-SNE implementation with Python and Torch wrappers.
csteinmetz1/ai-audio-startups
Community list of startups working with AI in audio and music technology
Kyubyong/g2p
g2p: English Grapheme To Phoneme Conversion
csteinmetz1/pyloudnorm
Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm
shahules786/mayavoz
PyTorch-based speech enhancement toolkit.
Jiaxin-Ye/TIM-Net_SER
[ICASSP 2023] Official TensorFlow implementation of "Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition".
neonbjb/tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
afourast/avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
HumeAI/hume-python-sdk
Python client for Hume AI
iariav/End-to-End-VAD
Audio-visual voice activity detection using deep learning
HumeAI/competitions
Hume AI ML Competitions
EIHW/MuSe-2023
facebookresearch/emphassess
This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAssess paper (de Seyssel et al., 2023).
lstappen/MuSe-Toolbox
A Python toolbox to fuse multiple continuous emotion annotations from several raters and discretize them into classes!
dusty-phillips/similar-sounding-words
A list of similar sounding words to help disambiguate voice coding
zaocan666/DyViSE
Dynamic vision-guided speaker embedding for audio-visual speaker diarization
felixbur/syntAct
Scripts to generate a database of simulated emotional expression.
idiap/ExVo-2022
Extracting pre-trained self-supervised embeddings for the ICML ExVo 2022 challenge
EIHW/ComParE2023
EIHW/prototypical-network-audio-evaluation
nfb-onf/sound-of-laughter
aliceebaird/temp_blanket