GuangkeChen's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
microsoft/JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
synthetichealth/synthea
Synthetic Patient Population Simulator
LAION-AI/CLAP
Contrastive Language-Audio Pretraining
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
k2-fsa/icefall
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
PlayVoice/lora-svc
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
xinjli/allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Audio-AGI/WavJourney
WavJourney: Compositional Audio Creation with LLMs
magic-research/bubogpt
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
M4Singer/M4Singer
xinjli/transphone
Phoneme tokenizer and grapheme-to-phoneme model for 8k languages
lesterphillip/SVCC23_FastSVC
Singing Voice Conversion Challenge 2023 Starter Kit: FastSVC Reimplementation
dnn-security/Watermark-Robustness-Toolbox
The official implementation of the IEEE S&P'22 paper "SoK: How Robust is Deep Neural Network Image Classification Watermarking".
THU-KEG/ChatLog
⏳ ChatLog: Recording and Analysing ChatGPT Across Time
naver-ai/facetts
matthijsvk/TCDTIMITprocessing
Processing and extraction of face and mouth image files from the TCDTIMIT database
liuyoude/TWFR-GMM
Time-weighted Frequency Domain Audio Representation (TWFR) with GMM Estimator for Anomalous Sound Detection
Sreyan88/Toxicity-Detection-in-Spoken-Utterances
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
kscanne/crubadan
Scripts and data for the Crúbadán web crawler: http://crubadan.org/