GuangkeChen's Stars

openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 74k 605 08.8k
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Language: Jupyter Notebook 36.6k 332 4484.3k
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
Language: Python 26.3k 179 1304.9k
microsoft/JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Language: Python 23.9k 381 1812k
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Language: Python 11.1k 70 108698
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Language: Jupyter Notebook 11.1k 144 3701.1k
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language: Python 10.1k 137 51868
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python 9.2k 134 1.1k 1.4k
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
Language: Python 8.5k 99 93782
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Language: Python 7.1k 45 311720
synthetichealth/synthea
Synthetic Patient Population Simulator
Language: Java 2.2k 77 583665
LAION-AI/CLAP
Contrastive Language-Audio Pretraining
Language: Python 1.5k 29 94148
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
Language: Python 1.5k 51 2291
THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Language: Python 1.2k 15 9465
k2-fsa/icefall
Language: Python 975 48 688310
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
Language: Python 654 21 5954
PlayVoice/lora-svc
singing voice change based on whisper, and lora for singing voice clone
Language: Python 631 24 6978
xinjli/allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Language: Python 584 27 6787
Audio-AGI/WavJourney
WavJourney: Compositional Audio Creation with LLMs
Language: Python 528 25 143
magic-research/bubogpt
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Language: Python 506 10 1935
M4Singer/M4Singer
Language: Python 198 10 1516
xinjli/transphone
phoneme tokenizer and grapheme-to-phoneme model for 8k languages
Language: Python 151 13 1115
lesterphillip/SVCC23_FastSVC
Singing Voice Conversion Challenge 2023 Starter Kit: FastSVC Reimplementation
Language: Python 113 7 1110
dnn-security/Watermark-Robustness-Toolbox
The official implementation of the IEEE S&P'22 paper "SoK: How Robust is Deep Neural Network Image Classification Watermarking".
Language: Python 108 2 529
THU-KEG/ChatLog
⏳ ChatLog: Recording and Analysing ChatGPT Across Time
Language: Jupyter Notebook 95 7 33
naver-ai/facetts
Language: Python 50 2 95
matthijsvk/TCDTIMITprocessing
processing and extracting of face and mouth image files out of the TCDTIMIT database
Language: Python 44 4 311
liuyoude/TWFR-GMM
Time-weighted Frequency Domain Audio Representation (TWFR) with GMM Estimator for Anomalous Sound Detection
Language: Python 22 1 34
Sreyan88/Toxicity-Detection-in-Spoken-Utterances
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
Language: Jupyter Notebook 13 2 37
kscanne/crubadan
Scripts and data for the Crúbadán web crawler: http://crubadan.org/
Language: Python 8 2 14