DongChanS's Stars
myshell-ai/OpenVoice
Instant voice cloning by MyShell.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Vaibhavs10/insanely-fast-whisper
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
google/lyra
A Very Low-Bitrate Codec for Speech Compression
facebookincubator/cinder
Cinder is Meta's internal performance-oriented production version of CPython.
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
microsoft/LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
hollobit/GenAI_LLM_timeline
ChatGPT, GenerativeAI and LLMs Timeline
lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
alibaba-damo-academy/3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit 2
LAION-AI/audio-dataset
Audio Dataset for training CLAP and other models
NomaDamas/KICE_slayer_AI_Korean
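An AI aiming for the top grade (grade 1) on the Korean language section of the CSAT (the Korean college entrance exam).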
EmulationAI/awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
NomaDamas/awesome-korean-llm
Awesome list of Korean Large Language Models.
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
facebookresearch/SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
sp-uhh/storm
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
IDRnD/VoxTube
The VoxTube dataset official repository
MagicHub-io/MagicData-RAMC
MagicData-RAMC Dataset and Baseline
Open-Speech-EkStep/ULCA-asr-dataset-corpus
huckiyang/awesome-neural-reprogramming-prompting
A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022
tal-z/SoundsLike
A Python package for finding words that sound like other words. Useful for entity resolution and poetry, among other things.
actionpower/google_cloud_storage
Deno library to upload files to GCS and obtain a signed URL
Adel-Moumen/fast_sligru