xiangxyq's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
google/googletest
GoogleTest - Google Testing and Mocking Framework
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
microsoft/mimalloc
mimalloc is a compact general purpose allocator with excellent performance.
jemalloc/jemalloc
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
onnx/models
A collection of pre-trained, state-of-the-art models in the ONNX format
jbeder/yaml-cpp
A YAML parser and emitter in C++
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
marzer/tomlplusplus
Header-only TOML config file parser and serializer for C++17.
nemtrif/utfcpp
UTF-8 with C++ in a Portable Way
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
MontrealCorpusTools/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
sooftware/kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
KyleBing/english-vocabulary
English vocabulary lists: CET-4/CET-6, postgraduate entrance exam (考研), and SAT words, as txt and json files, in shuffled order
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
qiuqiangkong/torchlibrosa
bytedance/uss
This is the PyTorch implementation of the Universal Source Separation with Weakly labelled Data.
kaituoxu/Listen-Attend-Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Audio-WestlakeU/audiossl
A library built for easier audio self-supervised training and downstream task evaluation
Audio-WestlakeU/ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
csukuangfj/kaldi-native-fbank
Kaldi-compatible online fbank extractor without external dependencies
lovemefan/fsmn-vad
An enterprise-grade Voice Activity Detector from modelscope and funasr.
cpuimage/shazam
An implementation of the Shazam algorithm
RicherMans/SAT
Streaming Audio Transformers for online audio tagging
Arshdeep-Singh-Boparai/E-PANNs