xiangxyq

xiangxyq's Stars

openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 72.3k 588 08.6k
google/googletest
GoogleTest - Google Testing and Mocking Framework
Language: C++ 35k 1.2k 2.3k 10.2k
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
Language: Python 26k 179 1304.8k
microsoft/mimalloc
mimalloc is a compact general purpose allocator with excellent performance.
Language: C 10.6k 157 626873
jemalloc/jemalloc
Language: C 9.6k 314 1.3k 1.5k
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Language: Python 8.4k 154 5431.1k
onnx/models
A collection of pre-trained, state-of-the-art models in the ONNX format
Language: Jupyter Notebook 8k 189 3911.4k
jbeder/yaml-cpp
A YAML parser and emitter in C++
Language: C++ 5.2k 117 9051.9k
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
Language: Python 4.5k 22 1.4k395
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python 4.5k 49 244437
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python 2.2k 33 206257
marzer/tomlplusplus
Header-only TOML config file parser and serializer for C++17.
Language: C++ 1.6k 30 180152
nemtrif/utfcpp
UTF-8 with C++ in a Portable Way
Language: C++ 1.6k 52 62200
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Language: Python 1.5k 25 67109
MontrealCorpusTools/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
Language: Python 1.4k 36 720249
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Language: Python 1.3k 32 8485
lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
Language: Python 957 43 421220
sooftware/kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Language: Python 605 21 135192
KyleBing/english-vocabulary
English words, English vocabulary, CET-4/CET-6, postgraduate entrance exam, and SAT word lists; txt files, json files, CET4 CET6, shuffled order, words
550 3 3107
facebookresearch/AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
Language: Python 549 32 2946
qiuqiangkong/torchlibrosa
Language: Python 477 5 748
bytedance/uss
This is the PyTorch implementation of the Universal Source Separation with Weakly labelled Data.
Language: Python 337 12 1216
kaituoxu/Listen-Attend-Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Language: Python 200 6 1856
Audio-WestlakeU/audiossl
A library built for easier audio self-supervised training, downstream tasks evaluation
Language: Python 107 7 1210
Audio-WestlakeU/ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
Language: Jupyter Notebook 104 3 2513
csukuangfj/kaldi-native-fbank
Kaldi-compatible online fbank extractor without external dependencies
Language: C++ 80 4 1422
lovemefan/fsmn-vad
An enterprise-grade Voice Activity Detector from modelscope and funasr.
Language: Python 66 3 47
cpuimage/shazam
An implementation of the Shazam algorithm
Language: C 45 6 118
RicherMans/SAT
Streaming Audiotransformers for online Audio tagging
Language: Python 41 4 44
Arshdeep-Singh-Boparai/E-PANNs
Language: Python 13 2 00