wszlong
Research Interests: Natural/Spoken Language Processing, MT, ASR, Pre-training, and Deep Learning.
Microsoft Research Asia, Beijing, China
wszlong's Stars
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
opendilab/CleanS2S
High-quality, streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice-interaction prototype agent implemented in just one file!
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
kyutai-labs/moshi
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
BytedanceSpeech/seed-tts-eval
Stability-AI/stable-audio-metrics
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
microsoft/Megatron-DeepSpeed
Ongoing research on training transformer language models at scale, including BERT & GPT-2.
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Yuan-ManX/ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
modelscope/FunASR
A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, etc.
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN, Multi-band MelGAN, HiFi-GAN & StyleMelGAN) implementation in PyTorch
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
facebookresearch/AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
v-iashin/SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
bootphon/phonemizer
Simple text to phones converter for multiple languages
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
wenet-e2e/WenetSpeech
A 10,000+ hour dataset for Chinese speech recognition
2noise/ChatTTS
A generative speech model for daily dialogue.