Pinned Repositories
espnet
End-to-End Speech Processing Toolkit
SenseVoice
Multilingual Voice Understanding Model
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
babel_kws
Spoken keyword search in complex environments, based on the Babel project under Kaldi
FunASR
A Fundamental End-to-End Speech Recognition Toolkit
Keyword-Transformer
Implementation of the paper "Keyword Transformer: A Self-Attention Model for Keyword Spotting"
kwa
mmrotate
OpenMMLab Rotated Object Detection Toolbox and Benchmark
whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
rookie0607's Repositories
rookie0607/babel_kws
Spoken keyword search in complex environments, based on the Babel project under Kaldi
rookie0607/FunASR
A Fundamental End-to-End Speech Recognition Toolkit
rookie0607/Keyword-Transformer
Implementation of the paper "Keyword Transformer: A Self-Attention Model for Keyword Spotting"
rookie0607/kwa
rookie0607/mmrotate
OpenMMLab Rotated Object Detection Toolbox and Benchmark