yaozengwei's Stars
ossrs/srs
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, and Rust.
asteroid-team/asteroid
The PyTorch-based audio source separation toolkit for researchers
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
jeonsworld/ViT-pytorch
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
k2-fsa/k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.
lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
k2-fsa/icefall
Snowdar/asv-subtools
Open-source tools for speaker recognition
k2-fsa/sherpa
Speech-to-text server framework with next-gen Kaldi
alibaba-damo-academy/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
Amshaker/SwiftFormer
[ICCV'23] Official repository of the paper "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications"
csukuangfj/kaldifeat
Kaldi-compatible online and offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd. Provides C++ and Python APIs.
k2-fsa/libriheavy
Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context
CorentinJ/librispeech-alignments
Word alignments generated by the Montreal Forced Aligner for the LibriSpeech dataset
csukuangfj/transducer-loss-benchmarking
k2-fsa/text_search
Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
danpovey/quantization
Torch-based tool for quantizing high-dimensional vectors using additive codebooks
k2-fsa/multi_quantization
csukuangfj/kaldilm
Python wrapper for Kaldi's arpa2fst
xunguangwang/ProS-GAN
[CVPR 2021] Official repository for "Prototype-supervised Adversarial Network for Targeted Attack of Deep Hashing"
csukuangfj/kaldi-hmm-gmm
winlinvip/srs-k2
Apply https://github.com/k2-fsa/sherpa-ncnn in live streaming and WebRTC
k2-fsa/divide_lm
tuyanglin/Fingerprint-Restoration
Fingerprint Restoration using Cubic Bezier Curve
csukuangfj/kfj-vim
My vim settings.
csukuangfj/piper-phonemize
C++ library for converting text to phonemes for Piper
Jarellin/HITszDailyHealth
Daily health reporting for Harbin Institute of Technology, Shenzhen (HITsz)
yilinyl/hybrid_nmm
A hybrid framework (neural mass model + ML) for SC-to-FC prediction