iwaterxt's Stars
ytdl-org/youtube-dl
Command-line program to download videos from YouTube.com and other video sites
LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
triton-lang/triton
Development repository for the Triton language and compiler
ggerganov/ggml
Tensor library for machine learning
lucidrains/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
google/automl
Google Brain AutoML
deepjavalibrary/djl
An Engine-Agnostic Deep Learning Framework in Java
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
yerfor/GeneFace
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
haoheliu/AudioLDM2
Text-to-Audio/Music Generation
LCAV/pyroomacoustics
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Voine/ChatWaifu_Mobile
Mobile version of the anime-style AI waifu chat app
k2-fsa/k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
sooftware/conformer
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
SpeechColab/GigaSpeech
Large, modern dataset for speech recognition
Zhendong-Wang/Diffusion-GAN
Official PyTorch implementation for paper: Diffusion-GAN: Training GANs with Diffusion
facebookresearch/voxpopuli
A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
yl4579/StyleTTS
Official Implementation of StyleTTS
microsoft/AEC-Challenge
AEC Challenge
fjiang9/NKF-AEC
Acoustic Echo Cancellation with Neural Kalman Filtering
yl4579/PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
sarulab-speech/jtubespeech
echocatzh/MTFAA-Net
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
jctian98/e2e_lfmmi
E2E system with LF-MMI; word N-gram for Mandarin
rrbluke/NRES
Neural Residual Echo Suppressor