Sarah-Xing's Stars
sanowl/LSLM-Listening-while-Speaking-Language-Model
LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue capabilities.
eriklindernoren/PyTorch-GAN
PyTorch implementations of Generative Adversarial Networks.
chaklam-silpasuwanchai/Python-fo-Natural-Language-Processing
This is the repository for the course Natural Language Processing at Asian Institute of Technology. Covers word vectors, spaCy, PyTorch, HuggingFace.
Jackson-Kang/Pytorch-VAE-tutorial
A simple tutorial of Variational AutoEncoders with Pytorch
openai/jukebox
Code for the paper "Jukebox: A Generative Model for Music"
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
kyegomez/MELLE
An open source community implementation of the model MELLE from the paper: "Autoregressive Speech Synthesis without Vector Quantization"
ZhangXInFD/SpeechTokenizer
Code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models". Samples are presented on
yukara-ikemiya/wavefit-pytorch
PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight/fast speech vocoders.
Barty-Bart/openai-realtime-api-voice-assistant
rishikksh20/SoundStorm-pytorch
Google's SoundStorm: Efficient Parallel Audio Generation
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
google/lyra
A Very Low-Bitrate Codec for Speech Compression
wesbz/SoundStream
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
yangdongchao/SoundStorm
The reproduced code for Google's SoundStorm
0nutation/USLM
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
LinkSoul-AI/LLaSM
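The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.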
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
fixie-ai/ai-benchmarks
Benchmarking suite for popular AI APIs
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
shreyabhadwal/AI-Receptionist
Book appointments, record messages, get information, and much more via voice through Pam AI, an Auto-GPT-like AI receptionist.
kaymen99/AI-Voice-assistant
AI Voice Assistant: talk to an AI agent that handles event scheduling, contact management, knowledge base access, and web search through simple voice commands.
twilio-labs/call-gpt
Generative AI phone call toolkit using Twilio Media Streams.
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
kyegomez/USM
Implementation of Google's USM speech model in Pytorch
kyegomez/Gemini
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
kyutai-labs/moshi
NirDiamant/RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.