Sarah-Xing

Speech recognition, Speaker recognition, Speech Diarization, Machine Learning, LLM

Sarah-Xing's Stars

sanowl/LSLM-Listening-while-Speaking-Language-Model
LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue capabilities.
Language:Python 566
eriklindernoren/PyTorch-GAN
PyTorch implementations of Generative Adversarial Networks.
Language:Python 16.6k 4.1k
chaklam-silpasuwanchai/Python-fo-Natural-Language-Processing
This is the repository for the course Natural Language Processing at Asian Institute of Technology. Covers word vectors, spaCy, PyTorch, HuggingFace.
Language:Jupyter Notebook 6227
Jackson-Kang/Pytorch-VAE-tutorial
A simple tutorial of Variational AutoEncoders with Pytorch
Language:Jupyter Notebook 34577
openai/jukebox
Code for the paper "Jukebox: A Generative Model for Music"
Language:Python 7.9k 1.4k
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Language:Python 39456
kyegomez/MELLE
An open source community implementation of the model MELLE from the paper: "Autoregressive Speech Synthesis without Vector Quantization"
Language:Shell 6
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Language:Python 50745
yukara-ikemiya/wavefit-pytorch
PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight/fast speech vocoders.
Language:Python 493
Barty-Bart/openai-realtime-api-voice-assistant
Language:JavaScript 8777
rishikksh20/SoundStorm-pytorch
Google's SoundStorm: Efficient Parallel Audio Generation
Language:Python 12913
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Language:Python 2.8k 223
google/lyra
A Very Low-Bitrate Codec for Speech Compression
Language:C++ 3.8k 356
wesbz/SoundStream
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
Language:Python 36152
yangdongchao/SoundStorm
The reproduced code for Google's SoundStorm
Language:Python 26119
0nutation/USLM
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
Language:Python 13910
LinkSoul-AI/LLaSM
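The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.
Language:Python 54155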
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Language:Python 1.2k 117
fixie-ai/ai-benchmarks
Benchmarking suite for popular AI APIs
Language:Python 7814
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
Language:Python 1.7k 116
lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
Language:Python 1.4k 90
shreyabhadwal/AI-Receptionist
Book appointments, record messages, get information and much more via voice through Pam AI, an Auto-GPT like AI receptionist.
Language:Python 16
kaymen99/AI-Voice-assistant
AI Voice Assistant: talk to an AI agent that handles event scheduling, managing contacts, accessing your knowledge base and web searching through simple voice commands.
Language:Python 227
twilio-labs/call-gpt
Generative AI phone call toolkit using Twilio Media Streams.
Language:JavaScript 345145
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
Language:Python 23622
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
79548
kyegomez/USM
Implementation of Google's USM speech model in Pytorch
Language:Python 264
kyegomez/Gemini
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
Language:Python 43556
kyutai-labs/moshi
Language:Python 7k 550
NirDiamant/RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.
Language:Jupyter Notebook 9.4k 960