Pinned Repositories
dactory
delayed-streams-modeling
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
hibiki
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance before starting to translate, Hibiki adapts its flow to accumulate just enough context to produce a correct translation in real time, chunk by chunk.
moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
moshi-finetune
moshi-swift
moshivis
Kyutai with an "eye"
nanoGPTaudio
Code for the blog "Neural audio codecs: how to get audio into LLMs"
sphn
Python bindings for symphonia/opus: read various audio formats from Python and write Opus files.
unmute
Make text LLMs listen and speak
kyutai's Repositories
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
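A minimal sketch of driving Mimi from Python through the moshi package, assuming the `loaders` helpers and checkpoint names documented in the repository README; treat the repo/file constants and the 8-codebook setting as assumptions rather than a definitive recipe:

```python
# Sketch only: assumes the moshi package's documented loaders API
# (loaders.DEFAULT_REPO, loaders.MIMI_NAME, loaders.get_mimi).
import torch
from huggingface_hub import hf_hub_download
from moshi.models import loaders

mimi_weight = hf_hub_download(loaders.DEFAULT_REPO, loaders.MIMI_NAME)
mimi = loaders.get_mimi(mimi_weight, device="cpu")
mimi.set_num_codebooks(8)  # Moshi itself consumes the first 8 codebooks

wav = torch.randn(1, 1, 24000 * 10)  # [batch, channels, samples] at 24 kHz
with torch.no_grad():
    codes = mimi.encode(wav)            # discrete tokens, [batch, codebooks, frames]
    reconstructed = mimi.decode(codes)  # back to a 24 kHz waveform
```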
kyutai-labs/delayed-streams-modeling
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
kyutai-labs/hibiki
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance before starting to translate, Hibiki adapts its flow to accumulate just enough context to produce a correct translation in real time, chunk by chunk.
kyutai-labs/unmute
Make text LLMs listen and speak
kyutai-labs/moshi-finetune
kyutai-labs/moshivis
Kyutai with an "eye"
kyutai-labs/nanoGPTaudio
Code for the blog "Neural audio codecs: how to get audio into LLMs"
kyutai-labs/moshi-swift
kyutai-labs/sphn
Python bindings for symphonia/opus: read various audio formats from Python and write Opus files.
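A minimal usage sketch, assuming the `read` and `write_opus` helpers shown in sphn's documentation; the file names are placeholders:

```python
# Sketch only: assumes sphn exposes read() and write_opus() as documented.
import sphn

# Decode any format symphonia supports (wav, mp3, flac, ogg, ...);
# data is a float32 array of shape (channels, samples).
data, sample_rate = sphn.read("input.mp3")

# Re-encode the same audio as an Opus file.
sphn.write_opus("output.opus", data, sample_rate)
```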
kyutai-labs/dactory
kyutai-labs/yomikomi
A small Rust-based data loader
kyutai-labs/kaudio
Rust crate for some audio utilities
kyutai-labs/ARC-Encoder
kyutai-labs/moshi-webrtc
Proof of concept for running Moshi/Hibiki using WebRTC
kyutai-labs/tts_longeval
kyutai-labs/jax-flash-attn3
JAX bindings for the flash-attention3 kernels
kyutai-labs/jax-flash-attn2
JAX bindings for the flash-attention2 kernels
kyutai-labs/ogg-table
Ogg Vorbis reader with fast random access
kyutai-labs/dora
Dora is an experiment management framework. It expresses grid searches as pure Python files kept in your repo and identifies each experiment with a unique hash signature, letting you scale up to hundreds of experiments without losing your sanity.
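A minimal sketch of what such a grid file can look like, assuming the `Explorer`/`Launcher` API from Dora's documentation; the hyperparameter names here are hypothetical and would map onto your own training entry point:

```python
# Sketch only: grid-file pattern from Dora's docs; "lr" and "batch_size"
# are hypothetical overrides for your own training configuration.
from itertools import product

from dora import Explorer, Launcher


@Explorer
def explorer(launcher: Launcher):
    launcher.slurm_(gpus=2)  # resources requested for every job in this grid
    for lr, batch_size in product([1e-4, 3e-4], [32, 64]):
        # Each scheduled job is identified by the hash signature of its overrides.
        sub = launcher.bind({"lr": lr, "batch_size": batch_size})
        sub()
```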
kyutai-labs/flashy
Lightweight framework for writing deep learning training loops, retaining full freedom to design them as you see fit. It handles checkpointing, logging, distributed training, compatibility with Dora, and more!
kyutai-labs/neural-audio-codecs-anims
Animations for the blog "Neural audio codecs: how to get audio into LLMs"