Qoboty's Stars
janhq/jan
Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
unslothai/unsloth
Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
exo-explore/exo
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
geekyutao/Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
google/gemma.cpp
Lightweight, standalone C++ inference engine for Google's Gemma models.
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
GuijiAI/duix.ai
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
Kwai-Kolors/Kolors
Kolors Team
AnswerDotAI/gpu.cpp
A lightweight library for portable low-level GPU computation using WebGPU.
karpathy/build-nanogpt
Video+code lecture on building nanoGPT from scratch
pipecat-ai/pipecat
Open Source framework for voice and multimodal conversational AI
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
karpathy/nano-llama31
nanoGPT style version of Llama 3.1
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
mlfoundations/dclm
DataComp for Language Models
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
ricky0123/vad
Voice activity detector (VAD) for the browser with a simple API
multimodal-art-projection/MAP-NEO
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 hours)
Pints-AI/1.5-Pints
A compact LLM pretrained in 9 days using high-quality data
homebrewltd/llama3-s
Llama3.1 learns to Listen
frankyoujian/Edge-Punct-Casing