alanbreeze's Stars
HIT-SCIR/huozi
Huozi general-purpose large language model
dair-ai/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
yunwei37/Prompt-Engineering-Guide-zh-CN
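🐙 A comprehensive collection of guides, papers, lectures, notebooks and resources on prompt engineering (automatically and continuously updated)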
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking. Features real-time end-to-end speech input and streaming audio output for conversation.
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
EvelynFan/FaceFormer
[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
LSimon95/megatts2
Unofficial implementation of Megatts2
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
2noise/ChatTTS
A generative speech model for daily dialogue.
geekan/MetaGPT
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
d2l-ai/d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
MasayaKawamura/MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
xai-org/grok-1
Grok open release
RVC-Boss/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
microsoft/JARVIS
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf