alanbreeze

alanbreeze's Stars

HIT-SCIR/huozi
Huozi general-purpose large language model
Language: Python · 330 stars · 22 forks
dair-ai/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
Language: MDX · 47.7k stars · 4.7k forks
yunwei37/Prompt-Engineering-Guide-zh-CN
🐙 A collection of guides, papers, lectures, notebooks, and resources on prompt engineering (automatically and continuously updated)
Language: Jupyter Notebook · 446 stars · 38 forks
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
Language: Python · 4.4k stars · 436 forks
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Language: Python · 1.8k stars · 337 forks
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Language: Python · 9.8k stars · 977 forks
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Language: Python · 1.2k stars · 81 forks
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language: Python · 2.4k stars · 240 forks
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Language: Python · 269k stars · 45.5k forks
EvelynFan/FaceFormer
[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
Language: Python · 782 stars · 133 forks
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python · 26.8k stars · 3.9k forks
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python · 5.1k stars · 358 forks
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
578 stars · 26 forks
collabora/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
Language: Jupyter Notebook · 3.8k stars · 204 forks
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Language: Python · 1.8k stars · 107 forks
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
Language: Python · 2.9k stars · 417 forks
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Language: Python · 2k stars · 320 forks
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Language: Python · 3.7k stars · 277 forks
LSimon95/megatts2
Unofficial implementation of Megatts2
Language: Python · 255 stars · 34 forks
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Language: Python · 2.4k stars · 196 forks
2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python · 30.7k stars · 3.3k forks
geekan/MetaGPT
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Language: Python · 43.6k stars · 5.2k forks
d2l-ai/d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Language: Python · 23.2k stars · 4.3k forks
MasayaKawamura/MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Language: Python · 412 stars · 64 forks
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Language: Python · 6.8k stars · 491 forks
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python · 5.9k stars · 643 forks
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Language: Python · 3.5k stars · 271 forks
xai-org/grok-1
Grok open release
Language: Python · 49.4k stars · 8.3k forks
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language: Python · 32.5k stars · 3.7k forks
microsoft/JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Language: Python · 23.5k stars · 2k forks