Boxie5's Stars
oobabooga/text-generation-webui
A Gradio web UI for Large Language Models.
TencentARC/GFPGAN
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
microsoft/AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
shadowsocks/ShadowsocksX-NG
Next Generation of ShadowsocksX
microsoft/autogen
A programming framework for agentic AI 🤖
jamiebuilds/the-super-tiny-compiler
⛄ Possibly the smallest compiler ever
deezer/spleeter
Deezer source separation library including pretrained models.
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
state-spaces/mamba
Mamba SSM architecture
OpenTalker/SadTalker
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
vanna-ai/vanna
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
chenzomi12/AISystem
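AISystem mainly refers to AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks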
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
numba/numba
NumPy aware dynamic Python compiler using LLVM
xszyou/Fay
Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guides, broadcasters, assistants, waiters, teachers, and voice or text-based mobile assistants.
OthersideAI/self-operating-computer
A framework to enable multimodal models to operate a computer.
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
PKU-YuanGroup/ChatLaw
ChatLaw: A Powerful LLM Tailored for the Chinese Legal Domain.
XPixelGroup/BasicSR
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoise, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also supports StyleGAN2, DFDNet.
OpenTalker/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
MineDojo/Voyager
An Open-Ended Embodied Agent with Large Language Models
microsoft/LLMLingua
Speeds up LLM inference and improves the LLM's perception of key information by compressing the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
huggingface/safetensors
Simple, safe way to store and distribute tensors
alitto/pond
🔘 Minimalistic and High-performance goroutine worker pool written in Go
jianchang512/vocal-separate
An extremely simple tool for separating vocals and background music, operated entirely through a local web page with no internet connection required, using 2stems/4stems/5stems models
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton