Pinned Repositories
anything-llm
A multi-user ChatGPT for any LLM and vector database. Unlimited documents, messages, and storage in one privacy-focused app. Now available as a desktop application!
AVT
Code release for ICCV 2021 paper "Anticipative Video Transformer"
ChatTTS
A generative speech model for daily dialogue.
CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Diffree
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
mysqlc
MySQL C example
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
udptunnel-1.1
udptunnel-1.1
UDT
UDT: Breaking the Data Transfer Bottleneck. UDT is a reliable, UDP-based, application-level data transport protocol for distributed data-intensive applications over wide-area high-speed networks. UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The protocol can transfer data at much higher speeds than TCP. UDT is also a highly configurable framework that can accommodate various congestion control algorithms.
UniTalker
gary109's Repositories
gary109/StreamMultiDiffusion
Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
gary109/agentscope
AgentScope: A Flexible yet Robust Multi-Agent Platform
gary109/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
gary109/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
gary109/AnyTool
gary109/ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
gary109/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
gary109/c3po
gary109/ChatDev
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
gary109/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
gary109/distrifuser
[CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
gary109/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
gary109/EMO
gary109/ESP32-targz
🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266
gary109/FiT
FiT: Flexible Vision Transformer for Diffusion Model
gary109/GenAI-Hw5
Repository for Introduction to GenAI Hw5
gary109/GenAI_hw6_dataset
gary109/LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
gary109/ML-Papers-of-the-Week
🔥Highlighting the top ML papers every week.
gary109/mtg-jamendo-dataset
Metadata, scripts and baselines for the MTG-Jamendo dataset
gary109/MU-LLaMA
MU-LLaMA: Music Understanding Large Language Model
gary109/OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
gary109/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
gary109/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
gary109/Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
gary109/songcomposer
gary109/SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
gary109/subobjects
Official repository of paper "Subobject-level Image Tokenization"
gary109/symusic
A cross-platform, note-level MIDI decoding library with lightning speed, based on minimidi.
gary109/yolov9
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information