gary109

Pinned Repositories

anything-llm
A multi-user ChatGPT for any LLM and vector database. Unlimited documents, messages, and storage in one privacy-focused app. Now available as a desktop application!
Language:JavaScript1 1 00
AVT
Code release for ICCV 2021 paper "Anticipative Video Transformer"
Language:Python1 1 00
ChatTTS
A generative speech model for daily dialogue.
Language:Python0 0 00
CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Language:Python1 1 00
Diffree
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Language:Python00
mysqlc
MySQL C example
Language:C++5 3 03
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language:Jupyter Notebook0 0 00
udptunnel-1.1
udptunnel-1.1
Language:C3 3 02
UDT
UDT: Breaking the Data Transfer Bottleneck. UDT is a reliable UDP-based application-level data transport protocol for distributed data-intensive applications over wide-area high-speed networks. UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The new protocol can transfer data at a much higher speed than TCP does. UDT is also a highly configurable framework that can accommodate various congestion control algorithms.
Language:C++4 4 02
UniTalker
Language:Python0 0 00

gary109's Repositories

gary109/StreamMultiDiffusion
Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
Language:Jupyter Notebook0 1 00
gary109/agentscope
AgentScope: A Flexible yet Robust Multi-Agent Platform
Language:Python1 0
gary109/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language:Python1 0
gary109/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Language:Python1 0
gary109/AnyTool
Language:Python1 0
gary109/ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
Language:Python1 0
gary109/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language:Python0 0
gary109/c3po
Language:Python1 0
gary109/ChatDev
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
Language:Shell1 0
gary109/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
gary109/distrifuser
[CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Language:Python1 0
gary109/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Language:Python1 0
gary109/EMO
1 0
gary109/ESP32-targz
🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266
Language:C++0 0
gary109/FiT
FiT: Flexible Vision Transformer for Diffusion Model
gary109/GenAI-Hw5
repo of Introduction to GenAI Hw5
Language:Python1 0
gary109/GenAI_hw6_dataset
1 0
gary109/LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
0 0
gary109/ML-Papers-of-the-Week
🔥Highlighting the top ML papers every week.
1 0
gary109/mtg-jamendo-dataset
Metadata, scripts and baselines for the MTG-Jamendo dataset
Language:Python0 0
gary109/MU-LLaMA
MU-LLaMA: Music Understanding Large Language Model
Language:Python0 0
gary109/OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Language:Python1 0
gary109/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
Language:MDX1 0
gary109/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language:Python0 0
gary109/Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
1 0
gary109/songcomposer
Language:Python1 0
gary109/SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
1 0
gary109/subobjects
Official repository of paper "Subobject-level Image Tokenization"
1 0
gary109/symusic
A cross-platform note-level MIDI decoding library with lightning speed, based on minimidi.
Language:C++1 0
gary109/yolov9
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Language:Python1 0