PanXiebit's Stars
black-forest-labs/flux
Official inference repo for FLUX.1 models
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
boyu-ai/Hands-on-RL
https://hrl.boyuai.com/
THUDM/CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
jeanfeydy/geomloss
Geometric loss functions between point clouds, images and volumes
catcathh/UltraPixel
Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks
buoyancy99/diffusion-forcing
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Vchitect/VEnhancer
Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation
G-U-N/Phased-Consistency-Model
Boosting the performance of consistency models with PCM!
jannerm/ddpo
Code for the paper "Training Diffusion Models with Reinforcement Learning"
NVlabs/I2SB
aim-uofa/MovieDreamer
Huage001/LinFusion
Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"
RLHF-V/RLAIF-V
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
locuslab/ect
Consistency Models Made Easy
yk7333/d3po
[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
Yuanshi9815/Video-Infinity
Video-Infinity generates long videos quickly using multiple GPUs without extra training.
meder411/PyTorch-EMDLoss
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
thu-ml/Bridge-TTS
Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).
chen-wl20/DreamCinema
DreamCinema: Cinematic Transfer with Free Camera and 3D Character
ZebangCheng/Emotion-LLaMA
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
LituRout/OptimalTransportModeling
The repository contains reproducible PyTorch source code of our paper Generative Modeling with Optimal Transport Maps, ICLR 2022.
XJTU-XGU/OTCS
Code for "Optimal Transport-Guided Conditional Score-Based Diffusion Model (NeurIPS, 8,7,7,6)"
AI-Study-Han/Zero-Qwen-VL
Train a LLaVA model with better Chinese support, and open-source the training code and data.
QUVA-Lab/SIGMA
SignDiff/Processed-Data
Preprocessed data of SignDiff: Learning Diffusion Models for American Sign Language Production