felixfuu's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
lllyasviel/Omost
Your image is almost there!
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
LLaVA-VL/LLaVA-NeXT
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
dvlab-research/ControlNeXt
Controllable video and image generation: SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
omerbt/MultiDiffusion
Official PyTorch implementation of "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" (ICML 2023)
Zheng-Chong/CatVTON
CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8 GB VRAM at 1024×768 resolution).
HarborYuan/ovsam
[ECCV 2024] Official code for the paper "Open-Vocabulary SAM".
OpenGVLab/VisionLLM
VisionLLM Series
megvii-research/megactor
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
mit-han-lab/fastcomposer
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
LeapLabTHU/Agent-Attention
Official repository of Agent Attention (ECCV 2024)
IDEA-Research/X-Pose
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
AIGText/Glyph-ByT5
[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
Kobaayyy/Awesome-CVPR2024-ECCV2024-AIGC
A Collection of Papers and Codes for CVPR2024/ECCV2024 AIGC
OPPO-Mente-Lab/Subject-Diffusion
Subject-Diffusion: Open-Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
aim-uofa/MovieDreamer
zamling/PSALM
[ECCV 2024] Official implementation of "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
TIGER-AI-Lab/UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
callsys/ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
lorebianchi98/FG-OVD
[CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding."
pasqualedem/LabelAnything
Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
byeongjun-park/Switch-DiT
[ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts"