ihollywhy's Stars
lllyasviel/Fooocus
Focus on prompting and generating
Stability-AI/generative-models
Generative Models by Stability AI
deepinsight/insightface
State-of-the-art 2D and 3D Face Analysis Project
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
danielgatis/rembg
Rembg is a tool to remove image backgrounds
state-spaces/mamba
Mamba SSM architecture
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
cumulo-autumn/StreamDiffusion
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions Context
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
NVlabs/stylegan3
Official PyTorch implementation of StyleGAN3
THUDM/CogVLM
A state-of-the-art open visual language model | Multimodal pretrained model
CompVis/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
roboflow/notebooks
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
luosiallen/latent-consistency-model
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
ali-vilab/AnyDoor
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
facebookresearch/co-tracker
CoTracker is a model for tracking any point (pixel) on a video.
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
ermongroup/SDEdit
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations
wpeebles/gangealing
Official PyTorch Implementation of "GAN-Supervised Dense Visual Alignment" (CVPR 2022 Oral, Best Paper Finalist)
OpenGVLab/VisionLLM
VisionLLM Series
horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
YifanXu74/MQ-Det
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)