wangzheallen's Stars
instantX-research/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
leptonai/search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
HumanAIGC/EMO
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
gaomingqi/Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
OpenGVLab/DragGAN
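Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment for trial use; code and models are fully open source; supports Windows, macOS, Linux)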
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
MarkFzp/mobile-aloha
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
MarkFzp/act-plus-plus
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
facebookresearch/jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
isl-org/ZoeDepth
Metric depth estimation from a single image
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
NVlabs/FoundationPose
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
AI-Hypercomputer/maxtext
A simple, performant and scalable JAX LLM!
rese1f/StableVideo
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
OpenRobotLab/PointLLM
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
exiawsh/StreamPETR
[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
csuhan/OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
tianweiy/DMD2
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
DerryHub/BEVFormer_tensorrt
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
OpenGVLab/PonderV2
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Tsinghua-MARS-Lab/futr3d
Code for paper: FUTR3D: A Unified Sensor Fusion Framework for 3D Detection
jiawei-ren/diffmimic
[ICLR 2023] DiffMimic: Efficient Motion Mimicking with Differentiable Physics https://arxiv.org/abs/2304.03274
DiT-3D/DiT-3D
🔥🔥🔥Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"