zqh0253's Stars
black-forest-labs/flux
Official inference repo for FLUX.1 models
lllyasviel/IC-Light
More relighting!
Doubiiu/ToonCrafter
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
luigifreda/pyslam
pySLAM contains a Visual Odometry (VO) pipeline in Python for monocular, stereo and RGBD cameras. It supports many modern local features based on Deep Learning.
baaivision/Emu3
Next-Token Prediction is All You Need
colmap/glomap
GLOMAP - Global Structure-from-Motion Revisited
YvanYin/Metric3D
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
naver/mast3r
Grounding Image Matching in 3D with MASt3R
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
menyifang/MIMO
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
facebookresearch/vggsfm
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
nerfstudio-project/viser
Web-based 3D visualization + Python
henry123-boy/SpaTracker
[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space
facebookresearch/PoseDiffusion
[ICCV 2023] PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
nianticlabs/acezero
[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
OpenRobotLab/GRUtopia
GRUtopia: Dream General Robots in a City at Scale
apple/ml-mdm
Train high-quality text-to-image diffusion models in a data & compute efficient manner
hehao13/CameraCtrl
OpenGVLab/OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
facebookresearch/lightplane
Lightplane implements a highly memory-efficient differentiable radiance field renderer, and a module for unprojecting features from images to 3D grids.
cwchenwang/awesome-4d-generation
List of papers on 4D Generation.
zqh0253/3DitScene
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added gradually. Contributions welcome!
hwanhuh/Radiance-Fields-from-VGGSfM-Mast3r
Gaussian Splatting from VGGSfM and Mast3r, and their comparison
tyhuang0428/DreamPhysics
DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians from Video Diffusion Priors
customdiffusion360/custom-diffusion360
CustomDiffusion360: Customizing Text-to-Image Diffusion with Camera Viewpoint Control
ubc-vision/vivid123
[CVPR 2024 Highlight] ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
MegaScenes/dataset
jihaonew/UTA
Enhancing Vision-Language Model with Unmasked Token Alignment (TMLR)