youthHan's Stars
google-deepmind/deepmind-research
This repository contains implementations and illustrative code to accompany DeepMind publications
Mooler0410/LLMsPracticalGuide
A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)
lllyasviel/Omost
Your image is almost there!
Doubiiu/ToonCrafter
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
FoundationVision/VAR
[NeurIPS 2024 Oral] [GPT beats diffusion] [scaling laws in visual generation] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
LLaVA-VL/LLaVA-NeXT
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to easily improve performance on Intel platforms
xiaobai1217/Awesome-Video-Datasets
Video datasets
kadirnar/segment-anything-video
MetaSeg: Packaged version of the Segment Anything repository
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
hehao13/CameraCtrl
AILab-CVC/SEED-X
Multimodal Models in Real World
HL-hanlin/Ctrl-Adapter
Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
OpenGVLab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
lingorX/HieraSeg
CVPR2022 - Deep Hierarchical Semantic Segmentation - A structured, pixel-wise description of visual scenes in terms of the class hierarchy.
ZHU-Zhiyu/NVS_Solver
Source code for the paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"
zju3dv/Coin3D
[SIGGRAPH 2024] Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
bytedance/Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
GengzeZhou/NavGPT-2
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
ziplab/LongVLM
PingchuanMa/SGA
[ICML 2024] LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
amazon-science/indoor-scene-generation-eai
GR1-Manipulation/GR-1
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
bytedance/Portrait-Mode-Video
Video dataset dedicated to portrait-mode video recognition.
facebookresearch/BioSkin
Inference of biophysical skin properties from RGB reflectance, with spectral upsampling from 380 to 1000 nm. An interactive viewer and editor is provided, alongside several practical applications.