youthHan

Ph.D. candidate

University of Technology Sydney, Sydney

youthHan's Stars

google-deepmind/deepmind-research
This repository contains implementations and illustrative code to accompany DeepMind publications
Language: Jupyter Notebook 13.3k 325 3242.6k
Mooler0410/LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
9.5k 188 17728
lllyasviel/Omost
Your image is almost there!
Language: Python 7.3k 45 81422
Doubiiu/ToonCrafter
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
Language: Python 5.4k 60 57451
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Language: Python 4.4k 59 149407
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Language: Python 4.3k 116 83316
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
3.5k 138 20203
LLaVA-VL/LLaVA-NeXT
Language: Python 2.9k 35 309253
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Language: Python 2.1k 31 8788
intel/intel-extension-for-pytorch
A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
Language: Python 1.6k 39 552250
xiaobai1217/Awesome-Video-Datasets
Video datasets
1.2k 28 1295
kadirnar/segment-anything-video
MetaSeg: Packaged version of the Segment Anything repository
Language: Python 958 12 4567
krantiparida/awesome-audio-visual
A curated list of different papers and datasets in various areas of audio-visual processing
671 18 268
OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Language: Python 494 7 6737
hehao13/CameraCtrl
Language: Python 435 12 1719
AILab-CVC/SEED-X
Multimodal Models in Real World
Language: Jupyter Notebook 405 19 2817
HL-hanlin/Ctrl-Adapter
Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Language: Python 391 21 2516
simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
Language: Jupyter Notebook 335 6 4544
OpenGVLab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Language: Python 298 12 4816
lingorX/HieraSeg
CVPR2022 - Deep Hierarchical Semantic Segmentation - A structured, pixel-wise description of visual scenes in terms of the class hierarchy.
Language: Python 266 9 1425
ZHU-Zhiyu/NVS_Solver
Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"
Language: Python 257 14 287
zju3dv/Coin3D
[SIGGRAPH 2024] Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
161 30 30
bytedance/Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Language: Python 99 6 176
GengzeZhou/NavGPT-2
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Language: Python 82 7 51
ziplab/LongVLM
Language: Python 65 4 75
PingchuanMa/SGA
[ICML 2024] LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Language: Python 57 3 16
amazon-science/indoor-scene-generation-eai
Language: Jupyter Notebook 53 3 49
GR1-Manipulation/GR-1
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
43 9 20
bytedance/Portrait-Mode-Video
Video dataset dedicated to portrait-mode video recognition.
Language: Python 38 4 41
facebookresearch/BioSkin
Inference of biophysical skin properties from RGB reflectance, with spectral upsampling from 380 to 1000 nm. An interactive viewer and editor is provided, alongside several practical applications.
Language: Python 11 2 2
