hw-liang's Stars
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
khangich/machine-learning-interview
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix, etc. Blog: mlengineer.io.
lucidrains/denoising-diffusion-pytorch
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
gaomingqi/Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
itcharge/LeetCode-Py
⛽️ "Algorithm Handbook": a highly detailed tutorial on the fundamentals of algorithms and data structures, teaching algorithms from scratch, with detailed solutions to 850+ LeetCode problems and 200 popular big-tech interview questions.
AILab-CVC/VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
fundamentalvision/BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
aim-uofa/AdelaiDet
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
google-research/scenic
Scenic: A JAX Library for Computer Vision Research and Beyond
Doubiiu/DynamiCrafter
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
SUDO-AI-3D/zero123plus
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
xiaobai1217/Awesome-Video-Datasets
Video datasets
CLAY-3D/OpenCLAY
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
alibaba/animate-anything
Fine-Grained Open Domain Image Animation with Motion Guidance
WangYueFt/detr3d
jiawen-zhu/HQTrack
Tracking Anything in High Quality
bytedance/ibot
iBOT :robot:: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
katerakelly/pytorch-maml
PyTorch implementation of MAML: https://arxiv.org/abs/1703.03400
microsoft/esvit
EsViT: Efficient self-supervised Vision Transformers
VITA-Group/Diffusion4D
"Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models", Hanwen Liang*, Yuyang Yin*, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei
junchen14/Multi-Modal-Transformer
This repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also gathers useful tutorials and tools in these related domains.
mzhaoshuai/CenterCLIP
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast pyav video decoding.
layer6ai-labs/xpool
https://layer6ai-labs.github.io/xpool/
TengdaHan/TemporalAlignNet
[CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.
sail-sg/mugs
A PyTorch implementation of Mugs proposed by our paper "Mugs: A Multi-Granular Self-Supervised Learning Framework".
VITA-Group/Comp4D
"Comp4D: Compositional 4D Scene Generation", Dejia Xu*, Hanwen Liang*, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Plataniotis, and Zhangyang Wang
princeton-nlp/DataMUX
[NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks
elisakreiss/concadia