Pinned Repositories
2prime.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
2s-AGCN
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition (CVPR 2019)
3D-Human-Body-Shape
3d-pose-baseline
A simple baseline for 3D human pose estimation in TensorFlow. Presented at ICCV 2017.
action-detection
Temporal action detection with SSN
dense_flow
Tools to extract dense optical flow from videos, based on OpenCV
repulsion_loss_ssd
Repulsion Loss: Detecting Pedestrians in a Crowd. https://arxiv.org/abs/1711.07752
TrajectoryNet-1
hzhang57's Repositories
hzhang57/hzhang57.github.io
hzhang57/2prime.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
hzhang57/Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
hzhang57/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
hzhang57/behave-dataset
Code to access the BEHAVE dataset
hzhang57/chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
hzhang57/CLIP
Contrastive Language-Image Pretraining
hzhang57/CogVideo
Text-to-video generation.
hzhang57/coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
hzhang57/GLIP
Grounded Language-Image Pre-training
hzhang57/Group-Contextualization
[CVPR22] Group Contextualization for Video Recognition
hzhang57/GSS
[CVPR 2023] Official repository of Generative Semantic Segmentation
hzhang57/HowToLiveLonger
A programmer's guide to living longer
hzhang57/LaViLa
Code release for "Learning Video Representations from Large Language Models"
hzhang57/lightning-sam
Fine-tune Segment-Anything Model with Lightning Fabric.
hzhang57/Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
hzhang57/mega
Sequence modeling with Mega.
hzhang57/METER
METER: A Multimodal End-to-end TransformER Framework
hzhang57/multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision or LLaVA. 🔥
hzhang57/Neighborhood-Attention-Transformer
[Preprint] Neighborhood Attention Transformer, 2022
hzhang57/openai-cookbook
Examples and guides for using the OpenAI API
hzhang57/Paper-Implementation-Template
A simple reproducible template to implement AI research papers
hzhang57/Pointcept
Pointcept: a codebase for point cloud perception research. Latest works: MSC, CeCo (CVPR 2023)
hzhang57/pytorch_scatter
PyTorch Extension Library of Optimized Scatter Operations
hzhang57/qna
[CVPR2022 - Oral] Official Jax Implementation of Learned Queries for Efficient Local Attention
hzhang57/SimCLR
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations by T. Chen et al.
hzhang57/VideoMAE
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
hzhang57/vidt
hzhang57/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
hzhang57/X-Decoder
Official Implementation of X-Decoder for generalized decoding for pixel, image and language