Byron1201's Stars
ShusenTang/Dive-into-DL-PyTorch
This project reimplements the original MXNet code from the book "Dive into Deep Learning" in PyTorch.
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
jacobgil/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
facebookresearch/moco
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Hello-SimpleAI/chatgpt-comparison-detection
Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
fudan-zvg/SETR
[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
HarborYuan/ovsam
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
MCG-NJU/MixFormer
[CVPR 2022 Oral & TPAMI 2024] MixFormer: End-to-End Tracking with Iterative Mixed Attention
RetroCirce/HTS-Audio-Transformer
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
MengyangPu/EDTER
EDTER: Edge Detection with Transformer, in CVPR 2022
daishengdong/Games
Games developed in Python (Gomoku, Snake, Minesweeper, Tetris, Tank Battle, Flappy Bird)
ys-zong/awesome-self-supervised-multimodal-learning
[T-PAMI] A curated list of self-supervised multimodal learning resources.
facebookresearch/AVT
Code release for ICCV 2021 paper "Anticipative Video Transformer"
rowanz/merlot_reserve
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
zinengtang/TVLT
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
OpenGVLab/EgoVideo
[CVPR 2024 Champions] Solutions for EgoVis Challenges in CVPR 2024
ChinaYi/ASFormer
Official repo for BMVC2021 paper ASFormer: Transformer for action segmentation
YapengTian/AVVP-ECCV20
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
OpenGVLab/EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
Echo0125/MAT-Memory-and-Anticipation-Transformer
[ICCV 2023] Official implementation of Memory-and-Anticipation Transformer for Online Action Understanding
GenjiB/ECLIPSE
anpwu/ZJU-CS-ClassNotes
Chiaraplizz/ARGO1M-What-can-a-cook
WeiyanCai/EPnP_Python
bhwqy/pnp
My implementation of the PnP problem, including Gauss-Newton, DLT, and EPnP.