Pinned Repositories
RACCooN
(arXiv:2405.18406) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
8086
Homework code in 8086 (Assembly Language) | HW from COA
AIART
AIART_Website
an image style translation website
CREMA
☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
IVA-0
[MM24] Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
MoPRL
[TCSVT] Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
SJTU_SE_Groupwork
Dormitory second-hand goods trading group
VideoTree
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
Yui010206's Repositories
Yui010206/SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
Yui010206/CREMA
☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Yui010206/MoPRL
[TCSVT] Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection
Yui010206/IVA-0
[MM24] Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
Yui010206/AlphaPose
Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System
Yui010206/arunmallya.github.io
my public website
Yui010206/awesome-anomaly-detection
A curated list of awesome anomaly detection resources
Yui010206/awesome-vln
A curated list of research papers in Vision-Language Navigation (VLN)
Yui010206/detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Yui010206/grid-feats-vqa
Grid features pre-training code for visual question answering
Yui010206/HOI-Learning-List
A list of Human-Object Interaction Learning.
Yui010206/just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Yui010206/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Yui010206/MAC
Yui010206/magenta
Magenta: Music and Art Generation with Machine Intelligence
Yui010206/merlot_reserve
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
Yui010206/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Yui010206/n2nmn
Code release for Hu et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering. in ICCV, 2017
Yui010206/Person-Search-with-Natural-Language-Description
Person Search with Natural Language Description
Yui010206/Research
novel deep learning research works with PaddlePaddle
Yui010206/Scene-Graph-Benchmark.pytorch
A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiased Scene Graph Generation from Biased Training CVPR 2020”
Yui010206/seg2vid
Video Generation from Single Semantic Label Map
Yui010206/SJTUThesis
Shanghai Jiao Tong University XeLaTeX Thesis Template
Yui010206/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Yui010206/transformer-time-series-prediction
proof of concept for a transformer-based time series prediction model
Yui010206/VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
Yui010206/video-swin-transformer-pytorch
Video Swin Transformer - PyTorch
Yui010206/video_feature_extractor
Easy to use video deep features extractor
Yui010206/ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
Yui010206/Yui010206.github.io