Pinned Repositories
UNITER
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
attrEXP
attractiveness experiments on Amazon MTurk
HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
VALUE
Video And Language Understanding Evaluation
VQA_ReGAT
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
MM-REACT
Official repo for MM-REACT
Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
VILLA
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
linjieli222's Repositories
linjieli222/HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
linjieli222/VQA_ReGAT
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
linjieli222/HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
linjieli222/VALUE
Video And Language Understanding Evaluation
linjieli222/attrEXP
attractiveness experiments on Amazon MTurk
linjieli222/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
linjieli222/cc
Creative Commons copyright license files
linjieli222/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
linjieli222/merlot-1
MERLOT: Multimodal Neural Script Knowledge Models
linjieli222/MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
linjieli222/pythia
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
linjieli222/seada-vqa
A PyTorch implementation of a data augmentation method for visual question answering
linjieli222/simi_pair
linjieli222/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
linjieli222/TVRetrieval
PyTorch implementation of XML on TVR dataset - TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
linjieli222/vqa2vln-tutorial.github.io
linjieli222/X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language