zmykevin
Applied Researcher at Capital One. My research interests lie in multimodal learning with vision and language.
New York
Pinned Repositories
A-Visual-Attention-Grounding-Neural-Model
The code repository for our multimodal machine translation project: A Visual Attention Grounding Neural Network
ACL2023_ChartT5
The official code implementation of the ACL 2023 Findings paper: Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
bert_nli
A Natural Language Inference (NLI) model based on Transformers (BERT and ALBERT)
ChartLlama-code
For the code from ChartLlama-code
ChartOCR
chatgpt-on-wechat
A WeChat chatbot based on ChatGPT, built with the OpenAI API (GPT-3.5) and the itchat library.
GanDraw
The repository for the GanDraw dataset baselines
transform-and-tell
Code for the Transform and Tell paper (CVPR 2020).
UC2
CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
UVLP
CVPR 2022 (Oral) PyTorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
zmykevin's Repositories
zmykevin/UC2
CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
zmykevin/UVLP
CVPR 2022 (Oral) PyTorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
zmykevin/ACL2023_ChartT5
The official code implementation of the ACL 2023 Findings paper: Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
zmykevin/ChartOCR
zmykevin/A-Visual-Attention-Grounding-Neural-Model
The code repository for our multimodal machine translation project: A Visual Attention Grounding Neural Network
zmykevin/GanDraw
The repository for the GanDraw dataset baselines
zmykevin/transform-and-tell
Code for the Transform and Tell paper (CVPR 2020).
zmykevin/bert_nli
A Natural Language Inference (NLI) model based on Transformers (BERT and ALBERT)
zmykevin/ChartLlama-code
For the code from ChartLlama-code
zmykevin/chatgpt-on-wechat
A WeChat chatbot based on ChatGPT, built with the OpenAI API (GPT-3.5) and the itchat library.
zmykevin/CLIP
Contrastive Language-Image Pretraining
zmykevin/Detectron
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
zmykevin/detectron2
Detectron2 is FAIR's next-generation platform for object detection and segmentation.
zmykevin/CLIPTrans
Official implementation for the paper "Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation", published at ICCV'23.
zmykevin/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
zmykevin/GeNeVA
Code to train and evaluate the GeNeVA-GAN model for the GeNeVA task proposed in our ICCV 2019 paper "Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction"
zmykevin/GeNeVA_datasets
Scripts to generate the CoDraw and i-CLEVR datasets used for the GeNeVA task proposed in our ICCV 2019 paper "Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction"
zmykevin/gquestions
Find "People Also Ask" questions
zmykevin/LLaVA
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
zmykevin/open_clip
An open source implementation of CLIP.
zmykevin/personal-website
Code that'll help you kickstart a personal website that showcases your work as a software developer.
zmykevin/Prompt-Engineering-Guide
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
zmykevin/simple-flask-socketio-example
zmykevin/SPADE
Semantic Image Synthesis with SPADE
zmykevin/StageDP
A two-stage RST discourse parser
zmykevin/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
zmykevin/UNITER
Research code for "UNITER: Learning UNiversal Image-TExt Representations"
zmykevin/vilbert_beta
zmykevin/vokenization
PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision"
zmykevin/XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.