zchoi
Ph.D. student. Research Interests: LLM-Agents, Vision-Language.
UESTC | TongYi Laboratory · Sichuan ⇌ Beijing
Pinned Repositories
3D-Vision-and-Language
Collection of recent 3D Vision and Language research
Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
DAST
[MM23] Code for paper "Depth-Aware Sparse Transformer for Video-Language Learning"
GLSCL
Code for "Text-Video Retrieval with Global-Local Semantic Consistent Learning"
Multi-Modal-Large-Language-Learning
Awesome multi-modal large language model papers/projects, plus collections of popular training strategies, e.g., PEFT, LoRA.
PKOL
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
S2-Transformer
[IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
SNLC
[PR23] The implementation of the paper "Learning Visual Question Answering on Controlled Semantic Noisy Labels"
SPT
[TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".
VCRN
zchoi's Repositories
zchoi/Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
zchoi/S2-Transformer
[IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
zchoi/PKOL
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
zchoi/Multi-Modal-Large-Language-Learning
Awesome multi-modal large language model papers/projects, plus collections of popular training strategies, e.g., PEFT, LoRA.
zchoi/VCRN
zchoi/GLSCL
Code for "Text-Video Retrieval with Global-Local Semantic Consistent Learning"
zchoi/SPT
[TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".
zchoi/3D-Vision-and-Language
Collection of recent 3D Vision and Language research
zchoi/SNLC
[PR23] The implementation of the paper "Learning Visual Question Answering on Controlled Semantic Noisy Labels"
zchoi/DAST
[MM23] Code for paper "Depth-Aware Sparse Transformer for Video-Language Learning"
zchoi/zchoi
zchoi/RSTNet
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
zchoi/UMP_TVR
[TCSVT24] The implementation of the paper "UMP: Unified Modality-aware Prompt Tuning for Text-Video Retrieval".
zchoi/videoqa_model
zchoi/VQAC
zchoi/MAN
zchoi/EMCL
[NeurIPS 2022] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
zchoi/LMaaS-Papers
Awesome papers on Language-Model-as-a-Service (LMaaS)
zchoi/McQuic
Repository of CVPR'22 paper "Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression"
zchoi/metrics
📊 An infographics generator with 30+ plugins and 200+ options to display stats about your GitHub account and render them as SVG, Markdown, PDF or JSON!
zchoi/rich
Rich is a Python library for rich text and beautiful formatting in the terminal.
zchoi/sam
SAM: Sharpness-Aware Minimization (PyTorch)
zchoi/Vision-and-Language-Benchmark
Codebase for vision-and-language research, including various multimodal task pipelines (e.g., image captioning, VQA, video-text retrieval), customizable datasets (e.g., MS-COCO, ActivityNet, MSR-VTT), and pre-trained model acquisition (e.g., CLIP, BLIP-2)
zchoi/MPT
zchoi/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
zchoi/HowToCook
A programmer's guide to cooking at home (content in Simplified Chinese only).