zchoi
Ph.D. student. Research Interests: LLM-Agents, Vision-Language.
UESTC | TongYi Laboratory · Sichuan ⇌ Beijing
Pinned Repositories
3D-Vision-and-Language
Collection of recent 3D Vision and Language research
Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
DAST
[MM23] Code for paper "Depth-Aware Sparse Transformer for Video-Language Learning"
GLSCL
Code for "Text-Video Retrieval with Global-Local Semantic Consistent Learning"
Multi-Modal-Large-Language-Learning
Awesome multi-modal large language model papers/projects, plus collections of popular training strategies, e.g., PEFT, LoRA.
PKOL
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
S2-Transformer
[IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
SNLC
[PR23] The implementation of the paper "Learning Visual Question Answering on Controlled Semantic Noisy Labels"
SPT
[TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".
VCRN
zchoi's Repositories
zchoi/Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
zchoi/S2-Transformer
[IJCAI 2022] Official PyTorch code for the paper "S2 Transformer for Image Captioning"
zchoi/PKOL
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
zchoi/Multi-Modal-Large-Language-Learning
Awesome multi-modal large language model papers/projects, plus collections of popular training strategies, e.g., PEFT, LoRA.
zchoi/VCRN
zchoi/GLSCL
Code for "Text-Video Retrieval with Global-Local Semantic Consistent Learning"
zchoi/SPT
[TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".
zchoi/3D-Vision-and-Language
Collection of recent 3D Vision and Language research
zchoi/SNLC
[PR23] The implementation of the paper "Learning Visual Question Answering on Controlled Semantic Noisy Labels"
zchoi/DAST
[MM23] Code for paper "Depth-Aware Sparse Transformer for Video-Language Learning"
zchoi/zchoi
zchoi/RSTNet
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
zchoi/UMP_TVR
[TCSVT24] The implementation of the paper "UMP: Unified Modality-aware Prompt Tuning for Text-Video Retrieval".
zchoi/videoqa_model
zchoi/VQAC
zchoi/MAN
zchoi/EMCL
[NeurIPS 2022] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
zchoi/LMaaS-Papers
Awesome papers on Language-Model-as-a-Service (LMaaS)
zchoi/McQuic
Repository of CVPR'22 paper "Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression"
zchoi/metrics
📊 An infographics generator with 30+ plugins and 200+ options to display stats about your GitHub account and render them as SVG, Markdown, PDF or JSON!
zchoi/rich
Rich is a Python library for rich text and beautiful formatting in the terminal.
zchoi/sam
SAM: Sharpness-Aware Minimization (PyTorch)
zchoi/Vision-and-Language-Benchmark
Codebase for vision-and-language research, including various multimodal task pipelines (e.g., image captioning, VQA, video-text retrieval), customizable datasets (e.g., MS-COCO, ActivityNet, MSR-VTT), and pre-trained model acquisition (e.g., CLIP, BLIP-2)
zchoi/MPT
zchoi/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
zchoi/HowToCook
A programmer's guide to cooking at home (content in Simplified Chinese only).