wangkunyu241
Ph.D. student at USTC. Research interests include unmanned aerial vehicles (UAVs) and robot vision-and-language navigation.
Pinned Repositories
LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Awesome-LLM-Robotics
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
CogVLM
A state-of-the-art open visual language model | multimodal pre-trained model
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
CLGID
This is the code of the paper "Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay".
SkyFind
This is the code and the dataset of the paper "SkyFind: A Large-Scale Benchmark Unveiling Referring Expression Comprehension for UAV".
UAV-Frequency
This is the code of the paper "Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement", which has been submitted to IJCV. It is an extension of our CVPR 2023 paper "Generalized UAV Object Detection via Frequency Domain Disentanglement".
wangkunyu241's Repositories
wangkunyu241/UAV-Frequency
This is the code of the paper "Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement", which has been submitted to IJCV. It is an extension of our CVPR 2023 paper "Generalized UAV Object Detection via Frequency Domain Disentanglement".
wangkunyu241/CLGID
This is the code of the paper "Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay".
wangkunyu241/Awesome-LLM-Robotics
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
wangkunyu241/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
wangkunyu241/SkyFind
This is the code and the dataset of the paper "SkyFind: A Large-Scale Benchmark Unveiling Referring Expression Comprehension for UAV".