wangkunyu241

Ph.D. student at USTC. Research interests include unmanned aerial vehicles and robot vision-and-language navigation.

Pinned Repositories

LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Language: Python 742 14 10944
Awesome-LLM-Robotics
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
3.1k 96 13245
CogVLM
A state-of-the-art open visual language model | multimodal pre-trained model
Language: Python 6.1k 66 426417
Awesome-LLM-Robotics
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
00
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
00
CLGID
This is the code of the paper "Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay".
Language: Python 10
SkyFind
This is the code and the dataset of the paper "SkyFind: A Large-Scale Benchmark Unveiling Referring Expression Comprehension for UAV".
00
UAV-Frequency
This is the code of the paper "Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement", which has been submitted to IJCV. It is an extension of our CVPR 2023 paper "Generalized UAV Object Detection via Frequency Domain Disentanglement".
Language: Python 14 3 41
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
1.6k 46 482

wangkunyu241/UAV-Frequency
This is the code of the paper "Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement", which has been submitted to IJCV. It is an extension of our CVPR 2023 paper "Generalized UAV Object Detection via Frequency Domain Disentanglement".
Language: Python 14 3 41
wangkunyu241/CLGID
This is the code of the paper "Towards Better De-raining Generalization via Rainy Characteristics Memorization and Replay".
Language: Python 10
wangkunyu241/Awesome-LLM-Robotics
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
00
wangkunyu241/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
00
wangkunyu241/SkyFind
This is the code and the dataset of the paper "SkyFind: A Large-Scale Benchmark Unveiling Referring Expression Comprehension for UAV".
00