Pinned Repositories
Auto-GUI
Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
miniwob-plusplus
MiniWoB++: a web interaction benchmark for reinforcement learning
ADS-Cap
A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora
KnowCap
Code for "Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model"
nlpSummerCamp2022_caption
An Image Caption Framework
SeeClick
The model, data and code for the visual GUI Agent SeeClick
NCISurvey
Neural Code Intelligence Survey 2024; Reading lists and resources
Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
njucckevin's Repositories
njucckevin/SeeClick
The model, data and code for the visual GUI Agent SeeClick
njucckevin/ADS-Cap
A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora
njucckevin/KnowCap
Code for "Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model"
njucckevin/nlpSummerCamp2022_caption
An Image Caption Framework