LibertFan's Stars
huggingface/lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Million-Length Context
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
THUDM/AgentTuning
AgentTuning: Enabling Generalized Agent Abilities for LLMs
ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
yaodongC/awesome-instruction-dataset
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
wilson1yan/VideoGPT
robocasa/robocasa
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
LAION-AI/aesthetic-predictor
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
Vision-CAIR/ChatCaptioner
Official Repository of ChatCaptioner
WooooDyy/AgentGym
Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.
njucckevin/SeeClick
The model, data, and code for the visual GUI agent SeeClick
allenai/WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
victorsungo/MMDialog
The official site of the paper "MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation"
PKU-EPIC/DexGraspNet
XiaoxiaoGuo/fashion-iq
google-research/android_world
AndroidWorld is an environment and benchmark for autonomous agents
Yushi-Hu/VisualSketchpad
Code for "Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models"
alipay/Ant-Multi-Modal-Framework
Research code from the Multimodal-Cognition team at Ant Group
yjy0625/equibot
Official implementation of the paper "EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning"
LibertFan/AI_Hospital
AI Hospital: Interactive Evaluation and Collaboration of LLMs as Intern Doctors for Clinical Diagnosis
OpenGVLab/MMT-Bench
[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
yiye3/GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
princeton-nlp/CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
MILVLG/activitynet-qa
A VideoQA dataset based on videos from ActivityNet
prometheus-eval/prometheus-vision
[ACL 2024 Findings & ICLR 2024 WS] An evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on customized score rubrics, Prometheus-Vision is a good alternative to human evaluation and GPT-4V evaluation.
google-research-datasets/screen_annotation
The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The annotations are in text format and describe the UI elements present on the screen: their type, location, OCR text, and a short description. It was introduced in the paper `ScreenAI: A Vision-Language Model for UI and Infographics Understanding`.
KwanWaiChung/MT-Eval
Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
chuyg1005/seeclick-crawler
SkyworkAI/agent-studio
Environments, tools, and benchmarks for general computer agents