shirlyliu64

Pinned Repositories

ConvBench
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
Language: Python | 3 | 1
Conv
0 | 0
DMIN
Language: Python | 0 | 0
LawBench
Benchmarking Legal Knowledge of Large Language Models
Language: Python | 0 | 0
MM-NIAH
This is the official implementation of the paper "Needle In A Multimodal Haystack"
Language: Python | 0 | 0
MMT-Bench
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Language: Python | 0 | 0
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Language: Python | 0 | 0
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
Language: Python | 0 | 0
Find-Needle-In-Sea
Language: Python | 1 | 0

shirlyliu64's Repositories

shirlyliu64/ConvBench
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
Language: Python | 3 | 1
shirlyliu64/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
shirlyliu64/MM-NIAH
This is the official implementation of the paper "Needle In A Multimodal Haystack"
shirlyliu64/Conv
shirlyliu64/MMT-Bench
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
shirlyliu64/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Language: Python
shirlyliu64/LawBench
Benchmarking Legal Knowledge of Large Language Models
shirlyliu64/DMIN