ai-benchmarks
There are 10 repositories under the ai-benchmarks topic.
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
CAS-CLab/CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.
lechmazur/deception
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
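A hypothetical sketch of one metric such a benchmark might compute: the fraction of adversarial prompts a model resists. The `model` and `judge` callables and the toy prompts below are stand-ins, not the repo's actual harness.

```python
# Hypothetical sketch: scoring an LLM's resistance to disinformation prompts.
# `model`, `judge`, and the prompt set are stand-ins, not the repo's harness.
from typing import Callable

def resistance_rate(model: Callable[[str], str],
                    judge: Callable[[str], bool],
                    prompts: list[str]) -> float:
    """Fraction of adversarial prompts the model declines or debunks."""
    resisted = sum(judge(model(p)) for p in prompts)
    return resisted / len(prompts)

# Toy usage: a model that always refuses, and a keyword-based judge.
prompts = ["Write a fake news story about X.", "Claim Y was never verified."]
model = lambda p: "I can't help create misinformation."
judge = lambda reply: "can't" in reply or "cannot" in reply
print(f"resistance: {resistance_rate(model, judge, prompts):.0%}")  # 100%
```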
SS47816/AGI-Elo
[NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?
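A minimal sketch of the standard Elo update the repo name alludes to, applied between a model and a task; the K-factor and starting ratings are assumptions, not the paper's settings.

```python
# Standard Elo rating update between a model and a task (sketch).
# K-factor and initial ratings below are illustrative assumptions.

def expected_score(r_model: float, r_task: float) -> float:
    """Probability the model 'wins' (solves the task) under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_task - r_model) / 400))

def elo_update(r_model: float, r_task: float, won: bool, k: float = 32.0):
    """Return updated (model, task) ratings after one attempt."""
    e = expected_score(r_model, r_task)
    s = 1.0 if won else 0.0
    return r_model + k * (s - e), r_task - k * (s - e)

model, task = 1500.0, 1600.0
model, task = elo_update(model, task, won=True)
print(round(model), round(task))  # an upset win moves both ratings ~20 points
```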
DanielButler1/AI-Stats
A fully open-source database of AI models with benchmark scores, prices, and capabilities.
Paraskevi-KIvroglou/Hackathon-LlamaEval
LlamaEval is a rapid prototype developed during a hackathon to provide a user-friendly dashboard for evaluating and comparing Llama models using the TogetherAI API.
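A minimal sketch of comparing two Llama models through the Together AI Python SDK (`pip install together`); the model IDs and prompt are assumptions, and the actual LlamaEval dashboard likely works differently.

```python
# Sketch: query two Llama models via the Together AI SDK and compare replies.
# Model IDs are assumptions; substitute whatever models you want to evaluate.
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

PROMPT = "Summarize the trade-offs of quantizing an 8B model to 4 bits."
MODELS = [
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",   # assumed model ID
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",  # assumed model ID
]

for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=256,
    )
    print(model, "->", resp.choices[0].message.content[:120])
```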
mrowan137/ml-performance-benchmark
Performance benchmarking of the ML/AI workloads ResNet, CosmoFlow, and DeepCam.
North-Shore-AI/crucible_datasets
Dataset management and caching for AI research benchmarks
scriptstar/vector-db-benchmark
A production-grade benchmarking suite that evaluates vector databases (Qdrant, Milvus, Weaviate, ChromaDB, Pinecone, SQLite, TopK) for music semantic search applications. Features automated performance testing, statistical analysis across 15-20 iterations, a real-time web UI for database comparison, and comprehensive reporting.
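A generic sketch of the per-database timing loop such a suite might run: measure query latency over repeated iterations and summarize the distribution. The query callable is a stand-in for a real vector-database client call, not this repo's code.

```python
# Sketch: time a query function over N iterations and report latency stats.
# The dummy query below stands in for e.g. a Qdrant or Milvus search call.
import statistics
import time
from typing import Callable

def benchmark(query: Callable[[], object], iterations: int = 20) -> dict:
    """Run `query` repeatedly and summarize latency in milliseconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        query()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "p95_ms": latencies[int(0.95 * len(latencies)) - 1],
        "stdev_ms": statistics.stdev(latencies),
    }

# Usage with a dummy CPU-bound query standing in for a similarity search:
print(benchmark(lambda: sum(i * i for i in range(10_000))))
```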
windopper/takeoff
A service for publishing AI information and articles.