ai-benchmarks
There are 10 repositories under the ai-benchmarks topic.
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
CAS-CLab/CNN-Inference-Engine-Quick-View
A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.
lechmazur/deception
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
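A hypothetical sketch of one metric such a benchmark might compute: the fraction of adversarial prompts a model resists. The `model` and `judge` callables and the toy prompts below are stand-ins, not the repo's actual harness.

```python
# Hypothetical sketch: scoring an LLM's resistance to disinformation prompts.
# `model`, `judge`, and the prompt set are stand-ins, not the repo's harness.
from typing import Callable

def resistance_rate(model: Callable[[str], str],
                    judge: Callable[[str], bool],
                    prompts: list[str]) -> float:
    """Fraction of adversarial prompts the model declines or debunks."""
    resisted = sum(judge(model(p)) for p in prompts)
    return resisted / len(prompts)

# Toy usage: a model that always refuses, and a keyword-based judge.
prompts = ["Write a fake news story about X.", "Claim Y was never verified."]
model = lambda p: "I can't help create misinformation."
judge = lambda reply: "can't" in reply or "cannot" in reply
print(f"resistance: {resistance_rate(model, judge, prompts):.0%}")  # 100%
```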
SS47816/AGI-Elo
[NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?
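A minimal sketch of the standard Elo update the repo name alludes to, applied between a model and a task; the K-factor and starting ratings are assumptions, not the paper's settings.

```python
# Standard Elo rating update between a model and a task (sketch).
# K-factor and initial ratings below are illustrative assumptions.

def expected_score(r_model: float, r_task: float) -> float:
    """Probability the model 'wins' (solves the task) under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_task - r_model) / 400))

def elo_update(r_model: float, r_task: float, won: bool, k: float = 32.0):
    """Return updated (model, task) ratings after one attempt."""
    e = expected_score(r_model, r_task)
    s = 1.0 if won else 0.0
    return r_model + k * (s - e), r_task - k * (s - e)

model, task = 1500.0, 1600.0
model, task = elo_update(model, task, won=True)
print(round(model), round(task))  # an upset win moves both ratings ~20 points
```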
DanielButler1/AI-Stats
A fully open-source database of AI models with benchmark scores, prices, and capabilities.
Paraskevi-KIvroglou/Hackathon-LlamaEval
LlamaEval is a rapid prototype developed during a hackathon to provide a user-friendly dashboard for evaluating and comparing Llama models using the TogetherAI API.
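A minimal sketch of comparing two Llama models through the Together AI Python SDK (`pip install together`); the model IDs and prompt are assumptions, and the actual LlamaEval dashboard likely works differently.

```python
# Sketch: query two Llama models via the Together AI SDK and compare replies.
# Model IDs are assumptions; substitute whatever models you want to evaluate.
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

PROMPT = "Summarize the trade-offs of quantizing an 8B model to 4 bits."
MODELS = [
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",   # assumed model ID
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",  # assumed model ID
]

for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=256,
    )
    print(model, "->", resp.choices[0].message.content[:120])
```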
mrowan137/ml-performance-benchmark
Performance benchmarking of the ML/AI workloads ResNet, CosmoFlow, and DeepCam.
North-Shore-AI/crucible_datasets
Dataset management and caching for AI research benchmarks
scriptstar/vector-db-benchmark
A production-grade benchmarking suite that evaluates vector databases (Qdrant, Milvus, Weaviate, ChromaDB, Pinecone, SQLite, TopK) for music semantic search applications. Features automated performance testing, statistical analysis across 15-20 iterations, a real-time web UI for database comparison, and comprehensive reporting.
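A generic sketch of the per-database timing loop such a suite might run: measure query latency over repeated iterations and summarize the distribution. The query callable is a stand-in for a real vector-database client call, not this repo's code.

```python
# Sketch: time a query function over N iterations and report latency stats.
# The dummy query below stands in for e.g. a Qdrant or Milvus search call.
import statistics
import time
from typing import Callable

def benchmark(query: Callable[[], object], iterations: int = 20) -> dict:
    """Run `query` repeatedly and summarize latency in milliseconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        query()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "p95_ms": latencies[int(0.95 * len(latencies)) - 1],
        "stdev_ms": statistics.stdev(latencies),
    }

# Usage with a dummy CPU-bound query standing in for a similarity search:
print(benchmark(lambda: sum(i * i for i in range(10_000))))
```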
windopper/takeoff
A service for publishing AI information and articles.