ai-benchmarks

There are 10 repositories under the ai-benchmarks topic.

  • scicode-bench/SciCode

    A benchmark that challenges language models to code solutions for scientific problems

    Language: Python
  • CAS-CLab/CNN-Inference-Engine-Quick-View

    A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.

  • lechmazur/deception

    Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.

  • SS47816/AGI-Elo

    [NeurIPS 2025] AGI-Elo: How Far Are We From Mastering A Task?

    Language: Python
  • DanielButler1/AI-Stats

    A fully open-source database of AI models with benchmark scores, prices, and capabilities.

    Language: TypeScript
  • Paraskevi-KIvroglou/Hackathon-LlamaEval

    LlamaEval is a rapid prototype developed during a hackathon to provide a user-friendly dashboard for evaluating and comparing Llama models using the TogetherAI API.

    Language: Python
  • mrowan137/ml-performance-benchmark

    Performance benchmarking of ML/AI workloads: ResNet, CosmoFlow, and DeepCam.

    Language: CWeb
  • North-Shore-AI/crucible_datasets

    Dataset management and caching for AI research benchmarks

    Language: Elixir
  • scriptstar/vector-db-benchmark

    A production-grade benchmarking suite that evaluates vector databases (Qdrant, Milvus, Weaviate, ChromaDB, Pinecone, SQLite, TopK) for music semantic search applications. Features automated performance testing, statistical analysis across 15-20 iterations, a real-time web UI for database comparison, and comprehensive reporting.

    Language: Python
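The "statistical analysis across 15-20 iterations" that vector-db-benchmark describes can be sketched with nothing but the Python standard library. This is a minimal, illustrative harness, not code from the repository; the names `benchmark` and `query_fn` are hypothetical, and the stand-in workload merely simulates a query call.

```python
import statistics
import time

def benchmark(query_fn, iterations=20, warmup=2):
    """Run query_fn repeatedly and summarize its latency in milliseconds."""
    for _ in range(warmup):
        query_fn()  # warm caches/connections before measuring
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        query_fn()
        latencies.append((time.perf_counter() - start) * 1000)  # ms
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "stdev_ms": statistics.stdev(latencies),
        "p95_ms": latencies[int(0.95 * len(latencies)) - 1],
    }

# Stand-in for a vector-search call against one of the databases under test
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

A real suite would swap the lambda for a client call to each database and compare the resulting summaries side by side.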
  • windopper/takeoff

    A service for publishing AI news and articles.

    Language: TypeScript