evaluation-metrics

There are 388 repositories under the evaluation-metrics topic.

  • confident-ai/deepeval

    The LLM Evaluation Framework

    Language: Python
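
    A minimal sketch of a deepeval check, patterned on the project's README; class and metric names may differ across versions, and an LLM-judge API key (e.g. OPENAI_API_KEY) is assumed to be configured:

        from deepeval import evaluate
        from deepeval.metrics import AnswerRelevancyMetric
        from deepeval.test_case import LLMTestCase

        # One hypothetical input/output pair to score for answer relevancy.
        test_case = LLMTestCase(
            input="What if these shoes don't fit?",
            actual_output="We offer a 30-day full refund at no extra cost.",
        )
        metric = AnswerRelevancyMetric(threshold=0.7)
        evaluate([test_case], [metric])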
  • xinshuoweng/AB3DMOT

    (IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

    Language: Python
  • AgentOps-AI/agentops

    Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen

    Language: Python
  • google-research/rliable

    [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

    Language: Jupyter Notebook
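
    A sketch of the bootstrapped aggregate the library advocates (interquartile mean with confidence intervals), patterned on its README; the score matrix is random for illustration, and argument names are assumptions if the API has since changed:

        import numpy as np
        from rliable import library as rly, metrics

        # Scores for one algorithm: a (num_runs x num_tasks) matrix of
        # normalized final returns.
        score_dict = {"my_algo": np.random.uniform(size=(5, 10))}
        iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
        point_estimates, interval_estimates = rly.get_interval_estimates(
            score_dict, iqm, reps=2000)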
  • MIND-Lab/OCTIS

    OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

    Language: Python
  • jitsi/jiwer

    Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

    Language: Python
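
    For example, computing WER over a toy reference/hypothesis pair (jiwer also exposes cer, mer, and wil):

        import jiwer

        reference = "the quick brown fox jumps over the lazy dog"
        hypothesis = "the quick brown box jumps over the dog"
        # WER = (substitutions + deletions + insertions) / reference word count
        print(jiwer.wer(reference, hypothesis))  # 1 sub + 1 del over 9 words ≈ 0.22
        print(jiwer.cer(reference, hypothesis))  # character error rate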
  • up42/image-similarity-measures

    :chart_with_upwards_trend: Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

    Language: Python
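
    A sketch assuming the package's quality_metrics module exposes each metric as a function over (org_img, pred_img) NumPy arrays, as its README suggests; the random arrays stand in for real images:

        import numpy as np
        from image_similarity_measures.quality_metrics import psnr, rmse, ssim

        # Two random uint8 "images" standing in for real data.
        org_img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
        pred_img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
        print(rmse(org_img, pred_img), psnr(org_img, pred_img), ssim(org_img, pred_img))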
  • proycon/pynlpl

    PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

    Language: Python
  • huggingface/lighteval

    LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, together with the recently released LLM data-processing library datatrove and the LLM training library nanotron.

    Language: Python
  • Unbabel/COMET

    A Neural Framework for MT Evaluation

    Language: Python
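
    A sketch patterned on COMET's documented usage; the checkpoint name is one of Unbabel's published models and is an assumption here:

        from comet import download_model, load_from_checkpoint

        model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
        data = [{
            "src": "Dem Feuer konnte Einhalt geboten werden",
            "mt": "The fire could be stopped",
            "ref": "They were able to control the fire",
        }]
        # Returns segment-level scores plus a corpus-level system score.
        print(model.predict(data, batch_size=8, gpus=0))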
  • AmenRa/ranx

    ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

    Language: Python
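
    For example, scoring a run against relevance judgments (patterned on the ranx README; document IDs and scores are made up):

        from ranx import Qrels, Run, evaluate

        qrels = Qrels({"q_1": {"doc_a": 1, "doc_b": 2}})
        run = Run({"q_1": {"doc_b": 0.9, "doc_c": 0.8, "doc_a": 0.7}})
        # Multiple metrics can be computed in one call.
        print(evaluate(qrels, run, ["ndcg@10", "map", "mrr"]))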
  • relari-ai/continuous-eval

    Open-Source Evaluation for GenAI Application Pipelines

    Language: Python
  • v-iashin/SpecVQGAN

    Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

    Language: Jupyter Notebook
  • salesforce/factCC

    Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

    Language: Python
  • bheinzerling/pyrouge

    A Python wrapper for the ROUGE summarization evaluation package

    Language: Python
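
    A sketch of the documented workflow; it assumes the external ROUGE-1.5.5 Perl package is installed, and the directory paths and filename patterns are placeholders:

        from pyrouge import Rouge155

        r = Rouge155()
        r.system_dir = "path/to/system_summaries"   # generated summaries
        r.model_dir = "path/to/model_summaries"     # reference summaries
        r.system_filename_pattern = r"some_name.(\d+).txt"
        r.model_filename_pattern = "some_name.[A-Z].#ID#.txt"
        output = r.convert_and_evaluate()
        print(r.output_to_dict(output))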
  • clovaai/generative-evaluation-prdc

    Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.

    Language: Python
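
    A sketch using the repo's pip package (prdc); in practice the features would come from a pretrained embedding network rather than random draws:

        import numpy as np
        from prdc import compute_prdc

        real_features = np.random.normal(size=(1000, 64))
        fake_features = np.random.normal(size=(1000, 64))
        # Returns a dict with precision, recall, density, and coverage.
        print(compute_prdc(real_features=real_features,
                           fake_features=fake_features,
                           nearest_k=5))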
  • FuxiaoLiu/LRV-Instruction

    [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

    Language: Python
  • TonicAI/tonic_validate

    Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

    Language: Python
  • davidsbatista/NER-Evaluation

    An implementation of full named-entity evaluation metrics based on SemEval'13 Task 9 - evaluated not at the tag/token level but considering all the tokens that are part of the named entity

    Language: Python
  • sharmaroshan/Twitter-Sentiment-Analysis

    A Natural Language Processing problem where sentiment analysis is performed by classifying positive and negative tweets with machine learning models, covering classification, text mining, text analysis, data analysis, and data visualization

    Language: Jupyter Notebook
  • clovaai/CLEval

    CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks

    Language: Python
  • tagucci/pythonrouge

    Python wrapper for evaluating summarization quality with the ROUGE package

    Language: Perl
  • feralvam/easse

    Easier Automatic Sentence Simplification Evaluation

    Language: Roff
  • lartpang/PySODEvalToolkit

    PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection

    Language: Python
  • MantisAI/nervaluate

    Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13

    Language: Python
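
    A sketch assuming the list loader and BIO-tagged sequences, patterned on the project's README; the toy example has one gold PER entity that the prediction only partially matches:

        from nervaluate import Evaluator

        true = [["O", "B-PER", "I-PER", "O"]]
        pred = [["O", "B-PER", "O", "O"]]
        evaluator = Evaluator(true, pred, tags=["PER"], loader="list")
        results, results_by_tag = evaluator.evaluate()
        # Schemes: "strict", "exact", "partial", and "ent_type".
        print(results["strict"])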
  • athina-ai/athina-evals

    Python SDK for running evaluations on LLM generated responses

    Language: Python
  • fakufaku/fast_bss_eval

    A fast implementation of bss_eval metrics for blind source separation

    Language: Python
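
    A sketch assuming the package's bss_eval_sources entry point over (n_sources, n_samples) arrays, per its README; the signals here are synthetic:

        import numpy as np
        import fast_bss_eval

        ref = np.random.randn(2, 16000)              # reference sources
        est = ref + 0.1 * np.random.randn(2, 16000)  # noisy estimates
        # Signal-to-distortion, interference, and artifact ratios, plus the
        # permutation that best matches estimates to references.
        sdr, sir, sar, perm = fast_bss_eval.bss_eval_sources(ref, est)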
  • om-ai-lab/VL-CheckList

    Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.

    Language: Python
  • YuanXinCherry/Person-reID-Evaluation

    GOM: New Metric for Re-identification. 👉 GOM explicitly balances the effects of retrieval and verification in a single unified metric.

    Language: Python
  • tohinz/semantic-object-accuracy-for-generative-text-to-image-synthesis

    Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)

    Language: Python
  • LAIT-CVLab/TopPR

    [NeurIPS 2023] Official code for "TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models"

    Language: Python
  • msmsajjadi/precision-recall-distributions

    Assessing Generative Models via Precision and Recall (official repository)

    Language: Python
  • Muhtasham/summarization-eval

    📝 Reference-free automatic summarization evaluation with potential hallucination detection

    Language: Python
  • tanyuqian/ctc-gen-eval

    EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation

    Language: Python
  • hpclab/rankeval

    Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.

    Language: Python
  • Coldmist-Lu/ErrorAnalysis_Prompt

    :gift:[ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT

    Language: Python