evaluations

There are 26 repositories under the evaluations topic.

Scale3-Labs/langtrace
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector DBs, and more. Integrate using TypeScript or Python. 🚀💻📊
Language: TypeScript 652 10 2164
log10-io/log10
Python client library for improving your LLM app accuracy
Language: Python 96 4 99
evalkit/evalkit
The TypeScript LLM Evaluation Library
Language: TypeScript 70 2 01
boxbeam/Crunch
The fastest Java expression compiler/evaluator
Language: Java 69 4 89
LLM-Evaluation-s-Always-Fatiguing/leaf-playground
A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize the simulation and support for automatic evaluation at the agent-action level.
Language: Python 24 4 30
yisaienkov/evaluations
This library implements various metrics (including Kaggle Competition, Medicine) for evaluating ML, DL, AI models, and algorithms. 📐📊📈📉📏
Language: Python 14 1 01
Maitreyapatel/reliability-checklist
NLP tool for wide-range model reliability evaluations
Language: Python 12 5 150
ZainabZaman/IELTS_PracticeAndEvaluation
IELTS listening, speaking, reading and writing modules practice and evaluation with IELTS band calculation based on speech and text analysis and evaluation.
Language: Python 11 1 21
ComputerScienceHouse/conditional
CSH Evals, the modern way.
Language: Python 10 7 13030
argrecsys/argael
ARGAEL is an open-source Java desktop application designed to maximize the experience and efficiency of the process of annotating and evaluating arguments in large text corpora.
Language: Java 5 0 11
apartresearch/3cb
3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
Language: Python 4 4 1
jonas-becker/pd-human-vs-machine-content
The official repository for the paper "Paraphrase Detection: Human vs. Machine Content".
Language: HTML 3 1 10
rJefferyXie/Chess-Program-with-Minimax-Visualizer
A functional chess game implemented in Python, with pygame as a supporting graphics module.
Language: Python 3 1 00
HarryBleckert/moodle-mod_evaluation
Moodle plugin for evaluations with Moodle. This is the evaluation activity plugin.
Language: PHP 2 1 00
bhadresh-laiya/program-evaluation.com
Do a program evaluation that really counts! That will help other students and will really make universities and colleges take students' experiences to heart!
Language: PHP 1 1 00
GatlenCulp/metr-task-boilerplate
A Cookiecutter template for developing tasks according to the METR Task Standard
Language: TypeScript 10
mandoline-ai/mandoline-node
Official Node.js client for the Mandoline API
Language: TypeScript 1
mandoline-ai/mandoline-python
Official Python client for the Mandoline API
Language: Python 1
brettdidonato/BSD_Evals
LLM evaluation framework
Language: Jupyter Notebook 00
CathyNickEvaluations/cathynickevaluations.github.io
Evaluations for homeschoolers
Language: HTML 0 0 00
esleipness/fluiddataPySpark
Utilizing Apache Spark in Google Colab, Jupyter Notebook, and Databricks
Language: Jupyter Notebook 00
henrique-souza/evaluation_1_POO
Program made for the first evaluation of object-oriented programming
Language: Java 0 1 00
henrique-souza/evaluation_2_OOP
Program made for the second evaluation of object-oriented programming
Language: Java 0 1 00
johngoeltz/course_evals
A filter that removes unconstructive comments from student course evaluations
Language: Jupyter Notebook 0 1 00
AccentureMacr0s/Opinion-Mining-System
You can build a robust opinion mining and website evaluation system on AWS. The combination of data collection, preprocessing, sentiment analysis, and rating calculation ensures that you can efficiently analyze user feedback and generate meaningful insights to evaluate websites.
Language: Python
moreirab/enron-scandal
Machine learning algorithms applied to explore the Enron email dataset and identify patterns among the people involved in the scandal.
Language: DIGITAL Command Language 1 0
