evaluations
There are 26 repositories under the evaluations topic.
Scale3-Labs/langtrace
Langtrace 🔍 is an open-source, OpenTelemetry-based, end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector DBs, and more. Integrate using TypeScript or Python. 🚀💻📊
log10-io/log10
Python client library for improving your LLM app accuracy
evalkit/evalkit
The TypeScript LLM Evaluation Library
boxbeam/Crunch
The fastest Java expression compiler/evaluator
LLM-Evaluation-s-Always-Fatiguing/leaf-playground
A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent-action level.
yisaienkov/evaluations
This library implements various metrics (including Kaggle competition and medical metrics) for evaluating ML, DL, and AI models and algorithms. 📐📊📈📉📏
Maitreyapatel/reliability-checklist
An NLP tool for wide-ranging model reliability evaluations
ZainabZaman/IELTS_PracticeAndEvaluation
Practice and evaluation for the IELTS listening, speaking, reading, and writing modules, with IELTS band calculation based on speech and text analysis.
ComputerScienceHouse/conditional
CSH Evals, the modern way.
argrecsys/argael
ARGAEL is an open-source Java desktop application designed to maximize the experience and efficiency of the process of annotating and evaluating arguments in large text corpora.
apartresearch/3cb
3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
jonas-becker/pd-human-vs-machine-content
The official repository for the paper "Paraphrase Detection: Human vs. Machine Content".
rJefferyXie/Chess-Program-with-Minimax-Visualizer
A functional chess game implemented in Python, with pygame as the supporting graphics module.
HarryBleckert/moodle-mod_evaluation
Moodle activity plugin for conducting evaluations within Moodle.
bhadresh-laiya/program-evaluation.com
Do a program evaluation that really counts! It will help other students and will really make universities and colleges take students' experiences to heart!
GatlenCulp/metr-task-boilerplate
A Cookiecutter template for developing tasks according to the METR Task Standard
mandoline-ai/mandoline-node
Official Node.js client for the Mandoline API
mandoline-ai/mandoline-python
Official Python client for the Mandoline API
brettdidonato/BSD_Evals
LLM evaluation framework
CathyNickEvaluations/cathynickevaluations.github.io
Evaluations for homeschoolers
esleipness/fluiddataPySpark
Utilizing Apache Spark in Google Colab, Jupyter Notebook, and Databricks
henrique-souza/evaluation_1_POO
Program made for the first evaluation of object-oriented programming
henrique-souza/evaluation_2_OOP
Program made for the second evaluation of object-oriented programming
johngoeltz/course_evals
A filter that removes unconstructive comments from student course evaluations
AccentureMacr0s/Opinion-Mining-System
A robust opinion mining and website evaluation system built on AWS. Combining data collection, preprocessing, sentiment analysis, and rating calculation, it efficiently analyzes user feedback and generates meaningful insights for evaluating websites.
moreirab/enron-scandal
Machine learning algorithms applied to the Enron email dataset to identify patterns about the people involved in the scandal.