Pinned Repositories
ariadne
LLM Evals for Text Summarization and RAG Use Cases.
backbone-learn
A Library for Scaling Mixed-Integer Optimization-Based Machine Learning.
deepeval
Evaluation and Unit Testing for LLMs
doubtbot
dt-distance
Calculate the structural distance between decision tree models
generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
humaneval_sample_eval
This project evaluates OpenAI's GPT-3.5 model on a sample from the HumanEval dataset to assess its code generation capabilities. The implementation is designed so that new models and datasets can be integrated easily, and parameters such as the sample size and the k in the pass@k metric are configurable.
node-embeddings-eval
Evaluation protocol for graph embedding methods on link prediction, node classification, and node clustering
redeval
A library for red-teaming LLM applications with LLMs.
deepeval
The LLM Evaluation Framework
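The humaneval_sample_eval entry above mentions a configurable pass@k metric. As a rough illustration, the sketch below computes pass@k with the unbiased estimator introduced alongside the HumanEval benchmark: for a problem with n generated samples, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), and the score is then averaged over problems. Whether the project uses exactly this estimator is an assumption, and the function names pass_at_k and mean_pass_at_k are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative sketch).

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: evaluation budget (how many samples we may "submit")
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
    print(mean_pass_at_k([(10, 3), (5, 5)], k=1))
```

For example, with 10 samples of which 3 pass, pass@1 is 0.3, and pass@5 rises to 1 - 21/252, roughly 0.917. Using exact integer binomials from math.comb keeps the sketch simple; for very large n a product form avoids huge intermediate values.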
chziakas's Repositories
chziakas/redeval
A library for red-teaming LLM applications with LLMs.
chziakas/backbone-learn
A Library for Scaling Mixed-Integer Optimization-Based Machine Learning.
chziakas/deepeval
Evaluation and Unit Testing for LLMs
chziakas/doubtbot
chziakas/dt-distance
Calculate the structural distance between decision tree models
chziakas/generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
chziakas/humaneval_sample_eval
This project evaluates OpenAI's GPT-3.5 model on a sample from the HumanEval dataset to assess its code generation capabilities. The implementation is designed so that new models and datasets can be integrated easily, and parameters such as the sample size and the k in the pass@k metric are configurable.
chziakas/Minigrid
Simple and easily configurable grid world environments for reinforcement learning
chziakas/node-embeddings-eval
Evaluation protocol for graph embedding methods on link prediction, node classification, and node clustering
chziakas/HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
chziakas/lmexp
Simple starter code for experiments on open-source LLMs. Built for my SPAR project participants.
chziakas/multimodal-memory
chziakas/multimodal-rag-agent
A retrieval-augmented generative agent with access to image and text memories.
chziakas/selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models