chziakas

Pinned Repositories

ariadne
LLM Evals for Text Summarization and RAG use-cases.
Language: Python 35 2 00
backbone-learn
A Library for Scaling Mixed-Integer Optimization-Based Machine Learning.
Language: Python 11 2 01
deepeval
Evaluation and Unit Testing for LLMs
Language: Python 1 1 00
doubtbot
Language: Python 00
dt-distance
Calculate the structural distance between decision tree models
Language: Jupyter Notebook 0 0 00
generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
0 0 00
humaneval_sample_eval
This project evaluates OpenAI's GPT-3.5 model on a sample from the HumanEval dataset to assess its code generation capabilities. The implementation is designed so that new models and datasets can be integrated easily. Parameters such as sample size and the pass@k metric are configurable.
Language: Python 0 2 00
node-embeddings-eval
Evaluation protocol for graph embedding methods on link prediction, node classification, and node clustering
Language: Jupyter Notebook 0 1 00
redeval
A library for red-teaming LLM applications with LLMs.
Language: Python 23 2 15
deepeval
The LLM Evaluation Framework
Language: Python 4.1k 23 312335

chziakas's Repositories

chziakas/redeval
A library for red-teaming LLM applications with LLMs.
Language: Python 23 2 15
chziakas/backbone-learn
A Library for Scaling Mixed-Integer Optimization-Based Machine Learning.
Language: Python 11 2 01
chziakas/deepeval
Evaluation and Unit Testing for LLMs
Language: Python 1 1 00
chziakas/doubtbot
Language: Python 00
chziakas/dt-distance
Calculate the structural distance between decision tree models
Language: Jupyter Notebook 0 0 00
chziakas/generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
0 0 00
chziakas/humaneval_sample_eval
This project evaluates OpenAI's GPT-3.5 model on a sample from the HumanEval dataset to assess its code generation capabilities. The implementation is designed so that new models and datasets can be integrated easily. Parameters such as sample size and the pass@k metric are configurable.
Language: Python 0 2 00
chziakas/Minigrid
Simple and easily configurable grid world environments for reinforcement learning
Language: Python 0 0 00
chziakas/node-embeddings-eval
Evaluation protocol for graph embedding methods on link prediction, node classification, and node clustering
Language: Jupyter Notebook 0 1 00
chziakas/HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
chziakas/lmexp
Simple starter code for experiments on open-source LLMs. Built for my SPAR project participants.
chziakas/multimodal-memory
Language: Python 1 0
chziakas/multimodal-rag-agent
A retrieval-augmented generative agent with access to image and text memories.
Language: Python 1 0
chziakas/selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Language: Python 0 0