Pinned Repositories
Deception
evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
llm-attacks
Universal and Transferable Attacks on Aligned Language Models
sandbagging_probes
tinygrad
You like pytorch? You like micrograd? You love tinygrad! ❤️
evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
mle-bench
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
SWE-bench
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
MLAgentBench
aideml
AIDE: the state-of-the-art machine learning engineer agent, generating machine learning solution code from natural language descriptions.
ojaffe's Repositories
ojaffe/Polyphonic-OMR
Automatically identifies notes within an image of a musical piece
ojaffe/batch_export
MuseScore plugin to convert various input formats into various output formats
ojaffe/Deception
ojaffe/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
ojaffe/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
ojaffe/llama2.py
Inference Llama 2 in one file of pure Python
ojaffe/llm-attacks
Universal and Transferable Attacks on Aligned Language Models
ojaffe/polyphonic-omr-baseline
Code used in research that led to the paper "An Empirical Evaluation of End-to-End Polyphonic Optical Music Recognition" (ISMIR 2021)
ojaffe/Remove-First-Score
MuseScore plugin that removes files
ojaffe/sandbagging_probes
ojaffe/tinygrad
You like pytorch? You like micrograd? You love tinygrad! ❤️
ojaffe/tinystories_robust_probes
ojaffe/TruthfulQA-Finetuning
Efficient finetuning of huggingface GPT-2 models on TruthfulQA with a single GPU.