Pinned Repositories
safetywashing
Measuring correlations between safety benchmarks and general AI capabilities benchmarks.
llama-lying
Code for our paper "Localizing Lying in Llama"
iti_capstone
Analyzing truth representations in LLMs across different kinds of truth, and intervening on their hidden states to make the models more truthful
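The hidden-state intervention described above can be sketched in a few lines: shift an activation along a "truthful" direction, as in Inference-Time Intervention. This is a minimal illustration, not the repo's implementation; the direction would normally come from a linear probe trained on true/false statements, and the function name, `alpha` scale, and toy vectors here are all assumptions.

```python
import numpy as np

def intervene(hidden_state: np.ndarray,
              truth_direction: np.ndarray,
              alpha: float = 5.0) -> np.ndarray:
    """Shift a hidden state along a normalized 'truthful' direction.

    In practice truth_direction would be learned, e.g. as the weight
    vector of a probe separating true from false statements.
    """
    unit = truth_direction / np.linalg.norm(truth_direction)
    return hidden_state + alpha * unit

# Toy example: a 4-dim hidden state nudged along a hypothetical direction.
h = np.array([1.0, 0.0, 0.0, 0.0])
d = np.array([0.0, 3.0, 0.0, 0.0])   # unnormalized probe direction
h_new = intervene(h, d, alpha=2.0)
```

In the actual setting this shift is applied at generation time, per layer and per attention head, via forward hooks on the model.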
ADecodingTrustYouCanTrust
A from-scratch codebase for DecodingTrust evaluations that actually works.
AI-job-exposure
Using NLP to construct an automation exposure metric using semantic overlap between patent text and occupational task descriptions.
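The overlap metric described above can be illustrated with a crude lexical proxy: cosine similarity between bag-of-words vectors for a patent text and an occupational task description. This is a sketch only; the repo's actual metric is presumably embedding-based, and the function name and example strings here are invented for illustration.

```python
from collections import Counter
import math

def cosine_overlap(text_a: str, text_b: str) -> float:
    """Cosine similarity of bag-of-words vectors: a crude lexical
    stand-in for semantic overlap (a real pipeline would likely use
    sentence embeddings instead)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

patent = "automated system for sorting packages"
task = "sorting packages by destination"
score = cosine_overlap(patent, task)   # higher = task more exposed to this patent
```

Aggregating such scores over all patents matched to an occupation's tasks yields an exposure score per occupation.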
arena-curriculum
Exercises on mechanistic interpretability, RL, and training models at scale
arena_curriculum_trlxRLHF
Completed ARENA2.0 RLHF exercises.
code-repository
How to run Llama-70B inference with HuggingFace Transformers, parallelized across multiple GPUs
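Multi-GPU HuggingFace inference of the kind described above usually comes down to the right `from_pretrained` kwargs, letting Accelerate shard the checkpoint across visible GPUs. A minimal sketch, assuming the standard Transformers/Accelerate API; the helper name and the commented model ID are illustrative, not taken from the repo, and the actual load is left as a comment since it pulls ~140 GB of weights.

```python
def sharding_kwargs(dtype: str = "float16") -> dict:
    """kwargs for from_pretrained() that shard a large checkpoint
    across every visible GPU via Accelerate."""
    return {
        "device_map": "auto",       # let Accelerate place layers across GPUs
        "torch_dtype": dtype,       # half precision halves memory vs float32
        "low_cpu_mem_usage": True,  # stream weights instead of materializing all at once
    }

# Usage (requires GPUs and the model weights; model ID is illustrative):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   name = "meta-llama/Llama-2-70b-hf"
#   tok = AutoTokenizer.from_pretrained(name)
#   model = AutoModelForCausalLM.from_pretrained(name, **sharding_kwargs())
```

With `device_map="auto"`, layers that land on different GPUs are handled transparently during `generate`; inputs just need `.to(model.device)`.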
representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
wolfram-toolformer-tests
An exploratory project to test out GPT's math ability when fine-tuned and augmented with the Wolfram Alpha API.
notrichardren's Repositories
notrichardren/code-repository
How to run Llama-70B inference with HuggingFace Transformers, parallelized across multiple GPUs
notrichardren/wolfram-toolformer-tests
An exploratory project to test out GPT's math ability when fine-tuned and augmented with the Wolfram Alpha API.
notrichardren/AI-job-exposure
Using NLP to construct an automation exposure metric using semantic overlap between patent text and occupational task descriptions.
notrichardren/ADecodingTrustYouCanTrust
A from-scratch codebase for DecodingTrust evaluations that actually works.
notrichardren/arena-curriculum
Exercises on mechanistic interpretability, RL, and training models at scale
notrichardren/arena_curriculum_trlxRLHF
Completed ARENA2.0 RLHF exercises.
notrichardren/cis522-course-fork-ec-1
Let's grind those extra-credit points
notrichardren/CIS522-homework
notrichardren/discovering_latent_knowledge
notrichardren/ENM5310
notrichardren/fastbook
The fastai book, published as Jupyter Notebooks
notrichardren/representation-engineering
Representation Engineering: A Top-Down Approach to AI Transparency
notrichardren/cluster-docs
Center for AI Safety Cluster Documentation
notrichardren/DecodingTrust
Trying to get DecodingTrust evaluations to work
notrichardren/evaluation-robust-control
A framework for few-shot evaluation of language models.
notrichardren/harmbench_static
notrichardren/iti
Fork of Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
notrichardren/notrichardren
Config files for Github profile.
notrichardren/notrichardren.github.io
notrichardren/PurpleLlama
Set of tools to assess and improve LLM security.
notrichardren/segment-edit
Image editing with segmentation.
notrichardren/STEER-evaluation
notrichardren/Testing-AidanBench
Aidan Bench attempts to measure <big_model_smell> in LLMs.
notrichardren/toolformer-data-cleaning
LLM that can (generate code to) clean your data for you
notrichardren/truthfulness_high_quality
load_from_disk("truthfulness_high_quality")