peterbhase

AI researcher interested in AI safety and NLP

Chapel Hill

Pinned Repositories

anchor
Code for "High-Precision Model-Agnostic Explanations" paper
Language: Jupyter Notebook 0 0 00
ExplanationRoles
Code for paper "When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data"
Language: Python 14 2 00
ExplanationSearch
Code for paper "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals"
Language: Jupyter Notebook 17 1 02
interpretable-image
Code for "Interpretable Image Recognition with Hierarchical Prototypes"
Language: Python 18 3 05
InterpretableNLP-ACL2020
Code for "Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?"
Language: Python 44 2 23
LAS-NL-Explanations
Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?"
Language: Python 20 2 05
LLM-belief-revision
Language: Python 4 1 00
mechanistic-interpretability
Language: Python 1 1 00
poetry-generation
Code for "Shall I Compare Thee to a Machine-Written Sonnet? An Algorithmic Approach to Sonnet Generation", available at https://arxiv.org/abs/1811.05067
Language: Python 6 1 05
SLAG-Belief-Updating
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"
Language: Python 28 1 02

peterbhase's Repositories

peterbhase/InterpretableNLP-ACL2020
Code for "Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?"
Language: Python 44 2 23
peterbhase/SLAG-Belief-Updating
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"
Language: Python 28 1 02
peterbhase/LAS-NL-Explanations
Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?"
Language: Python 20 2 05
peterbhase/interpretable-image
Code for "Interpretable Image Recognition with Hierarchical Prototypes"
Language: Python 18 3 05
peterbhase/ExplanationSearch
Code for paper "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals"
Language: Jupyter Notebook 17 1 02
peterbhase/ExplanationRoles
Code for paper "When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data"
Language: Python 14 2 00
peterbhase/poetry-generation
Code for "Shall I Compare Thee to a Machine-Written Sonnet? An Algorithmic Approach to Sonnet Generation", available at https://arxiv.org/abs/1811.05067
Language: Python 6 1 05
peterbhase/LLM-belief-revision
Language: Python 4 1 00
peterbhase/mechanistic-interpretability
Language: Python 1 1 00
peterbhase/anchor
Code for "High-Precision Model-Agnostic Explanations" paper
Language: Jupyter Notebook 0 0 00
peterbhase/evolution-strategies-exploration
Contains an implementation of Tim Salimans et al., “Evolution Strategies as a Scalable Alternative to Reinforcement Learning”, arXiv, https://arxiv.org/pdf/1703.03864.pdf.
Language: Jupyter Notebook 0 0 00
peterbhase/peterbhase.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Language: JavaScript 0 01
peterbhase/rome
Rank-One Model Editing for Locating and Editing Factual Knowledge in GPT
Language: Python 0 0
peterbhase/tennis_wta
WTA Tennis Rankings, Results, and Stats
0 0
peterbhase/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
Language: Python 1 0
