Pinned Repositories
self-ablating-transformers
A self-modeling transformer with an auxiliary output head that produces an ablation mask for the model itself, applied in a second forward pass
emili
EMILI (Emotionally Intelligent Listener) adds emotion tags sourced from video to your OpenAI API calls
ARENA_3.0
Deception-RepE
emotion-tune
SPAR Summer 2024: Improving RLHF with Emotion-based Feedback
self-modeling-ResNet_CIFAR
Code for replicating the experiments reported in "Unexpected Benefits of Self-Modeling in Neural Systems"
werewolf-bench
Benchmarking AI deception with the One Night: Ultimate Werewolf game
spinningup
An educational resource to help anyone learn deep reinforcement learning.
WashBench
Summarization Relevance Benchmark for Large Language Models
LuhanMikaelson's Repositories
LuhanMikaelson/werewolf-bench
Benchmarking AI deception with the One Night: Ultimate Werewolf game
LuhanMikaelson/ARENA_3.0
LuhanMikaelson/Deception-RepE
LuhanMikaelson/emotion-tune
SPAR Summer 2024: Improving RLHF with Emotion-based Feedback
LuhanMikaelson/self-ablating-transformers
A self-modeling transformer with an auxiliary output head that produces an ablation mask for the model itself, applied in a second forward pass
LuhanMikaelson/self-modeling-ResNet_CIFAR
Code for replicating the experiments reported in "Unexpected Benefits of Self-Modeling in Neural Systems"