dtch1997
Mechanistic interpretability researcher. Interested in interpreting multimodal foundation models
dtch1997's Stars
ethanluoyc/e2c-pytorch
E2C implementation in PyTorch
ethanluoyc/sympais
Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX
ethanluoyc/compile-jax
CompILE implementation in JAX
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
mlb2251/stitch
A scalable abstraction learning library
PKU-Alignment/omnisafe
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
cagatayyildiz/oderl
Experiment code for "Continuous-Time Model-Based Reinforcement Learning"
pypa/hatch
Modern, extensible Python project management
google-deepmind/xmanager
A platform for managing machine learning experiments
r-three/git-theta
Git extension for {collaborative, communal, continual} model development
Berk-Tosun/cbf-cartpole
Various Control Barrier Functions realized on cartpole.
ethanluoyc/lxm3
LXM3: XManager launch backend for HPC clusters
google-deepmind/dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
google-research/rlds
ethanluoyc/optimal_transport_reward
utiasDSL/safe-control-gym
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
utiasDSL/gym-pybullet-drones
PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
ucl-dark/paired
PAIRED in PyTorch 🔥
tinkoff-ai/katakomba
Data-Driven NetHack Tools: Datasets (30+) and recurrent-baselines (AWAC, BC, CQL, IQL, REM)
tinkoff-ai/CORL
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
maitrix-org/llm-reasoners
A library for advanced large language model reasoning
brownirl/rlang
A Declarative Language for Expressing Partial World Knowledge to Reinforcement Learning Agents
ml-jku/rudder
RUDDER: Return Decomposition for Delayed Rewards
chauff/paper-note-filler
Obsidian plugin to automatically create a note from arXiv.org, ACL Anthology and Semantic Scholar.
akelleh/causality
Tools for causal analysis
fiddler-labs/fiddler-auditor
Fiddler Auditor is a tool to evaluate language models.
Eclectic-Sheep/sheeprl
Distributed Reinforcement Learning accelerated by Lightning Fabric
google-deepmind/tracr
VowpalWabbit/vowpal_wabbit
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
VowpalWabbit/reinforcement_learning
Interaction-side integration library for Reinforcement Learning loops: Predict, Log, [Learn,] Update