dtch1997

Mechanistic interpretability researcher. Interested in interpreting multimodal foundation models.

dtch1997's Stars

ethanluoyc/e2c-pytorch
E2C implementation in PyTorch
Language:Python439
ethanluoyc/sympais
Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX
Language:Python61
ethanluoyc/compile-jax
CompILE implementation in JAX
Language:Python2
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Language: Python, 10.4k stars, 668 forks
mlb2251/stitch
A scalable abstraction learning library
Language:Rust718
PKU-Alignment/omnisafe
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
Language:Python912130
cagatayyildiz/oderl
Experiment code for "Continuous-Time Model-Based Reinforcement Learning"
Language:Python4613
pypa/hatch
Modern, extensible Python project management
Language: Python, 5.9k stars, 299 forks
google-deepmind/xmanager
A platform for managing machine learning experiments
Language:Python81545
r-three/git-theta
git extension for {collaborative, communal, continual} model development
Language:Python2039
Berk-Tosun/cbf-cartpole
Various Control Barrier Functions realized on cartpole.
Language:Python214
ethanluoyc/lxm3
LXM3: XManager launch backend for HPC clusters
Language:Python91
google-deepmind/dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Language: Python, 3.7k stars, 665 forks
google-research/rlds
Language:Jupyter Notebook27520
ethanluoyc/optimal_transport_reward
Language:Python134
utiasDSL/safe-control-gym
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
Language:Python592124
utiasDSL/gym-pybullet-drones
PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
Language: Python, 1.2k stars, 350 forks
ucl-dark/paired
PAIRED in PyTorch 🔥
Language:Python5621
tinkoff-ai/katakomba
Data-Driven NetHack Tools: Datasets (30+) and recurrent-baselines (AWAC, BC, CQL, IQL, REM)
Language:Python643
tinkoff-ai/CORL
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
Language: Python, 1.1k stars, 124 forks
maitrix-org/llm-reasoners
A library for advanced large language model reasoning
Language: Python, 1.2k stars, 98 forks
brownirl/rlang
A Declarative Language for Expressing Partial World Knowledge to Reinforcement Learning Agents
Language:HTML14
ml-jku/rudder
RUDDER: Return Decomposition for Delayed Rewards
467
chauff/paper-note-filler
Obsidian plugin to automatically create a note from arXiv.org, acl anthology and semantic scholar.
Language:TypeScript305
akelleh/causality
Tools for causal analysis
Language: Python, 1.1k stars, 128 forks
fiddler-labs/fiddler-auditor
Fiddler Auditor is a tool to evaluate language models.
Language:Python16520
Eclectic-Sheep/sheeprl
Distributed Reinforcement Learning accelerated by Lightning Fabric
Language:Python30331
google-deepmind/tracr
Language:Python49940
VowpalWabbit/vowpal_wabbit
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
Language: C++, 8.5k stars, 1.9k forks
VowpalWabbit/reinforcement_learning
Interaction-side integration library for Reinforcement Learning loops: Predict, Log, [Learn,] Update
Language:C++7440
