rohit-sonker's Stars
OpenRL-Lab/DGPO
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
WentseChen/Soft-QMIX
Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization
tensorzero/tensorzero
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.