Jaewoopudding's Stars
meta-llama/codellama
Inference code for CodeLlama models
alshedivat/al-folio
A beautiful, simple, clean, and responsive Jekyll theme for academics
huggingface/trl
Train transformer language models with reinforcement learning.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
takuseno/d3rlpy
An offline deep reinforcement learning library
juncongmoo/chatllama
ChatLLaMA 📢 Open source implementation for LLaMA-based ChatGPT runnable in a single GPU. 15x faster training process than ChatGPT
gorisanson/pikachu-volleyball
Pikachu Volleyball implemented in JavaScript by reverse engineering the original game
ej0cl6/deep-active-learning
Deep Active Learning
jxzhangjhu/Awesome-LLM-Uncertainty-Reliability-Robustness
Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models
CleanDiffuserTeam/CleanDiffuser
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
jannerm/ddpo
Code for the paper "Training Diffusion Models with Reinforcement Learning"
Samimust/predictive-maintenance
Data Wrangling, EDA, Feature Engineering, Model Selection, Regression, Binary and Multi-class Classification (Python, scikit-learn)
facebookresearch/online-dt
Online Decision Transformer
IBM/rl-testbed-for-energyplus
Reinforcement Learning Testbed for Power Consumption Optimization using EnergyPlus
prosysscience/JSSEnv
An OpenAI Gym environment for the Job Shop Scheduling problem.
EmptyJackson/policy-guided-diffusion
Official implementation of the RLC 2024 paper "Policy-Guided Diffusion"
IBM/iot-predictive-analytics
Method for predicting equipment failures using sensor data. Sensors mounted on IoT devices, automated manufacturing equipment such as robot arms, and process monitoring and control equipment continuously collect and transmit time-stamped data.
conglu1997/v-d4rl
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
nttcslab/msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
Dragon-Zhuang/BPPO
Author's PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO).
Junyoungpark/Pytorch-AWAC
A PyTorch implementation of Advantage-Weighted Actor-Critic (AWAC)
qingshi9974/PPO-pytorch-Mujoco
Implementation of the PPO algorithm on MuJoCo environments such as Ant-v2, Humanoid-v2, Hopper-v2, and HalfCheetah-v2.
datvodinh/ppo-transformer
A Reinforcement Learning Project using PPO + Transformer
reinforcement-learning-kr/rl-montezuma
State-of-the-art deep RL algorithms for Montezuma's Revenge
GRAAL-Research/OfflineRLReadingGroup
Offline Reinforcement Learning Reading Group
dbsxodud-11/ls_gfn
Official Code for Local Search GFlowNets (ICLR 2024 Spotlight)
Jaewoopudding/GTA
Official codebase for GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning.
beanie00/Decision-ConvFormer
[ICLR 2024 Spotlight] Code for the paper "Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making"
umkiyoung/EXplorativeDT
RESEARCH