KJLdefeated
Computer science student at National Yang Ming Chiao Tung University, interested in reinforcement learning.
NYCU
KJLdefeated's Stars
jiadong5/ECE408_FA23_UIUC
My GitHub repo for UIUC ECE408 Applied Parallel Programming, mainly focused on CUDA programming and algorithm implementation.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
leo811121/UIUC-CS-483-Parallel-Programming
talkingwallace/ChatGPT-Paper-Reader
This repo offers a simple interface that helps you read and summarize research papers in PDF format. You can ask questions after reading. The interface is built on the OpenAI API and uses the GPT-3.5-turbo model.
haarnoja/sac
Soft Actor-Critic
ZiyiZhang27/tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
hiwonjoon/IROS2021_SORS
AI4Finance-Foundation/RLSolver
Solvers for NP-hard and NP-complete problems with an emphasis on high-performance GPU computing.
openai/safety-gym
Tools for accelerating safe exploration research.
sfujim/TD3
Author's PyTorch implementation of TD3 for OpenAI gym tasks
alexrame/rewardedsoups
Rewarded soups official implementation
goldmansachs/gs-quant
Python toolkit for quantitative finance
THUDM/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
kristery/Awesome-Imitation-Learning
A curated list of awesome imitation learning resources and publications
Kaixhin/imitation-learning
Imitation learning algorithms
Shentao-YANG/Dense_Reward_T2I
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
NVlabs/DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
clu0/unet.cu
UNet diffusion model in pure CUDA
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
x35f/alpha2
pseudocode and algorithms for the paper "Alpha$^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning"
karpathy/LLM101n
LLM101n: Let's build a Storyteller
lucidrains/multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in Pytorch
SalesforceAIResearch/DiffusionDPO
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
RLHFlow/Online-RLHF
A recipe for online RLHF.
digital-nomad-cheng/ECE408_Applied_Parallel_Programming
CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.
mistralai/mistral-finetune
togethercomputer/MoA
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
ridgerchu/matmulfreellm
Implementation for MatMul-free LM.
hibana2077/wisper_ui