hijkzzz

RLer + NLPer/2 + MLSyser/2

NVIDIA

Pinned Repositories

alpha-zero-gomoku
A Multi-threaded Implementation of AlphaZero (C++)
Language: Python 373 10 3849
Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
5.9k 90 9321
cuda-neural-network
Convolutional Neural Network with CUDA (MNIST 99.23%)
Language: C++ 182 4 1039
deep-reinforcement-learning-notes
Deep Reinforcement Learning Notes
118 4 16
mini-interpreter
A Simple Scripting Language
Language: Go 79 3 05
mini-os-kernel
A mini Unix-Like OS kernel
Language: C 94 4 06
noisy-mappo
Multi-agent PPO with noise (97% win rates on Hard scenarios of SMAC)
Language: Python 54 3 26
pymarl2
Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)
Language: Python 639 17 41124
reinforcement-learning-wechat-jump
Reinforcement Learning for WeChat Jump
Language: Python 91 5 12
OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Language: Python 3.3k 26 340305

hijkzzz's Repositories

hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
5.9k 90 9321
hijkzzz/pymarl2
Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)
Language: Python 639 17 41124
hijkzzz/alpha-zero-gomoku
A Multi-threaded Implementation of AlphaZero (C++)
Language: Python 373 10 3849
hijkzzz/cuda-neural-network
Convolutional Neural Network with CUDA (MNIST 99.23%)
Language: C++ 182 4 1039
hijkzzz/deep-reinforcement-learning-notes
Deep Reinforcement Learning Notes
118 4 16
hijkzzz/mini-os-kernel
A mini Unix-Like OS kernel
Language: C 94 4 06
hijkzzz/reinforcement-learning-wechat-jump
Reinforcement Learning for WeChat Jump
Language: Python 91 5 12
hijkzzz/mini-interpreter
A Simple Scripting Language
Language: Go 79 3 05
hijkzzz/prisma
Prisma
Language: Python 71 4 03
hijkzzz/dht-crawler
A DHT Crawler based on Goroutine
Language: Go 64 3 04
hijkzzz/web-server
A Web Server designed with Reactor I/O Model
Language: C++ 64 3 01
hijkzzz/noisy-mappo
Multi-agent PPO with noise (97% win rates on Hard scenarios of SMAC)
Language: Python 54 3 26
hijkzzz/deep-learning-notes
Deep Learning Notes
50 3 01
hijkzzz/reinforcement-learning-trading-robot
Trading Robot based on LSTM-PPO
Language: Python 24 6 15
hijkzzz/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
3 0 01
hijkzzz/dotfiles
Configuration file
Language: Shell 3 3 0
hijkzzz/hijkzzz.github.io
Homepage
Language: HTML 3 3 0
hijkzzz/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
2 1 01
hijkzzz/leetcode
LeetCode & LintCode
Language: C++ 2 3 0
hijkzzz/Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
1 1 01
hijkzzz/2025
Language: HTML 0 0
hijkzzz/hijkzzz
3 01
hijkzzz/llamafia.github.io
Language: HTML 0 0
hijkzzz/mame-street-fighter-3-ai
Reinforcement Learning for Street Fighter III: 3rd Strike
Language: Python 3 01
hijkzzz/NTU-Thesis-LaTeX-Template
🎓 Unofficial LaTeX templates for your graduate thesis (both master's theses and doctoral dissertations) at National Taiwan University. 國立臺灣大學碩博士學位論文 LaTeX 模板
Language: TeX 2 0
hijkzzz/reinforcement-learning.pytorch
Reinforcement Learning Library
Language: Python 3 01
hijkzzz/staging
iclr-blogposts.github.io/staging
Language: HTML 1 0
hijkzzz/termux-jupyter
Termux init script
Language: Shell 3 01