holarissun

PhD in Reinforcement Learning, LLM Alignment, RLHF

University of Cambridge

holarissun's Stars

floodsung/LLM-with-RL-papers
A collection of LLM with RL papers
2289
openai/tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Language: Python, 12.3k stars, 831 forks
wordweb/Tiger-qq-bot
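A QQ group question-answering bot backed by a local knowledge base, built on langchain-chatglm-and-tigerbot and mirai. Knowledge-base files can be uploaded by posting them directly to the QQ group, and knowledge bases can be switched on, switched off, or deleted with commands, giving a portable, QQ-based local knowledge-base Q&A bot.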
Language:Python252
wordweb/langchain-ChatGLM-and-TigerBot
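A local knowledge-base question-answering application, modified from langchain-ChatGLM, that can load TigerBot models. The goal is a knowledge-base Q&A solution that works well for Chinese-language scenarios and open-source models and can run offline.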
Language:Python10519
liaokongVFX/LangChain-Chinese-Getting-Started-Guide
A Chinese-language getting-started tutorial for LangChain
7.4k stars, 595 forks
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
Language: Jupyter Notebook, 94.2k stars, 15.2k forks
Alvin9999/new-pac
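Firewall circumvention: unrestricted internet access, free circumvention, YouTube, fanqiang, software, VPN, one-click circumvention browser, scripts and tutorials for one-click setup of a circumvention server on a VPS, free shadowsocks/ss/ssr/v2ray/goflyway accounts and nodes, circumvention for PC, mobile, iOS, Android, Windows, Mac, Linux, and routers, YouTube video download, shared US-region Apple ID accounts
55.4k stars, 9.4k forks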
vanderschaarlab/clairvoyance
Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series
Language:Jupyter Notebook12429
jxx123/simglucose
A Type-1 Diabetes simulator implemented in Python for Reinforcement Learning purposes
Language:Python241113
dickreuter/neuron_poker
Texas Hold'em OpenAI Gym poker environment with reinforcement learning based on keras-rl. Includes virtual rendering and Monte Carlo simulation for equity calculation.
Language:Python624171
denisyarats/exorl
ExORL: Exploratory Data for Offline Reinforcement Learning
Language:Python1029
lizhuo-1994/NECSA
Official implementation of Neural Episodic Control with State Abstraction
Language:Python121
rll-research/url_benchmark
Language:Python33151
cloneofsimo/lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
Language: Jupyter Notebook, 7k stars, 481 forks
tinkoff-ai/CORL
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
Language: Python, 1.1k stars, 131 forks
holarissun/DOMIAS
Language:Python46
vanderschaarlab/synthetic-data-lab
A repository containing the materials required to complete the "AAAI Lab for Innovative Uses of Synthetic Data". This includes tutorials on how to use the library "Synthcity" for improving the fairness and privacy of a dataset as well as for augmenting a small dataset using some other similar datasets.
Language:Jupyter Notebook122
Trinkle23897/tuixue.online-visa
https://tuixue.online/visa/ A website showing real-time U.S. visa appointment status, backed by a crawler that finds the earliest available appointment time at each U.S. visa office
Language:Python811124
alihanhyk/invconban
Inverse Contextual Bandits: Learning How Behavior Evolves over Time
Language:Python32
HITFRobot/happy-spiders
🔧 🔩 🔨 A curated collection of web-crawler resources: tools, simulated-login techniques, proxy IPs, Scrapy template code, and more.
Language:Python26764
AminHP/gym-mtsim
A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)
Language:Python432110
sail-sg/envpool
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
Language: C++, 1.1k stars, 100 forks
banditml/offline-policy-evaluation
Implementations and examples of common offline policy evaluation methods in Python.
Language:Python21924
holarissun/RewardShifting
Code for NeurIPS 2022 paper Exploiting Reward Shifting in Value-Based Deep RL
Language:Python262
google-research/deep_ope
Language:Jupyter Notebook859
clvoloshin/COBS
OPE Tools based on Empirical Study of Off Policy Policy Estimation paper.
Language:Python6114
st-tech/zr-obp
Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation
Language:Python64387
holarissun/MOPA
Language:Jupyter Notebook2
academicpages/academicpages.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Language: JavaScript, 12.3k stars, 43.5k forks
metadriverse/metadrive
MetaDrive: Open-source driving simulator
Language:Python775110
