LAKan233's Stars
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
bbfamily/abu
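Abu quantitative trading system (stocks, options, futures, Bitcoin, machine learning): an open-source, Python-based framework for quantitative trading and investment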
AI4Finance-Foundation/FinRL
FinRL: Financial Reinforcement Learning. 🔥
eriklindernoren/Keras-GAN
Keras implementations of Generative Adversarial Networks.
vwxyzjn/cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
AI4Finance-Foundation/ElegantRL
Massively Parallel Deep Reinforcement Learning. 🔥
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
shibing624/MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
plotly/dash-sample-apps
Open-source demos hosted on Dash Gallery
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
letianzj/QuantResearch
Quantitative analysis, strategies and backtests
opendilab/PPOxFamily
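PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: eight lessons that clarify the algorithm theory, untangle the code logic, and get you hands-on with decision AI applications)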
openai/prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
zonechen1994/CV_Interview
I hope this repo can help you a lot!
tensorflow/gan
Tooling for GANs in TensorFlow
PyPatel/Options-Trading-Strategies-in-Python
Developing Options Trading Strategies using Technical Indicators and Quantitative Methods
shariqiqbal2810/MAAC
Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MorvanZhou/pytorch-A3C
Simple A3C implementation with PyTorch + multiprocessing
xuehy/pytorch-maddpg
A PyTorch implementation of MADDPG (multi-agent deep deterministic policy gradient)
tinyzqh/light_mappo
Lightweight version of MAPPO to help you quickly migrate to your local environment.
online-books/moyu
🐟 Slack off online to relieve stress. Have you slacked off today?
poloclub/dodrio
Exploring attention weights in transformer-based models with linguistic knowledge.
SunnyGJing/t5-pegasus-chinese
Summarization and coreference resolution based on Google's T5 Chinese generative model, with support for batch generation and multiprocessing
SCHENLIU/longformer-chinese
Chinese version of Longformer
CMACH508/DeepTrader
Improbable-AI/eipo
Official codebase for Redeeming Intrinsic Rewards via Constrained Policy Optimization
SparkJiao/llama-pipeline-parallel
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and start discussions about any problems you have encountered.
Y-B-Class-Projects/Human-Fall-Detection
Human Falling Detection
YukiYasuda2718/rl-bsmodel-with-costs
Option hedging strategies are investigated using two reinforcement learning algorithms: deep Q network and deep deterministic policy gradient.
christopher-hma/STOCKS_TRADING_RL