TaciturnMute

reinforcement learning

TaciturnMute's Stars

hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
Language:Python38.9k 386 1.7k4.3k
meta-llama/llama3
The official Meta Llama 3 GitHub site
Language:Python27.3k 228 2653.1k
joonspk-research/generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
17.8k 142 1312.3k
bulletphysics/bullet3
Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc.
Language:C++12.7k 410 2k2.9k
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Language:Python10.9k 70 107691
RUCAIBox/LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
Language:Python10.5k 158 64824
nlpxucan/WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Language:Python9.3k 113 190720
DLR-RM/stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Language:Python9.3k 65 1.5k1.7k
kingoflolz/mesh-transformer-jax
Model parallel transformers in JAX and Haiku
Language:Python6.3k 112 206892
codertimo/BERT-pytorch
Google AI 2018 BERT pytorch implementation
Language:Python6.2k 126 871.3k
google-deepmind/alphageometry
Language:Python4.2k 53 125470
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Language:Python4.1k 41 395298
suragnair/alpha-zero-general
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Language:Jupyter Notebook3.9k 111 1801k
AI4Finance-Foundation/ElegantRL
Massively Parallel Deep Reinforcement Learning. 🔥
Language:Python3.8k 50 261852
higgsfield-ai/higgsfield
Fault-tolerant, highly scalable GPU orchestration, and a machine learning framework designed for training models with billions to trillions of parameters
Language:Jupyter Notebook3.3k 76 1554
higgsfield/RL-Adventure
PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL
Language:Jupyter Notebook3k 72 22589
DLR-RM/rl-baselines3-zoo
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
Language:Python2.1k 23 252516
MetaGLM/FinGLM
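FinGLM: dedicated to building an open, public-interest, long-lasting financial large-model project, using open source and openness to advance "AI + finance".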
Language:HTML1.8k 29 30271
sfujim/TD3
Author's PyTorch implementation of TD3 for OpenAI gym tasks
Language:Python1.7k 19 41438
haarnoja/sac
Soft Actor-Critic
Language:Python1k 29 27235
lucidrains/mixture-of-experts
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
Language:Python649 6 1149
araffin/rl-tutorial-jnrr19
Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
Language:Jupyter Notebook616 11 13116
higgsfield/np-hard-deep-reinforcement-learning
PyTorch neural combinatorial optimization
Language:Jupyter Notebook374 19 685
DongChen06/MARL_CAVs
MARL for Autonomous Vehicles
Language:Python256 6 5648
lipengyuer/DataScience
Language:Python137 3 360
polixir/NeoRL
Python interface for accessing the near real-world offline reinforcement learning (NeoRL) benchmark datasets
Language:Python109 5 1212
TianHongZXY/CoRe
[ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models
Language:Python43 1 56
ghdrl95/stock_experiment_multimodal
Experiment source code for 'A Deep Multimodal Reinforcement Learning System Combined with CNN and LSTM for Stock Trading'
Language:Python7 1 10
yhc582825016/NLP4math
Materials related to natural language processing and reinforcement learning
3 1 00
bofenghuang/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Language:Python1 1 00
