wKIDw's Stars
facebookresearch/drqv2
DrQ-v2: Improved Data-Augmented Reinforcement Learning
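At the heart of DrQ-v2 is a simple random-shift augmentation of replayed pixel observations; below is a minimal PyTorch sketch of that pad-and-crop shift (function name and loop form are mine, not the repo's API):

```python
import torch
import torch.nn.functional as F

def random_shift(obs, pad=4):
    # obs: (B, C, H, W) float tensor; replicate-pad, then crop back to the
    # original size at a random offset per sample (hypothetical helper).
    b, _, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    shifted = []
    for i in range(b):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        shifted.append(padded[i, :, top:top + h, left:left + w])
    return torch.stack(shifted)
```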
openai/procgen
Procgen Benchmark: Procedurally-Generated Game-Like Gym Environments
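Procgen environments are exposed through Gym, so a random-agent rollout is a few lines; a sketch assuming the procgen package is installed, using the coinrun id from the README:

```python
import gym

# The "procgen:" prefix registers the envs with Gym; coinrun is one of the 16 games.
env = gym.make("procgen:procgen-coinrun-v0")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```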
facebookresearch/minihack
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
dunnolab/awesome-in-context-rl
Awesome In-Context RL: A curated list of In-Context Reinforcement Learning resources
withinmiaov/A-Survey-on-Mixture-of-Experts
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
files-community/Files
A modern file manager that helps users organize their files and folders.
schrodingercatss/tuning_playbook_zh_cn
A playbook that systematically teaches you how to maximize the performance of deep learning models.
labmlai/annotated_deep_learning_paper_implementations
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
OpenSparseLLMs/CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
corl-team/ad-eps
Official Implementation for "In-Context Reinforcement Learning from Noise Distillation"
xu-ye/PSBL-MetaRL
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
jon--lee/decision-pretrained-transformer
Implementation of the Decision-Pretrained Transformer (DPT) from the paper "Supervised Pretraining Can Learn In-Context Reinforcement Learning".
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
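The package exposes the fused kernel as flash_attn_func, which takes (batch, seqlen, nheads, headdim) tensors; a minimal sketch assuming a CUDA GPU and fp16 inputs:

```python
import torch
from flash_attn import flash_attn_func

# Shapes are (batch, seqlen, nheads, headdim); fp16/bf16 on a CUDA device is required.
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
out = flash_attn_func(q, k, v, causal=True)  # same shape as q
```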
UT-Austin-RPL/amago
A simple and scalable agent for training adaptive policies with sequence-based RL.
EcoPasteHub/EcoPaste
🎉 Cross-platform clipboard management tool
lucidrains/mixture-of-experts
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
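Usage roughly follows the repo's README: the MoE layer is a drop-in module that returns its output together with a load-balancing auxiliary loss (constructor arguments beyond dim and num_experts are left at their defaults here):

```python
import torch
from mixture_of_experts import MoE

moe = MoE(dim=512, num_experts=16)  # top-k routed experts over the last dim
x = torch.randn(4, 1024, 512)       # (batch, seq, dim)
out, aux_loss = moe(x)              # out: (4, 1024, 512); aux_loss balances expert load
```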
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
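Training goes through a DeepSpeed engine that wraps the model and optimizer; a minimal single-GPU sketch with an illustrative config (values are placeholders, and fp16 assumes a CUDA device):

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 1)  # stand-in model

ds_config = {  # illustrative values only
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

x = torch.randn(8, 10, device=engine.device, dtype=torch.half)
loss = engine(x).pow(2).mean()  # dummy loss
engine.backward(loss)           # DeepSpeed-managed backward
engine.step()                   # optimizer (+ scheduler) step
```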
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Ekoda/SoftMoE
Soft Mixture of Experts Vision Transformer, addressing MoE limitations as highlighted by Puigcerver et al., 2023.
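Soft MoE replaces hard top-k routing with soft dispatch and combine weights, so every token contributes to every expert slot; a minimal one-slot-per-expert sketch of that idea (names are mine, not this repo's API):

```python
import torch
import torch.nn as nn

def soft_moe(x, phi, experts):
    # x: (n_tokens, d); phi: (d, n_experts) learnable slot embeddings.
    logits = x @ phi                         # (n, e) token-slot affinities
    dispatch = torch.softmax(logits, dim=0)  # each slot: soft average over tokens
    combine = torch.softmax(logits, dim=1)   # each token: soft average over slots
    slots = dispatch.T @ x                   # (e, d) slot inputs
    outs = torch.stack([f(s) for f, s in zip(experts, slots)])
    return combine @ outs                    # (n, d)

experts = [nn.Linear(64, 64) for _ in range(4)]
y = soft_moe(torch.randn(16, 64), torch.randn(64, 4), experts)  # (16, 64)
```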
reinforcement-learning-kr/pg_travel
Policy Gradient algorithms (REINFORCE, NPG, TRPO, PPO)
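Several of the starred repos implement PPO, whose core is the clipped surrogate objective that replaces TRPO's explicit trust region; a small sketch (variable names are mine):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    # Clipped surrogate objective from PPO (Schulman et al., 2017).
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic bound: take the worse of the clipped/unclipped surrogates.
    return -torch.min(ratio * adv, clipped * adv).mean()
```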
Khrylx/PyTorch-RL
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Includes a fast Fisher-vector-product implementation of TRPO.
NJU-RL/ACORM
kzl/decision-transformer
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
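Decision Transformer conditions each timestep on the return-to-go, the sum of rewards from that step onward, so generation can be steered toward a target return; a small sketch of that conditioning signal (not the repo's code):

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    # Suffix sums R_t = r_t + gamma * R_{t+1}; gamma=1 matches the paper's setup.
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

returns_to_go([0.0, 0.0, 1.0])  # -> array([1., 1., 1.])
```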
2019ChenGong/RL-Paper-notes
jannerm/diffuser
Code for the paper "Planning with Diffusion for Flexible Behavior Synthesis"
opendilab/awesome-diffusion-model-in-rl
A curated list of resources on diffusion models in RL (continually updated)
TJU-DRL-LAB/self-supervised-rl
chathub-dev/chathub
All-in-one chatbot client