xionghuichen

PhD student at Nanjing University

Nanjing University

xionghuichen's Stars

ollama/ollama
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
Language: Go · 106k 615 5.3k 8.5k
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
Language: Python · 40.7k 323 2k 4.5k
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Language: Python · 37.3k 219 5.6k 4.6k
DIYgod/RSSHub
🧡 Everything is RSSible
Language: TypeScript · 34.6k 353 5.7k 7.7k
ml-explore/mlx
MLX: An array framework for Apple silicon
Language: C++ · 18.2k 148 5921k
KindXiaoming/pykan
Kolmogorov Arnold Networks
Language: Jupyter Notebook · 15.3k 112 4191.4k
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Language: Python · 11k 168 8142.5k
idootop/mi-gpt
🏠 Connect the Xiao Ai speaker to ChatGPT and Doubao, turning it into your own personal voice assistant.
Language: TypeScript · 8.5k 58 205935
heyform/heyform
Open-Source Form Builder
Language: TypeScript · 7.4k 27 67526
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Language: Python · 3.5k 28 381332
huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Language: Python · 3k 57 201628
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
Language: Python · 2.2k 24 59191
Farama-Foundation/Metaworld
Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning
Language: Python · 1.3k 29 220278
google-deepmind/open_x_embodiment
Language: Jupyter Notebook · 963 20 7768
franciszzj/Leffa
Learning Flow Fields in Attention for Controllable Person Image Generation
Language: Python · 855 6 2683
Cledersonbc/tic-tac-toe-minimax
Minimax is an AI algorithm.
Language: Python · 434 19 6250
Sentdex/Carla-RL
Reinforcement Learning codebase for self-driving car in Carla
Language: Python · 369 12 2495
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
Language: Python · 335 9 3724
DigiRL-agent/digirl
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
Language: Python · 286 9 3522
zjunlp/KnowAgent
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Language: Python · 188 7 815
Yuexuan9/Tinker
This project features an open-source small bipedal robot designed for research, education, and hobbyist experimentation.
100 4 0 6
mmrobotlab/DailyRobot
78 4 0 3
tinkoff-ai/katakomba
Data-Driven NetHack Tools: Datasets (30+) and recurrent-baselines (AWAC, BC, CQL, IQL, REM)
Language: Python · 68 3 0 3
archersama/Uni-CTR
Source code of TOIS paper "A Unified Framework for Multi-Domain CTR Prediction via Large Language Models"
Language: Python · 24 2 0 1
FanmingL/Recurrent-Offpolicy-RL
Implementation of SAC and TD3 based on various RNN and Transformer.
Language: Python · 15 4 0 1
yixiaoer/mistral-jax
JAX implementation of the Mistral 7b v0.1 model
Language: Python · 13 4 0 2
xionghuichen/policy-conditioned-model
Official code for "Effective Offline Environment Reconstruction when the Dataset is Collected from Diversified Behavior Policies"
Language: Python · 3 1 0 1
LAMDA-RL/policy-conditioned-model
Official code for "Effective Offline Environment Reconstruction when the Dataset is Collected from Diversified Behavior Policies"
Language: Python · 1 0 0 0
LAMDA-RL/WiseRL
PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms
Language: Python · 1 0 0
RobertTLange/mistral-jax
JAX implementation of the Mistral model
Language: Python · 1 0 0
