vwxyzjn

RLHF @huggingface; CS Ph.D. in RL from Drexel University.

@huggingface · Philadelphia, PA

vwxyzjn's Stars

xai-org/grok-1
Grok open release
Language: Python · 49k 553 1978.3k
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda · 20.5k 207 1152.2k
wandb/openui
OpenUI lets you describe UI using your imagination, then see it rendered live.
Language: HTML · 16.2k 105 1191.4k
ml-explore/mlx
MLX: An array framework for Apple silicon
Language: C++ · 15.1k 137 437859
astral-sh/uv
An extremely fast Python package installer and resolver, written in Rust.
Language: Rust · 12.3k 28 1.6k344
state-spaces/mamba
Mamba SSM architecture
Language: Python · 10.9k 99 340873
karpathy/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Language: Python · 8.5k 79 34766
BartoszJarocki/cv
Print-friendly, minimalist CV page
Language: TypeScript · 8.5k 23 28893
PWhiddy/PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
Language: Jupyter Notebook · 6.6k 65 106584
hrvach/deskhop
Fast Desktop Switching Device
Language: C · 5.9k 48 77159
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Language: Python · 5.3k 61 87475
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
Language: Python · 4.1k 41 157385
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Language: Python · 1.8k 18 77135
openai/summarize-from-feedback
Code for "Learning to summarize from human feedback"
Language: Python · 959 148 21141
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Language: Python · 897 41 5877
cnpryer/huak
My experimental Python package manager.
Language: Rust · 610 7 25037
lhao499/RingAttention
Transformers with Arbitrarily Large Context
Language: Python · 552 5 1342
huggingface/text-clustering
Easily embed, cluster and semantically label text datasets
Language: Python · 367 35 523
abacaj/code-eval
Run evaluation on LLMs using the HumanEval benchmark
Language: Python · 356 11 734
instadeepai/flashbax
⚡ Flashbax: Accelerated Replay Buffers in JAX
Language: Python · 168 13 66
liuzuxin/OSRL
🤖 Elegant implementations of offline safe RL algorithms in PyTorch
Language: Python · 145 4 1812
MatX-inc/seqax
seqax = sequence modeling + JAX
Language: Python · 119 6 16
RL4VLM/RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Language: Jupyter Notebook · 1179
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
Language: Python · 9617
SpellcraftAI/oaib
Use the OpenAI Batch tool to make async batch requests to the OpenAI API.
Language: Python · 84 4 42
DramaCow/jaxued
Language: Python · 481
instadeepai/sebulba
🪐 The Sebulba architecture to scale reinforcement learning on Cloud TPUs in JAX
Language: Python · 43 5 14
emilianbold/PDFwriter
An OS X print-to-PDF printer driver
Language: Objective-C · 23 2 01
cogment/cogment-lab
A toolkit for practical Human-AI cooperation research
Language: Python · 112
google/putting-dune
Language: Python · 7 8 01