Pinned Repositories
VPTQ
VPTQ: a flexible, extreme low-bit quantization algorithm
3DFNoC
3D NoC Emulation Model on a Single FPGA
abliterator
A simple Python library for ablating features in LLMs supported by TransformerLens
Accel-NASBench
Accel-NASBench: A Surrogate Benchmark for Accelerator-Aware NAS
AdderNet
Code for the paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
AI-Youtube-Shorts-Generator
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
Ampere_Persistent_Cache_Eval
AX6S-unlock
tvm-models-baseline
YangWang92.github.io
YangWang92's Repositories
YangWang92/ao
PyTorch native quantization and sparsity for training and inference
YangWang92/FractalTensor
YangWang92/Megatron-LM-rocm-fork
Ongoing research training transformer models at scale
YangWang92/CodeIO
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
YangWang92/commercial_thermal_map_dataset
YangWang92/DeepSeek-V3
YangWang92/EASIER
Efficient Auto-scalable Scientific Infrastructure for Engineers and Researchers
YangWang92/flute
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
YangWang92/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM on ROCm
YangWang92/large_concept_model
Large Concept Models: Language modeling in a sentence representation space
YangWang92/Liger-Kernel
Efficient Triton Kernels for LLM Training
YangWang92/llama3_interpretability_sae
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
YangWang92/Marco-o1
An Open Large Reasoning Model for Real-World Solutions
YangWang92/mfu_calculation
A simple MFU (model FLOPs utilization) calculation for LLMs.
YangWang92/MiniMax-01
YangWang92/ml-mobileclip
This repository contains the official implementation of the CVPR 2024 paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training"
YangWang92/occamy
A high-efficiency system-on-chip for floating-point compute workloads.
YangWang92/open-instruct
YangWang92/open-r1
Fully open reproduction of DeepSeek-R1
YangWang92/QLLM
A general 2–8 bit quantization toolbox with GPTQ/AWQ/HQQ support and easy export to ONNX/ONNX Runtime.
YangWang92/quip-sharp
YangWang92/ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
YangWang92/sglang
SGLang is a fast serving framework for large language models and vision language models.
YangWang92/simpleRL-reason
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
YangWang92/TileFusion
YangWang92/TinyZero
YangWang92/verl
veRL: Volcano Engine Reinforcement Learning for LLMs
YangWang92/VITA.dev
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
YangWang92/VPTQ
VPTQ: a flexible, extreme low-bit quantization algorithm
YangWang92/wandb
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.