Pinned Repositories
BinaryNet
Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
CUDA-Learn-Notes
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
gpu_matmul
HolisticTraceAnalysis
A library to analyze PyTorch traces.
joint_point_based
ken012git.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics
mamba
mesh2color_voxel
This is a tool for voxelizing PLY meshes with color information.
MLDS2017_final
MLDS2017 final: batch normalization
hychiang-git's Repositories
hychiang-git/joint_point_based
hychiang-git/mesh2color_voxel
This is a tool for voxelizing PLY meshes with color information.
hychiang-git/MLDS2017_final
MLDS2017 final: batch normalization
hychiang-git/BinaryNet
Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
hychiang-git/CUDA-Learn-Notes
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
hychiang-git/fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
hychiang-git/gpu_matmul
hychiang-git/HolisticTraceAnalysis
A library to analyze PyTorch traces.
hychiang-git/ken012git.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics
hychiang-git/mamba
hychiang-git/pscan
hychiang-git/QuaRot
Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.
hychiang-git/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
hychiang-git/torch-int
This repository contains integer operators on GPUs for PyTorch.
hychiang-git/xtensor-io
xtensor plugin to read and write images, audio files, numpy (compressed) npz and HDF5