hongsunjang's Stars
karpathy/LLM101n
LLM101n: Let's build a Storyteller
meta-llama/llama3
The official Meta Llama 3 GitHub site
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
microsoft/Graphormer
Graphormer is a general-purpose deep learning backbone for molecular modeling.
The-OpenROAD-Project/OpenROAD
OpenROAD's unified application implementing an RTL-to-GDS Flow. Documentation at https://openroad.readthedocs.io/en/latest/
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
yyyujintang/Awesome-Mamba-Papers
Awesome Papers related to Mamba.
clu0/unet.cu
UNet diffusion model in pure CUDA
enfiskutensykkel/ssd-gpu-dma
Build userspace NVMe drivers and storage applications with CUDA support
NVlabs/Minitron
A family of compressed models obtained via pruning and knowledge distillation
Xilinx/DPU-PYNQ
DPU on PYNQ
snu-csl/nvmevirt
NVMeVirt: A Versatile Software-defined Virtual NVMe Device
Glaciohound/LM-Infinite
Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
Kashu7100/pytorch-armv7l
PyTorch 1.7.0 and torchvision 0.8.0 builds for Raspberry Pi 4 (32-bit OS)
snu-comparch/InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
NVIDIA/MagnumIO
Magnum IO community repo
manoharvhr/PYNQ-Torch
PYNQ-Torch: a framework to develop PyTorch accelerators on the PYNQ platform
sharc-lab/FPGA_ECE8893
Xilinx/xup_embedded_system_design_flow
AMD Xilinx University Program Embedded tutorial
MICV-yonsei/CT2MRI
[MICCAI 2024 Early Acceptance] Official PyTorch Code for Slice-Consistent 3D Volumetric Brain CT-to-MRI Translation with 2D Brownian Bridge Diffusion Model
UCLA-VAST/splag
Accelerating SSSP for power-law graphs using an FPGA.
AIS-SNU/PID-Comm
westerndigitalcorporation/zonefs-tools
Linux zonefs userland tools
Relaxed-System-Lab/HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
rishucoding/reproduce_MICRO24_GPU_DLRM_inference
Codebase and steps for artifact evaluation and reproduction of the MICRO 2024 paper
AIS-SNU/GraNNDis_Artifact
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and mini-batch training. Provides unification of full-/mini-batch training using a novel data/communication structure.
developer-onizuka/gpudirect_storage
sterngerlach/pytorch-pynq-builds
Python wheels for PyTorch and TorchVision
MachineLearningSystem/24PPOPP-Liger
hongsunjang/Vitis-Tutorials
Vitis In-Depth Tutorials