kanghui0204's Stars
engineai-robotics/engineai_legged_gym
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
mrnorman/miniWeather
A parallel programming training mini app simulating weather-like flows
MoonshotAI/batched-benchmark
NVIDIA/nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
dmlc/dlpack
common in-memory tensor structure
Azure/MS-AMP-Examples
Examples for MS-AMP package.
Azure/MS-AMP
Microsoft Automatic Mixed Precision Library
mit-han-lab/bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
NVIDIA-AI-IOT/Lidar_AI_Solution
A project demonstrating Lidar-related AI solutions, including three GPU-accelerated Lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and the related libs (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).
NVIDIA-Merlin/distributed-embeddings
distributed-embeddings is a library for building large embedding-based models in TensorFlow 2.
NVIDIA-Merlin/HierarchicalKV
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature embeddings on high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.
chenzomi12/AISystem
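AISystem mainly refers to AI systems, covering full-stack low-level AI technologies including AI chips, AI compilers, and AI inference and training frameworks.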
CVCUDA/CV-CUDA
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
aappleby/smhasher
Automatically exported from code.google.com/p/smhasher
DeepRec-AI/HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
wu-kan/wu-kan.github.io
:sparkles: My personal site & Template for jekyll-theme-WuK
NVIDIA/trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
innerlee/setup
Setup a new machine without sudo!
NVIDIA/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
facebookresearch/metaseq
Repo for external large-scale work
alex--m/mp-rdma
Multi-Path Transport for RDMA in Datacenters (Course assignment)
torvalds/linux
Linux kernel source tree
dhoeflinger/CUDA
CUDA MPs
shawcm/GemmExample
GEMM multi-GPU example program
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
dmlc/ps-lite
A lightweight parameter server interface
NVIDIA/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
tarickb/the-geek-in-the-corner
Sample code from thegeekinthecorner.com
Mellanox/nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library