StephenGuanqi's Stars
pytorch/pytorch
Tensors and dynamic neural networks in Python with strong GPU acceleration
ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
facebook/rocksdb
A library that provides an embeddable, persistent key-value store for fast storage.
taichi-dev/taichi
Productive, portable, and performant GPU programming in Python.
apache/tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
dgryski/go-perfbook
Thoughts on Go performance optimization
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
cameron314/concurrentqueue
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
NVIDIA/apex
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch
wang-xinyu/tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
techiescamp/kubernetes-learning-path
A roadmap to learn Kubernetes from scratch (Beginner to Advanced level)
shuhuai007/Machine-Learning-Session
abumq/easyloggingpp
C++ logging library. It is powerful, extensible, lightweight, fast, and thread- and type-safe; it supports asynchronous low-latency logging and has many built-in features. It lets you write logs in your own customized format. It also supports logging your own classes, third-party libraries, and STL and third-party containers.
PlatformLab/NanoLog
NanoLog is an extremely performant nanosecond-scale logging system for C++ that exposes a simple printf-like API.
envoyproxy/ratelimit
Go/gRPC service designed to enable generic rate limit scenarios from different types of applications.
Netflix/EVCache
A distributed in-memory data store for the cloud
tqchen/tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
CppCon/CppCon2016
Slides and other materials from CppCon 2016
autodiff/autodiff
Automatic differentiation made easier for C++
veekaybee/what_are_embeddings
A deep dive into embeddings starting from fundamentals
zhihu/cuBERT
Fast implementation of BERT inference directly on NVIDIA (CUDA, cuBLAS) and Intel MKL
pytorch/tvm
TVM integration into PyTorch
Cjkkkk/CUDA_gemm
A simple high-performance CUDA GEMM implementation.
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
bob-carpenter/ad-handbook
Automatic Differentiation Handbook
cwpearson/nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
nicolaswilde/cuda-tensorcore-hgemm
IMSY-DKFZ/htc
Semantic organ segmentation for hyperspectral images.
huanyingtianhe/RedisDB