lycheenice's Stars
pytorch/pytorch
Tensors and dynamic neural networks in Python with strong GPU acceleration
sail-sg/zero-bubble-megatron-deepspeed
Zero Bubble Pipeline Parallelism implemented on Megatron-Deepspeed
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
AdaptiveCpp/AdaptiveCpp
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
intel/opencl-intercept-layer
Intercept Layer for Debugging and Analyzing OpenCL Applications
owensgroup/SlabAlloc
A dynamic GPU memory allocator, suitable for warp-synchronized scenarios.
anshumang/propreact
A profiling-prediction-scheduling control loop to share NVIDIA GPUs between two or more CUDA applications
UofT-EcoSystem/MXNet-GPU_Memory_Profiler
Benchmarking using MXNet GPU Memory Profiler
grnydawn/GPUperf
Nsight GPU Profiler Tutorial - Summit of ORNL
GVProf/GVProf
GVProf: A Value Profiler for GPU-based Clusters
intel/pti-gpu
Profiling Tools Interfaces for GPU (PTI for GPU) is a set of getting-started documentation and a tools library that make it easy to begin performance analysis on Intel(R) Processor Graphics
srvm/cupti_profiler
CUPTI GPU Profiler
sderek/CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
openucx/ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
cwz920716/StreamExecutor
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
opencontainers/runc
CLI tool for spawning and running containers according to the OCI specification
TheAlgorithms/C-Plus-Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
NVIDIA/go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
NVIDIA/k8s-device-plugin
NVIDIA device plugin for Kubernetes
NVIDIA/gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
microsoft/KubeGPU
A GPU / device extension framework for Kubernetes
google/gpu-runtime
google/nvidia_libs_test
Tests and benchmarks for cuDNN (and in the future, other NVIDIA libraries)
google/tcmalloc
tkestack/gpu-manager
PaddlePaddle/PaddleSlim
PaddleSlim is an open-source library for deep model compression and architecture search.
996icu/996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.