lycheenice

lycheenice's Stars

pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python, 80.7k stars, 21.7k forks
sail-sg/zero-bubble-megatron-deepspeed
Zero Bubble Pipeline Parallelism implemented on Megatron-Deepspeed
Language:Python3
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python, 34k stars, 4k forks
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Language:HTML664161
AdaptiveCpp/AdaptiveCpp
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Language: C++, 1.2k stars, 155 forks
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Language:Cuda492102
intel/opencl-intercept-layer
Intercept Layer for Debugging and Analyzing OpenCL Applications
Language:C++30075
owensgroup/SlabAlloc
A dynamic GPU memory allocator, suitable for warp synchronized scenarios.
Language:Cuda94
anshumang/propreact
A profiling-prediction-scheduling control loop to share Nvidia GPUs between two or more CUDA applications
Language:C++1
UofT-EcoSystem/MXNet-GPU_Memory_Profiler
Benchmarking using MXNet GPU Memory Profiler
Language:Python32
grnydawn/GPUperf
Nsight GPU Profiler Tutorial - Summit of ORNL
Language:Fortran11
GVProf/GVProf
GVProf: A Value Profiler for GPU-based Clusters
Language:Python459
intel/pti-gpu
Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysis on Intel(R) Processor Graphics easily
Language:C++19149
srvm/cupti_profiler
CUPTI GPU Profiler
Language:C++3610
sderek/CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
Language:Cuda4713
openucx/ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
Language: C, 1.1k stars, 412 forks
cwz920716/StreamExecutor
Language:C++6
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
Language: C++, 184k stars, 74.1k forks
opencontainers/runc
CLI tool for spawning and running containers according to the OCI specification
Language: Go, 11.6k stars, 2.1k forks
TheAlgorithms/C-Plus-Plus
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
Language: C++, 29.7k stars, 7k forks
NVIDIA/go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
Language:Go10121
NVIDIA/k8s-device-plugin
NVIDIA device plugin for Kubernetes
Language: Go, 2.6k stars, 599 forks
NVIDIA/gpu-operator
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Language: Go, 1.7k stars, 278 forks
microsoft/KubeGPU
A GPU / device extension framework for Kubernetes
Language:Go35938
google/gpu-runtime
Language:C++158
google/nvidia_libs_test
Tests and benchmarks for cudnn (and in the future, other nvidia libraries)
Language:C++5222
google/tcmalloc
Language: C++, 4.2k stars, 458 forks
tkestack/gpu-manager
Language:Go791231
PaddlePaddle/PaddleSlim
PaddleSlim is an open-source library for deep model compression and architecture search.
Language: Python, 1.5k stars, 347 forks
996icu/996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
270k stars, 21.2k forks