denkensk
Hope to do something cool to help more people / sig-scheduling
Shopee ex-@alibaba ex-@baidu
Beijing, China
Pinned Repositories
Abirdcfly
AI
api
arena
A CLI for Kubeflow.
armada
A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.
autoscaler
Autoscaling components for Kubernetes
enhancements
Features tracking repo for Kubernetes releases
etcd
Distributed reliable key-value store for the most critical data of a distributed system
kubernetes
Production-Grade Container Scheduling and Management
machine_learning_in_action
Learning exercises for Machine Learning in Action
denkensk's Repositories
denkensk/Abirdcfly
denkensk/api
denkensk/arena
A CLI for Kubeflow.
denkensk/enhancements
Features tracking repo for Kubernetes releases
denkensk/kubernetes
Production-Grade Container Scheduling and Management
denkensk/ChatGPT-Next-Web
A well-designed cross-platform ChatGPT UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT app with one click.
denkensk/cluster-api
Home for Cluster API, a subproject of sig-cluster-lifecycle
denkensk/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
denkensk/DeepSpeedExamples
Example models using DeepSpeed
denkensk/denkensk
denkensk/gitdm
📜 Fork for tracking CNCF projects
denkensk/hierarchical-namespaces
Home of the Hierarchical Namespace Controller (HNC). Adds hierarchical policies and delegated creation to Kubernetes namespaces for improved in-cluster multitenancy.
denkensk/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
denkensk/internal-acls
Repository used to maintain group ACLs used by Kubeflow developers
denkensk/kube-capacity
A simple CLI that provides an overview of the resource requests, limits, and utilization in a Kubernetes cluster
denkensk/kube-queue
denkensk/kube-queue-2
denkensk/kuberay
A toolkit to run Ray applications on Kubernetes
denkensk/kueue
Kueue: Kubernetes-native Job Queueing
denkensk/Megatron-LM
Ongoing research training transformer models at scale
denkensk/mpi-operator
Kubernetes Operator for Allreduce-style Distributed Training
denkensk/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
denkensk/pytorch-operator-extension-1
denkensk/ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
denkensk/scheduler-plugins
Repository for out-of-tree scheduler plugins based on the scheduler framework.
denkensk/seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
denkensk/test-infra
Test infrastructure for the Kubernetes project.
denkensk/tf-operator-extension
denkensk/training-operator
Training operators on Kubernetes.
denkensk/website
Kubernetes website and documentation repo.