Issues
[BUG] Unnecessary RBAC permissions
#315 opened by Yseona - 1
[feature request] I would like to monitor the kubedl_jobs_failed metric, but its only label is kind, so the jobName cannot be retrieved. The exposed metrics are unsatisfactory for this use case.
#309 opened by 13241308289 - 0
[BUG] The DingTalk group link on the website has expired
#307 opened by samzong - 0
[feature request] Replace kaniko with BuildKit to speed up the model build/push stage
#304 opened by SimonCqk - 0
[Test] GitHub Bot Test
#301 opened by SimonCqk - 0
[feature request] Integrating pod logs with SLS
#300 opened by llinvokerl - 1
[BUG] role.yaml and all_in_one.yaml are not in sync
#257 opened by jian-he - 2
[feature request] refactor CacheBackend to Dataset
#270 opened by SimonCqk - 1
[feature request] Delete completed TFJobs created by cron when the number of TFJobs reaches historyLimit
#244 opened by heluocs - 1
[feature request] inference pipeline support
#234 opened by saeid93 - 5
[BUG] The DingTalk link in the KubeDL docs is broken
#273 opened by d821776892 - 2
[feature request] Automatically detect infrastructure anomalies and avoid scheduling pods on abnormal nodes
#272 opened by SimonCqk - 1
[BUG] The notebook container has started, but the Dashboard page cannot be opened
#266 opened by Wercurial - 0
KubeDL 2022 Annual Review
#265 opened by SimonCqk - 1
[feature request] Implement a kubedl CLI
#259 opened by jian-he - 6
[question] hostnetwork
#204 opened by kuizhiqing - 1
Will Jupyter(Lab) be added to KubeDL?
#224 opened by shuxp - 1
[Feature] Support code-server as an alternate IDE
#226 opened by wanziyu - 0
[ASoC 2022] Enable data caching cross jobs to boost job performance with high memory efficiency
#252 opened by yhalpha - 2
🧑‍💻 🏕 Alibaba Summer of Code (ASOC) 2022
#249 opened by SimonCqk - 0
[ASoC 2022] Implement native PyTorch elastic training based on the torch-elastic protocol
#251 opened by SimonCqk - 5
[feature request] How to install KubeDL on a Mac M1
#246 opened by xiao-jay - 15
[feature request] periodically run jobs by expanding cronPolicy in job specification
#218 opened by SimonCqk - 5
[BUG] pytorch distributed training task is unschedulable when using volcano gang scheduling
#229 opened by CaRRotOne - 0
Problem installing with the helm charts [BUG]
#206 opened by saeid93 - 7
Problem installing with the yaml files [BUG]
#207 opened by saeid93 - 3
[BUG] Dashboard doesn't show the task logs
#210 opened by CaRRotOne - 3
[BUG/Feature] Error logs appear when deleting a job
#199 opened by HeGaoYuan - 0
[BUG] Job submission fails on the Dashboard job submission page
#188 opened by CaRRotOne - 2
[feature request] enable FallbackToLogsOnError TerminationMessagePolicy by default
#187 opened by jian-he - 4
[BUG] sharedMPIJob's kind and apiVersion are empty
#181 opened by HeGaoYuan - 1
[feature request] Provide a detailed entry point for config settings
#176 opened by githublaohu - 5
[BUG] Dashboard cannot recognize GPUs
#185 opened by CaRRotOne