Issues
How to set dshm size for training?
#1044 opened by Andrew-Su-0718 - 2
arena top job lost resource information
#1082 opened by kangzemin - 0
Update model manage
#1063 opened by ChenYi015 - 0
Add support for model management
#1059 opened by ChenYi015 - 1
cannot update --data param of a kserve job
#1048 opened by gujingit - 2
can not delete kserve type job
#1047 opened by gujingit - 1
No Binary assets for release v0.9.12
#1045 opened by panpan0000 - 5
Enhancements on code quality and stability
#1029 opened by Syulin7 - 1
how to customize the et-operator?
#965 opened by qingqiuhe - 1
How to uninstall arena?
#567 opened by gugumituo - 1
Is there any plan to support v1 mpijob?
#891 opened by Metal-joker - 1
is this project dead now
#958 opened by joneepenk - 3
Does arena support Ray?
#1002 opened by samzong - 4
Rotate secrets stored in CircleCI
#896 opened by eliaslevy - 28
fail to submit pytorchjob with helm v3.7.2
#888 opened by WangAooa - 0
arena get should display user
#856 opened by happy2048 - 0
Is the java-sdk code incomplete?
#817 opened by guoziyi-study - 0
Does arena-sdk support operating on arbitrary custom resources?
#816 opened by guoziyi-study - 0
Beginner questions about using arena
#814 opened by guoziyi-study - 2
helm 3+ support
#736 opened by tzstoyanov - 0
support clean task policy for mpijob
#722 opened by cheyang - 0
Support kube-queue in arena
#699 opened by denkensk - 0
Support torchserve in arena
#702 opened by heluocs - 3
submit pytorchjob auto delete Succeeded pod
#698 opened by queguan - 2
when arena command line contains a comma, the content after the comma will be lost
#677 opened by gooddayforever - 1
Bump mpi-operator dependency
#686 opened by Jeffwan - 1
Pytorch distributed job failed when master replica start later than worker replica
#547 opened by meibenjin - 5
arena 0.5: submitted mpi job does not run successfully
#658 opened by queguan - 0
IDE (VSCode/PyCharm) connectivity support?
#654 opened by elgalu - 0
Can the service-account for a sparkjob be specified manually?
#612 opened by Alienfeel - 0
pyspark support in sparkjob
#611 opened by Alienfeel - 0
Keep Arena roadmap updated
#608 opened by mkbhanda - 3
failed to get LogViewer: No LOGVIEWER Installed.
#570 opened by gugumituo - 4
Unable to get logs: invalid instance name
#496 opened by lppsuixn - 1
serving currently supports only clusterIp; loadbalancer mode needs to be supported
#488 opened by haohao667788 - 5
WG sponsorship for arena
#405 opened by Bobgy - 4
The default value of gangSchdName does not match the default name in the latest kube-batch, causing gang scheduling to fail
#391 opened by yajunwong - 3
Arena needs to own its own test infra
#406 opened by Bobgy - 3
When running the distributed demo with more than one worker specified, the container exits abnormally, as shown in the figure
#390 opened by lucasaytt