Pinned issues
Deprecation Notice: ray_lightning to be Replaced with New LightningTrainer in Ray 2.4
#258 opened by woshiyyya - 0
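The pinned notice above points users at the Ray AIR replacement. Below is a minimal before/after sketch, assuming the ray_lightning 0.3 `RayStrategy` API and the `LightningTrainer`/`LightningConfigBuilder` API introduced in Ray 2.4; `MyLightningModule` and `train_loader` are hypothetical placeholders.

```python
# Before: ray_lightning (deprecated) plugs into the PTL Trainer as a strategy.
import pytorch_lightning as pl
from ray_lightning import RayStrategy  # named RayPlugin before ray_lightning 0.3

trainer = pl.Trainer(
    max_epochs=10,
    strategy=RayStrategy(num_workers=4, use_gpu=True),
)
trainer.fit(MyLightningModule(), train_loader)  # hypothetical module and loader

# After: Ray >= 2.4 ships LightningTrainer in Ray AIR instead.
from ray.air.config import ScalingConfig
from ray.train.lightning import LightningConfigBuilder, LightningTrainer

lightning_config = (
    LightningConfigBuilder()
    .module(cls=MyLightningModule)               # hypothetical module
    .trainer(max_epochs=10)
    .fit_params(train_dataloaders=train_loader)  # hypothetical loader
    .build()
)
LightningTrainer(
    lightning_config=lightning_config,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
).fit()
```

Issues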
Trials did not complete error
#257 opened by Bk073 - 2
Trials hang when using a scheduler
#253 opened by dcfidalgo - 0 (see the Tune integration sketch after this list)
What happens with custom samplers?
#252 opened by AugustoPeres - 0 (see the sampler sketch after this list)
RuntimeError: Error(s) in loading state_dict: Unexpected key(s) when recovering results from main process during Trainer.fit()
#246 opened by davzaman - 0
TPU support?
#245 opened by platers - 0
[Question] Is it necessary to adapt report and Checkpointing to the newly introduced session and Checkpoint API of Ray AIR?
#243 opened by MarkusSpanring - 1
Error when using WandbLogger
#205 opened by KwanWaiChung - 1
Rank Zero Deprecation
#230 opened by lcaquot94 - 1
Cannot checkpoint and log
#228 opened by lcaquot94 - 0
Ray lightning opens a new mlflow run
#225 opened by AugustoPeres - 2
TuneReportCheckpointCallback error
#219 opened by jakubMitura14 - 0
population based scheduler error
#220 opened by jakubMitura14 - 5
no training starts although the status flag shows running
#216 opened by jakubMitura14 - 1
Teardown after trainer.fit() takes exceptionally long when using RayStrategy with large models
#207 opened by MarkusSpanring - 0
Deterministic mode is not set on remote worker when `Trainer` is set to `deterministic`
#213 opened by MarkusSpanring - 4
Question: Why use ray_lightning instead of pytorch_lightning for multi-node training?
#212 opened by saryazdi - 1
Worker nodes don't start for ray-lightning & AWS
#210 opened by toru34 - 0
adding the version in `__init__`
#191 opened by JiahaoYao - 6
Distributed training performance slowdown when resuming from a checkpoint.
#184 opened by subhashbylaiah - 0
`ray_horovod` leaks GPU memory on `cuda:0`
#181 opened by JiahaoYao - 2
`ray_horovod` multi pid process in the `run`
#182 opened by JiahaoYao - 0
`ray_ddp` issue of `Leaking Caffe2 thread-pool after fork. (function pthreadpool)`
#180 opened by JiahaoYao - 3
`ray_ddp` gpu issue
#179 opened by JiahaoYao - 1
`ray_ddp` global and local rank
#175 opened by JiahaoYao - 1
tune test: do we need to count the head node CPU?
#178 opened by JiahaoYao - 2
`ray_ddp` showing no GPU usage
#177 opened by JiahaoYao - 1
`ray_ddp` the progress bar is broken
#176 opened by JiahaoYao - 10
ray ddp fails with 2 gpu workers
#174 opened by JiahaoYao - 0
`shard-ddp` test of system exit
#173 opened by JiahaoYao - 1
warning in the ci test (change the deprecated api)
#172 opened by JiahaoYao - 0
does torch remove the checkpoint when `is_global_zero` is not set? (multi-worker setting)
#171 opened by JiahaoYao - 1
logging is changed in the new version of pytorch lightning
#170 opened by JiahaoYao - 0
change the `checkpoint_callback=True`
#169 opened by JiahaoYao - 3
warning from the horovod trainer
#168 opened by JiahaoYao - 0
horovod lightning integration missing the log dir
#167 opened by JiahaoYao - 1
horovod installation issue
#165 opened by JiahaoYao - 1
trainer is not consistent during `ray_ddp`
#160 opened by JiahaoYao - 0
Using LightningCLI to parse plugin options from the config file fails when using the RayPlugin.
#151 opened by subhashbylaiah - 0
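Several of the Tune-related reports above (#253, #219, #220) touch the same integration surface, so a minimal sketch of how ray_lightning wires into Ray Tune may help orient readers. It assumes the documented `ray_lightning.tune` helpers; `MyLightningModule`, the `val_loss` metric, and the search space are hypothetical.

```python
import pytorch_lightning as pl
from ray import tune
from ray_lightning import RayStrategy
from ray_lightning.tune import TuneReportCheckpointCallback, get_tune_resources

def train_fn(config):
    model = MyLightningModule(lr=config["lr"])  # hypothetical module
    trainer = pl.Trainer(
        max_epochs=4,
        strategy=RayStrategy(num_workers=2, use_gpu=False),
        callbacks=[
            # Report `val_loss` to Tune and write a checkpoint in one step.
            TuneReportCheckpointCallback(
                metrics={"val_loss": "val_loss"}, on="validation_end"
            )
        ],
    )
    trainer.fit(model)

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=4,
    metric="val_loss",
    mode="min",
    # Each trial spawns its own Ray workers, so Tune must reserve
    # resources for them up front rather than for the trial alone.
    resources_per_trial=get_tune_resources(num_workers=2, use_gpu=False),
)
```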
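For the custom-sampler question (#252): as with plain DDP, Lightning swaps a `DistributedSampler` into the dataloaders unless sampler replacement is disabled. Here is a sketch of the usual Lightning-side workaround, assuming PyTorch Lightning < 2.0 (where the flag is named `replace_sampler_ddp`); the sampler, `weights`, `dataset`, and `MyLightningModule` are illustrative placeholders, and with replacement disabled the user becomes responsible for sharding data across workers.

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, WeightedRandomSampler
from ray_lightning import RayStrategy

# Illustrative custom sampler; `weights` and `dataset` are placeholders.
sampler = WeightedRandomSampler(weights=weights, num_samples=len(dataset))
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

trainer = pl.Trainer(
    strategy=RayStrategy(num_workers=2),
    # Keep the custom sampler instead of letting Lightning replace it
    # with a DistributedSampler; sharding is now the user's job.
    replace_sampler_ddp=False,
)
trainer.fit(MyLightningModule(), loader)  # hypothetical module
```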