Feature Request: PyTorch Lightning graceful shutdown on Remote Stop
patriksabol opened this issue · 3 comments
Is your feature request related to a problem? Please describe.
This is a resolved issue in WandB. Citing:
"I'm using PyTorch Lightning with WandB. PyTorch Lightning's training loop catches SIGINT (and others) and gracefully shuts off, allowing the script to continue its execution. This is quite useful b/c you can stop your training in the middle and still run the test loop and other tasks after. However, when I use WandB with PyTorch Lightning, it seems like WandB catches SIGINT correctly and passes it down to the script, thus stopping the execution of the script, but it also kills the script right there, so the test loop and other tasks do not get executed. Is there a workaround?"
Is there any solution for NeptuneML?
For now it does not work with Neptune; the behaviour is the same as described above (pip version neptune-client==1.2.0).
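To make the expected behaviour concrete, here is a minimal, standard-library-only sketch of the pattern the quote describes: SIGINT is recorded instead of killing the process, the training loop finishes its current step and exits, and the script carries on to the test phase. This is not PyTorch Lightning's or Neptune's actual implementation; `GracefulStop`, `train`, `run_test` and the `interrupt_at` parameter are hypothetical names used only for illustration, and the signal is simulated with `signal.raise_signal` (Python 3.8+).

```python
import signal


class GracefulStop:
    """Context manager that records SIGINT instead of letting it end the process.

    Hypothetical helper for illustration; not part of Lightning or Neptune.
    """

    def __init__(self):
        self.requested = False
        self._previous = None

    def __enter__(self):
        # signal.signal must be called from the main thread.
        self._previous = signal.signal(signal.SIGINT, self._handle)
        return self

    def __exit__(self, *exc):
        # Restore whatever handler was installed before, if Python knows it.
        if self._previous is not None:
            signal.signal(signal.SIGINT, self._previous)
        return False

    def _handle(self, signum, frame):
        # Only set a flag; the loop decides when it is safe to stop.
        self.requested = True


def train(max_steps, interrupt_at=None):
    """Run up to max_steps dummy steps, stopping early if SIGINT was received."""
    completed = 0
    with GracefulStop() as stop:
        for step in range(max_steps):
            if interrupt_at is not None and step == interrupt_at:
                # Simulate a stop request (Ctrl+C or a remote abort) arriving mid-training.
                signal.raise_signal(signal.SIGINT)
            completed += 1  # stand-in for one real training step
            if stop.requested:
                break  # leave the loop cleanly after finishing the current step
    return completed


def run_test():
    """Stand-in for the test loop that should still run after an interrupted fit."""
    return "test loop ran"


if __name__ == "__main__":
    steps = train(max_steps=100, interrupt_at=3)
    print(f"training stopped after {steps} steps")
    print(run_test())
```

The request here is that a remote stop behaves like the `break` above and lets the rest of the script run, rather than terminating the process before the test loop.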
Hello @patriksabol
Thank you for bringing this to our attention!
Unfortunately, there is currently no solution for this issue. I have forwarded it to our product team as a feature request, and they will review it and consider implementing it in a future update of Neptune.
If you have any other questions or need further assistance, please feel free to ask. I'm here to help!
Hey @patriksabol
We have merged some changes to Lightning that handle graceful shutdown of Lightning scripts when Neptune runs are aborted remotely.
Could you please install Lightning from source and let us know if this fixes the issue for you?
pip install git+https://github.com/Lightning-AI/pytorch-lightning.git
Hey @patriksabol
I am closing this thread for now, but please feel free to reopen it if needed.