PyTorch Lightning
Do you know if ORT works with PyTorch Lightning?
I am trying it, but I am getting:
raise new_exception(raised_exception) from raised_exception
onnxruntime.training.ortmodule._fallback.ORTModuleTorchModelException: ORTModule does not support adding modules to it.
Also, is there a way to configure ORT automatically when you install the package with conda?
Currently I have to call this from my code:
from onnxruntime.training.ortmodule.torch_cpp_extensions import install as ortmodule_install
ortmodule_install.build_torch_cpp_extensions()
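For reference, torch-ort also ships a one-time setup command that builds the same extensions outside application code (assuming a torch-ort version that provides it), which can be run once right after installation:

python -m torch_ort.configure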
Does it work with PyTorch 1.10 and CUDA 11?
Thanks!
OK, the issue here is sync_batchnorm=True; if I switch it to False, it works.
Any ideas on how to convert the model to ORT after the sync_batchnorm conversion happens?
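One workaround I can sketch (MyModel is a placeholder for the inner nn.Module; convert_sync_batchnorm is the standard torch API): apply the SyncBatchNorm conversion yourself before wrapping, so ORTModule never sees modules being added to it afterwards:

import torch
import torch_ort

model = MyModel()  # placeholder for your model's inner nn.Module

# Convert BatchNorm layers to SyncBatchNorm *before* ORT wraps the graph;
# ORTModule rejects having modules added to it after wrapping.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# Only then wrap the already-converted model for ORT acceleration.
model = torch_ort.ORTModule(model)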
I have managed to hack it with this:
from pytorch_lightning.plugins import DDPPlugin
import torch_ort
from pytorch_lightning.overrides import LightningDistributedModule
from torch.nn import Module
from torch.nn.parallel.distributed import DistributedDataParallel


class ORTPlugin(DDPPlugin):
    def _setup_model(self, model: Module) -> DistributedDataParallel:
        """Wraps the model into a :class:`~torch.nn.parallel.distributed.DistributedDataParallel` module."""
        # Build the ORT C++ extensions (normally a one-time install step).
        from onnxruntime.training.ortmodule.torch_cpp_extensions import install as ortmodule_install
        ortmodule_install.build_torch_cpp_extensions()
        # By this point Lightning has already applied the SyncBatchNorm
        # conversion, so the inner model can safely be wrapped with ORTModule.
        model.module.model = torch_ort.ORTModule(model.module.model)
        return DistributedDataParallel(module=model, device_ids=self.determine_ddp_device_ids(), **self._ddp_kwargs)
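For completeness, the plugin is then handed to the Trainer in the usual way (a sketch; the exact argument name differs across Lightning releases, so this is an assumption based on 1.5-era APIs):

from pytorch_lightning import Trainer

trainer = Trainer(
    gpus=2,
    sync_batchnorm=True,
    strategy=ORTPlugin(),  # assumption: on 1.4-era releases, pass plugins=[ORTPlugin()] instead
)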
If you have a better way to fix it, please let me know.