pytorch/ort

PyTorch Lightning


Do you know if ORT works with PyTorch Lightning?

I am trying it, but I am getting:

raise new_exception(raised_exception) from raised_exception

onnxruntime.training.ortmodule._fallback.ORTModuleTorchModelException: ORTModule does not support adding modules to it.
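
For context, a minimal sketch of the kind of operation that exception guards against (assuming torch-ort is installed and its extensions are built; the names here are just an example):

import torch
from torch_ort import ORTModule

net = ORTModule(torch.nn.Linear(4, 2))
# ORTModule refuses to have new submodules attached once it wraps a model;
# this call raises the ORTModuleTorchModelException quoted above:
net.add_module("head", torch.nn.Linear(2, 1))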

Also, is there a way to configure ORT automatically when you install the package with conda?

Currently I have to call this from my code:

from onnxruntime.training.ortmodule.torch_cpp_extensions import install as ortmodule_install
ortmodule_install.build_torch_cpp_extensions()
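
(On the setup question: the torch-ort README documents a one-time post-install step, python -m torch_ort.configure, which runs this same extension build, so it should not have to live in application code. I am not aware of a conda hook that triggers it automatically.)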

Does it work with PyTorch 1.10 and CUDA 11?

Thanks!

OK, the issue here is sync_batchnorm=True; if I switch it to False, it works.
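
For anyone hitting the same thing: with sync_batchnorm=True, Lightning runs torch.nn.SyncBatchNorm.convert_sync_batchnorm over the model, and that conversion re-attaches every child module through add_module, which is exactly the operation ORTModule rejects. A minimal sketch of the workaround, assuming the pytorch_lightning 1.5-era Trainer API (the other arguments are placeholders):

import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,
    strategy="ddp",
    sync_batchnorm=False,  # True would swap BatchNorm modules inside the
                           # already-wrapped ORTModule and raise the
                           # exception above
)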

Any ideas about how to convert the model to ORT after the sync_batchnorm conversion happens?

I have managed to hack it with this:

from pytorch_lightning.plugins import DDPPlugin
import torch_ort
from torch.nn import Module
from torch.nn.parallel.distributed import DistributedDataParallel


class ORTPlugin(DDPPlugin):
    def _setup_model(self, model: Module) -> DistributedDataParallel:
        """Wrap the inner model in ORTModule after Lightning has already run
        the sync_batchnorm conversion, then hand everything to
        :class:`~torch.nn.parallel.distributed.DistributedDataParallel`."""
        # Build the ORTModule C++ extensions before wrapping.
        from onnxruntime.training.ortmodule.torch_cpp_extensions import install as ortmodule_install
        ortmodule_install.build_torch_cpp_extensions()
        # Here `model` is Lightning's LightningDistributedModule, `model.module`
        # is the LightningModule, and `.model` is the nn.Module attribute my
        # LightningModule stores its network in.
        model.module.model = torch_ort.ORTModule(model.module.model)
        return DistributedDataParallel(module=model, device_ids=self.determine_ddp_device_ids(), **self._ddp_kwargs)
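
A usage sketch for the plugin above, assuming the same pytorch_lightning 1.5-era API (MyLightningModule is a placeholder for your own module, which must keep its network in a self.model attribute for the wrapping line above to work):

import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,
    sync_batchnorm=True,  # safe now: the ORTModule wrap in _setup_model runs
                          # after Lightning's SyncBatchNorm conversion
    strategy=ORTPlugin(),
)
trainer.fit(MyLightningModule())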

If you have a better way to fix it, please let me know.