nagadomi/nunif

MacMini

414726193 opened this issue · 18 comments

MPS: Unsupported Border padding mode

I don't have Mac hardware, and an EC2 Mac instance costs about $30/day, so I basically haven't tested on macOS.
I will check it at some point within a year.

Maybe the error can be fixed as follows. Not tested.

diff --git a/iw3/models/row_flow_v2.py b/iw3/models/row_flow_v2.py
index 0628521..1811c11 100644
--- a/iw3/models/row_flow_v2.py
+++ b/iw3/models/row_flow_v2.py
@@ -2,6 +2,7 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
 from nunif.models import I2IBaseModel, register_model
+from collections import OrderedDict
 
 
 @register_model
@@ -14,15 +15,19 @@ class RowFlowV2(I2IBaseModel):
             nn.Conv2d(3, 16, kernel_size=(1, 3), stride=1, padding=(0, 1), padding_mode="replicate"),
             nn.ReLU(inplace=True))
         self.non_overlap = nn.Conv2d(16, 1, kernel_size=1, stride=1, padding=0)
-        self.overlap_residual = nn.Sequential(
-            nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(16, 32, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(32, 32, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=1, padding_mode="replicate"),
-        )
+        self.overlap_residual = nn.Sequential(OrderedDict([
+            ("pad0", nn.ReplicationPad2d((0, 4))),
+            ("0", nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=0)),
+            ("1", nn.ReLU(inplace=True)),
+            ("pad1", nn.ReplicationPad2d((0, 4))),
+            ("2", nn.Conv2d(16, 32, kernel_size=(1, 9), stride=1, padding=0)),
+            ("3", nn.ReLU(inplace=True)),
+            ("pad2", nn.ReplicationPad2d((0, 4))),
+            ("4", nn.Conv2d(32, 32, kernel_size=(1, 9), stride=1, padding=0)),
+            ("5", nn.ReLU(inplace=True)),
+            ("pad3", nn.ReplicationPad2d((1, 1))),
+            ("6", nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=0)),
+        ]))
         self.register_buffer("delta_scale", torch.tensor(1.0 / 127.0))
         self.delta_output = False

Thank you, but I get this error: Only 2D, 3D, 4D, 5D padding with non-constant padding are supported for now

Sorry, I misunderstood the nn.ReplicationPad2d argument.

diff --git a/iw3/models/row_flow_v2.py b/iw3/models/row_flow_v2.py
index 0628521..9085141 100644
--- a/iw3/models/row_flow_v2.py
+++ b/iw3/models/row_flow_v2.py
@@ -2,6 +2,7 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
 from nunif.models import I2IBaseModel, register_model
+from collections import OrderedDict
 
 
 @register_model
@@ -14,15 +15,19 @@ class RowFlowV2(I2IBaseModel):
             nn.Conv2d(3, 16, kernel_size=(1, 3), stride=1, padding=(0, 1), padding_mode="replicate"),
             nn.ReLU(inplace=True))
         self.non_overlap = nn.Conv2d(16, 1, kernel_size=1, stride=1, padding=0)
-        self.overlap_residual = nn.Sequential(
-            nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(16, 32, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(32, 32, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=1, padding_mode="replicate"),
-        )
+        self.overlap_residual = nn.Sequential(OrderedDict([
+            ("pad0", nn.ReplicationPad2d((4, 4, 0, 0))),
+            ("0", nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=0)),
+            ("1", nn.ReLU(inplace=True)),
+            ("pad1", nn.ReplicationPad2d((4, 4, 0, 0))),
+            ("2", nn.Conv2d(16, 32, kernel_size=(1, 9), stride=1, padding=0)),
+            ("3", nn.ReLU(inplace=True)),
+            ("pad2", nn.ReplicationPad2d((4, 4, 0, 0))),
+            ("4", nn.Conv2d(32, 32, kernel_size=(1, 9), stride=1, padding=0)),
+            ("5", nn.ReLU(inplace=True)),
+            ("pad3", nn.ReplicationPad2d((1, 1, 1, 1))),
+            ("6", nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=0)),
+        ]))
         self.register_buffer("delta_scale", torch.tensor(1.0 / 127.0))
         self.delta_output = False
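As a quick sanity check on CPU (my own sketch, not part of the patch): nn.ReplicationPad2d takes its padding as (left, right, top, bottom), so pad-then-conv with (4, 4, 0, 0) should match the original Conv2d with padding=(0, 4) and padding_mode="replicate" exactly once the weights are shared.

import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 32)
conv = nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate")
pad_conv = nn.Sequential(
    nn.ReplicationPad2d((4, 4, 0, 0)),  # (left, right, top, bottom)
    nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=0),
)
with torch.no_grad():
    pad_conv[1].weight.copy_(conv.weight)
    pad_conv[1].bias.copy_(conv.bias)
    print(torch.allclose(conv(x), pad_conv(x)))  # expected: True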

Thank you for your answer, but the error is still there: MPS: Unsupported Border padding mode

Then the MPS backend still cannot reproduce the same results as the CUDA backend.
Use the CPU, wait until the PyTorch devs implement replication padding for MPS, or implement replication padding yourself.

If you replace nn.ReplicationPad2d with nn.ZeroPad2d it may work, but the output of the model is undefined, because the configuration is no longer the same as when the model was trained.
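For illustration only, a minimal sketch of that zero-padding fallback (my assumption of how the swap would look; since the weights were trained with replicate padding, pixels near the borders will differ):

import torch.nn as nn

# nn.ZeroPad2d takes the same (left, right, top, bottom) tuple as nn.ReplicationPad2d,
# so it can be dropped in where the replication padding layers are.
block = nn.Sequential(
    nn.ZeroPad2d((4, 4, 0, 0)),
    nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=0),
)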

I use replication padding elsewhere, so I will try to implement it for MPS. (It may be slower, but that is better than it not working.)
Wait a few days if you are willing to test it.
That said, I don't know if it will work in the end; even if I solve this problem, there could still be other problems.


Sorry, I missed one place, so I fixed it.

diff --git a/iw3/models/row_flow_v2.py b/iw3/models/row_flow_v2.py
index 0628521..18f9e50 100644
--- a/iw3/models/row_flow_v2.py
+++ b/iw3/models/row_flow_v2.py
@@ -2,6 +2,29 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
 from nunif.models import I2IBaseModel, register_model
+from collections import OrderedDict
+
+
+class ReplicationPad2dNaive(nn.Module):
+    def __init__(self, padding):
+        assert isinstance(padding, (list, tuple)) and len(padding) == 4
+        self.left = padding[0]
+        self.right = padding[1]
+        self.top = padding[2]
+        self.bottom = padding[3]
+        super().__init__()
+
+    def forward(self, x):
+        assert x.ndim == 4
+        if self.left > 0:
+            x = torch.cat((*((x[:, :, :, :1],) * self.left), x), dim=3)
+        if self.right > 0:
+            x = torch.cat((x, *((x[:, :, :, -1:],) * self.right)), dim=3)
+        if self.top > 0:
+            x = torch.cat((*((x[:, :, :1, :],) * self.top), x), dim=2)
+        if self.bottom > 0:
+            x = torch.cat((x, *((x[:, :, -1:, :],) * self.bottom)), dim=2)
+        return x
 
 
 @register_model
@@ -14,17 +37,22 @@ class RowFlowV2(I2IBaseModel):
             nn.Conv2d(3, 16, kernel_size=(1, 3), stride=1, padding=(0, 1), padding_mode="replicate"),
             nn.ReLU(inplace=True))
         self.non_overlap = nn.Conv2d(16, 1, kernel_size=1, stride=1, padding=0)
-        self.overlap_residual = nn.Sequential(
-            nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(16, 32, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(32, 32, kernel_size=(1, 9), stride=1, padding=(0, 4), padding_mode="replicate"),
-            nn.ReLU(inplace=True),
-            nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=1, padding_mode="replicate"),
-        )
+        self.overlap_residual = nn.Sequential(OrderedDict([
+            ("pad0", ReplicationPad2dNaive((4, 4, 0, 0))),
+            ("0", nn.Conv2d(16, 16, kernel_size=(1, 9), stride=1, padding=0)),
+            ("1", nn.ReLU(inplace=True)),
+            ("pad1", ReplicationPad2dNaive((4, 4, 0, 0))),
+            ("2", nn.Conv2d(16, 32, kernel_size=(1, 9), stride=1, padding=0)),
+            ("3", nn.ReLU(inplace=True)),
+            ("pad2", ReplicationPad2dNaive((4, 4, 0, 0))),
+            ("4", nn.Conv2d(32, 32, kernel_size=(1, 9), stride=1, padding=0)),
+            ("5", nn.ReLU(inplace=True)),
+            ("pad3", ReplicationPad2dNaive((1, 1, 1, 1))),
+            ("6", nn.Conv2d(32, 1, kernel_size=3, stride=1, padding=0)),
+        ]))
         self.register_buffer("delta_scale", torch.tensor(1.0 / 127.0))
         self.delta_output = False
+        self.pre_pad = ReplicationPad2dNaive((28,) * 4)
 
         for m in self.modules():
             if isinstance(m, (nn.Conv2d,)):
@@ -66,7 +94,7 @@ class RowFlowV2(I2IBaseModel):
 
     def _forward_delta_only(self, x):
         assert not self.training
-        x = F.pad(x, [28] * 4, mode="replicate")
+        x = self.pre_pad(x)
         delta = self._forward(x)[1]
         delta = torch.cat([delta, torch.zeros_like(delta)], dim=1)
         delta = F.pad(delta, [-28] * 4)
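A small CPU check of the helper above (my own sketch; it assumes the ReplicationPad2dNaive class from this patch is in scope): the concatenation-based padding should match F.pad with mode="replicate" exactly.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 5, 7)
naive = ReplicationPad2dNaive((4, 4, 0, 0))  # (left, right, top, bottom)
print(torch.equal(naive(x), F.pad(x, (4, 4, 0, 0), mode="replicate")))  # expected: True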

I'm sorry, some things came up these past few days; thank you for your answers. After modifying the code according to your suggestions, the same problem still occurs: MPS: Unsupported Border padding mode

[screenshot showing the two suggested changes]
Change these two and you're ready to run

Thank you for the info.
That part is image post-processing rather than inference with the machine learning model, so there is no major problem with that change. (The output image will differ from the CUDA version, but there is no practical problem.)


This is just my perspective.
Current code is

z = F.grid_sample(c, grid, mode="bicubic", padding_mode="border", align_corners=True)

padding_mode="border" means the same as ReplicationPadding, the error makes sense.
Also, it is not that surprising that mode="bicubic" is not supported.
However, I did not expect padding_mode="reflection" to be supported. Because reflection padding is a more complex operation than replication padding. It is curious why PyTorch MPS backend does not support replication padding.
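For completeness, a minimal repro sketch (my own; the error only appears when MPS is actually selected, and the exact failure may depend on the PyTorch version):

import torch
import torch.nn.functional as F

device = "mps" if torch.backends.mps.is_available() else "cpu"
c = torch.randn(1, 3, 8, 8, device=device)
# identity sampling grid in normalized [-1, 1] coordinates
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 8), torch.linspace(-1, 1, 8), indexing="ij")
grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).to(device)
# runs on CPU; on MPS this raises the border-padding (or bicubic) error
z = F.grid_sample(c, grid, mode="bicubic", padding_mode="border", align_corners=True)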

Strange indeed. Another question: do you have a way to solve the problem at the root? Thank you very much.

'aten::_upsample_bilinear2d_aa.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Which line of which file causes that error?

If you follow the error message, for example, the command
PYTORCH_ENABLE_MPS_FALLBACK=1 python -m iw3.cli ...
seems to work around it (but it says the CPU will be used instead of MPS).
I am not familiar with IDEs, but yours should have a setting for environment variables.

I set os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1' in the code, but it still doesn't run.
However, typing export PYTORCH_ENABLE_MPS_FALLBACK=1 in the terminal works fine.
How would you set this environment variable in the code?

Maybe it needs to be set before torch is imported.
At the very top of iw3/cli.py or iw3/gui.py, try adding

import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
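To make the ordering explicit (my reading of why the in-code assignment did not take effect): the variable has to be set before torch is imported anywhere in the process, roughly like this:

import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'  # must run before the torch import below

import torch  # imported only after the variable is set, as suggested above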

Wow! Success! Thank you, thank you, thank you very much; I'm a beginner and couldn't have done this on my own.
I want to package it and run it on a mobile device, such as an iPad Pro with an M1 chip. Is there any way to do that?

I don't know anything about Apple devices, and it's not my project, so you'll have to do that yourself.
In fact, I've never owned an Apple product.