[onert] DepthwiseConv2D training value error

Question

Closed this issue 3 months ago · 3 comments

What

I added unit tests for DepthwiseConv2D training, but in some cases the results differ from TensorFlow's. Let's resolve this.

Cases with mismatched result values

Case 1

input shape : (1, 3, 2, 2)
weight shape : (1, 2, 2, 4)
bias shape : (4)
output shape : (1, 2, 1, 4)
Padding : valid
depth multiplier : 2

TensorFlow result : #12985 (comment)

Case 2

input shape : (1, 2, 2, 2)
weight shape : (1, 3, 1, 2)
bias shape : (2)
output shape : (1, 2, 2, 2)
Padding : same
depth multiplier : 1

TensorFlow result : #12985 (comment)

ragmani commented 3 months ago

Done.

Answer 1 · 2024-06-19T04:50:47.000Z

After applying the following patch, the above cases work on the draft PR, but not on the master branch.

diff --git a/compute/cker/include/cker/train/operation/DepthwiseConv.h b/compute/cker/include/cker/train/operation/DepthwiseConv.h
index 05a937166e..47aaf13fbf 100644
--- a/compute/cker/include/cker/train/operation/DepthwiseConv.h
+++ b/compute/cker/include/cker/train/operation/DepthwiseConv.h
@@ -101,10 +101,12 @@ public:
     const int pad_width = params.padding_values.width;
 
     depthwise_conv_op::LaunchDepthwiseConvBackpropFilterOp<Eigen::ThreadPoolDevice, T>()(
-      batch, input_width, input_height, input_depth, filter_width, filter_height, depth_multiplier,
-      stride, pad_width, pad_height, incoming_width, incoming_height, output_depth, incoming_data,
+      batch, input_height, input_width, input_depth, filter_height, filter_width, depth_multiplier,
+      stride, pad_height, pad_width, incoming_height, incoming_width, output_depth, incoming_data,
       input_data, filter_grad_data, padded_filter_data, filter_buffers_data);
   }
+
+
 };
 
 } // namespace train

$ ./Product/x86_64-linux.debug/out/unittest/nnfw_api_gtest --gtest_filter=GenModelTrain.OneOp_DepthwiseConv2D*
Note: Google Test filter = GenModelTrain.OneOp_DepthwiseConv2D*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from GenModelTrain
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D
/home/jang/git/ragmani/onert/ONE/tests/nnfw_api/lib/GenModelTrain.h:437: Failure
The difference between expected and actual is 5.7049000263214111, which exceeds 0.001, where
expected evaluates to 1.1700999736785889,
actual evaluates to 6.875, and
0.001 evaluates to 0.001.
Loss #0
[  FAILED  ] GenModelTrain.OneOp_DepthwiseConv2D (15 ms)
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier
/home/jang/git/ragmani/onert/ONE/tests/nnfw_api/lib/GenModelTrain.h:437: Failure
The difference between expected and actual is 22.456899642944336, which exceeds 0.001, where
expected evaluates to 15.543100357055664,
actual evaluates to 38, and
0.001 evaluates to 0.001.
Loss #0
[  FAILED  ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier (2 ms)
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier_RELU6
[       OK ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier_RELU6 (2 ms)
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D_3x3
/home/jang/git/ragmani/onert/ONE/tests/nnfw_api/lib/GenModelTrain.h:437: Failure
The difference between expected and actual is 157.26620006561279, which exceeds 0.001, where
expected evaluates to 13.733799934387207,
actual evaluates to 171, and
0.001 evaluates to 0.001.
Loss #0
[  FAILED  ] GenModelTrain.OneOp_DepthwiseConv2D_3x3 (2 ms)
[----------] 4 tests from GenModelTrain (22 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (22 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] GenModelTrain.OneOp_DepthwiseConv2D
[  FAILED  ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier
[  FAILED  ] GenModelTrain.OneOp_DepthwiseConv2D_3x3

 3 FAILED TESTS

Answer 2 · 2024-06-19T05:20:41.000Z

Following up on the failures on the master branch reported above: I had missed the call to cgen.markAllOpsAsTrainable() in the tests. After adding it, all tests pass.

$ ./Product/x86_64-linux.debug/out/unittest/nnfw_api_gtest --gtest_filter=GenModelTrain.OneOp_DepthwiseConv2D*
Note: Google Test filter = GenModelTrain.OneOp_DepthwiseConv2D*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from GenModelTrain
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D
[       OK ] GenModelTrain.OneOp_DepthwiseConv2D (15 ms)
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier
[       OK ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier (2 ms)
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier_RELU6
[       OK ] GenModelTrain.OneOp_DepthwiseConv2D_No_Multiplier_RELU6 (2 ms)
[ RUN      ] GenModelTrain.OneOp_DepthwiseConv2D_3x3
[       OK ] GenModelTrain.OneOp_DepthwiseConv2D_3x3 (3 ms)
[----------] 4 tests from GenModelTrain (23 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (23 ms total)
[  PASSED  ] 4 tests.