sobel re-optimize
GoogleCodeExporter commented
The first step in sobel is inadvertently unoptimized:
#if defined(HAS_ARGBTOBAYERGGROW_SSE2)
  if (TestCpuFlag(kCpuHasSSE2)) {
    ARGBToBayerRow = ARGBToBayerGGRow_Any_SSE2;
    if (IS_ALIGNED(width, 8)) {
      ARGBToBayerRow = ARGBToBayerGGRow_SSE2;
    }
  }
#endif
#if defined(HAS_ARGBTOBAYERROW_SSSE3)
  if (TestCpuFlag(kCpuHasSSSE3)) {
    ARGBToBayerRow = ARGBToBayerRow_Any_SSSE3;
    if (IS_ALIGNED(width, 8)) {
      ARGBToBayerRow = ARGBToBayerRow_SSSE3;
    }
  }
#endif
#if defined(HAS_ARGBTOBAYERGGROW_NEON)
  if (TestCpuFlag(kCpuHasNEON)) {
    ARGBToBayerRow = ARGBToBayerGGRow_Any_NEON;
    if (IS_ALIGNED(width, 8)) {
      ARGBToBayerRow = ARGBToBayerGGRow_NEON;
    }
  }
#endif
and the last step does not handle odd widths.
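The selection pattern quoted above can be sketched in isolation. This is a minimal sketch and not libyuv's code: GreenRow_C, GreenRow_Fast8, GreenRow_Any_Fast8 and ChooseGreenRow are hypothetical names, the "fast" kernel is plain C standing in for a SIMD routine that needs a width that is a multiple of 8, and the in-memory ARGB byte order (B, G, R, A) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Row function signature: converts `width` ARGB pixels to one byte each. */
typedef void (*RowFunc)(const uint8_t* src_argb, uint8_t* dst, int width);

/* Portable fallback: copies the G byte of every pixel.
   Assumes pixels are stored as B, G, R, A (an assumption of this sketch). */
static void GreenRow_C(const uint8_t* src_argb, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = src_argb[x * 4 + 1];
  }
}

/* Stand-in for a SIMD kernel that only accepts widths that are multiples of 8. */
static void GreenRow_Fast8(const uint8_t* src_argb, uint8_t* dst, int width) {
  assert(width % 8 == 0);
  for (int x = 0; x < width; x += 8) {
    for (int i = 0; i < 8; ++i) {
      dst[x + i] = src_argb[(x + i) * 4 + 1];
    }
  }
}

/* "Any" wrapper: fast kernel on the multiple-of-8 prefix, C code on the tail. */
static void GreenRow_Any_Fast8(const uint8_t* src_argb, uint8_t* dst, int width) {
  int n = width & ~7; /* largest multiple of 8 that is <= width */
  if (n > 0) {
    GreenRow_Fast8(src_argb, dst, n);
  }
  GreenRow_C(src_argb + n * 4, dst + n, width - n);
}

/* Start from C, upgrade to the Any wrapper, then to the bare kernel
   when the width allows it. `has_fast8` plays the role of a CPU flag test. */
static RowFunc ChooseGreenRow(int has_fast8, int width) {
  RowFunc row = GreenRow_C;
  if (has_fast8) {
    row = GreenRow_Any_Fast8;
    if (width % 8 == 0) {
      row = GreenRow_Fast8;
    }
  }
  return row;
}
```

With this shape, a width such as 4094 still runs the fast kernel on 4088 pixels and leaves only 6 to the C tail, whereas a step without an Any variant stays on C for the whole row.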
Testing C versus assembly should show a large difference.
It shows a difference, but not as large as it should be:
set LIBYUV_DISABLE_ASM=0
set LIBYUV_WIDTH=4096
set LIBYUV_HEIGHT=2048
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
out\release\libyuv_unittest --gtest_filter=*ARGBSobelXY_Opt | findstr /r "^[^_]*_[^_]*ms"
ARGBSobelXY_Opt (12539 ms)
set LIBYUV_DISABLE_ASM=1
set LIBYUV_WIDTH=4094
set LIBYUV_HEIGHT=2048
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=0
out\release\libyuv_unittest --gtest_filter=*ARGBSobelXY_Opt | findstr /r "^[^_]*_[^_]*ms"
ARGBSobelXY_Opt (57926 ms)
set LIBYUV_DISABLE_ASM=0
set LIBYUV_WIDTH=4094
set LIBYUV_HEIGHT=2048
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=0
out\release\libyuv_unittest --gtest_filter=*ARGBSobelXY_Opt | findstr /r "^[^_]*_[^_]*ms"
ARGBSobelXY_Opt (22634 ms)
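To put numbers on the gap, the timings above can be reduced to speedup ratios. A minimal sketch; Speedup is a hypothetical helper, not part of libyuv or its unit tests.

```c
#include <assert.h>

/* Ratio of baseline time to optimized time; a value above 1 means
   the optimized run is faster. Both times must be positive. */
static double Speedup(double baseline_ms, double optimized_ms) {
  assert(baseline_ms > 0.0 && optimized_ms > 0.0);
  return baseline_ms / optimized_ms;
}
```

The two runs at width 4094 differ only in LIBYUV_DISABLE_ASM, and Speedup(57926, 22634) is about 2.56. Against the 4096-wide run, Speedup(57926, 12539) is about 4.62, although that run also uses a different width and LIBYUV_FLAGS value, so the comparison is only indicative.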
Original issue reported on code.google.com by fbarch...@chromium.org
on 26 May 2015 at 11:55
GoogleCodeExporter commented
r1415 does the first step using ARGBToJ400 (the luma calculation of the JPEG
color space) and the sobel last step using Any functions to handle odd widths,
with a luma function that supports AVX2. On AVX2:
set LIBYUV_WIDTH=1278
set LIBYUV_HEIGHT=720
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
Was C+SSE2+C
out\release\libyuv_unittest_old --gtest_filter=*Sobel* | findstr /r "^[^_]*_[^_]*ms"
[ OK ] libyuvTest.ARGBSobel_Any (4871 ms)
[ OK ] libyuvTest.ARGBSobel_Unaligned (4891 ms)
[ OK ] libyuvTest.ARGBSobel_Invert (4953 ms)
[ OK ] libyuvTest.ARGBSobel_Opt (4891 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Any (3719 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Unaligned (3734 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Invert (3797 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Opt (3719 ms)
[ OK ] libyuvTest.ARGBSobelXY_Any (4891 ms)
[ OK ] libyuvTest.ARGBSobelXY_Unaligned (4906 ms)
[ OK ] libyuvTest.ARGBSobelXY_Invert (4984 ms)
[ OK ] libyuvTest.ARGBSobelXY_Opt (4907 ms)
Now AVX2+SSE2+SSE2
out\release\libyuv_unittest --gtest_filter=*Sobel* | findstr /r "^[^_]*_[^_]*ms"
[ OK ] libyuvTest.ARGBSobel_Any (2531 ms)
[ OK ] libyuvTest.ARGBSobel_Unaligned (2500 ms)
[ OK ] libyuvTest.ARGBSobel_Invert (2610 ms)
[ OK ] libyuvTest.ARGBSobel_Opt (2515 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Any (2157 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Unaligned (2156 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Invert (2219 ms)
[ OK ] libyuvTest.ARGBSobelToPlane_Opt (2156 ms)
[ OK ] libyuvTest.ARGBSobelXY_Any (2500 ms)
[ OK ] libyuvTest.ARGBSobelXY_Unaligned (2531 ms)
[ OK ] libyuvTest.ARGBSobelXY_Invert (2610 ms)
[ OK ] libyuvTest.ARGBSobelXY_Opt (2515 ms)
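For reference, the JPEG (full-range) luma that the first step now computes can be sketched in plain C. This is an illustration under stated assumptions, not the code in r1415: LumaJ and ARGBToLumaJRow are hypothetical names, the coefficients are an 8-bit fixed-point rounding of the JFIF weights 0.299, 0.587 and 0.114, and pixels are assumed to be stored as B, G, R, A bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Full-range luma in 8-bit fixed point: 77 + 150 + 29 = 256, so white
   maps to 255 and black to 0. Coefficients are an assumption of this
   sketch (rounded JFIF weights), not necessarily those used by libyuv. */
static uint8_t LumaJ(uint8_t r, uint8_t g, uint8_t b) {
  return (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
}

/* Converts one row of ARGB pixels to one luma byte per pixel.
   Assumes B, G, R, A byte order in memory. Works for any width,
   including odd ones, because it processes one pixel at a time. */
static void ARGBToLumaJRow(const uint8_t* src_argb, uint8_t* dst_y, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src_argb + x * 4;
    dst_y[x] = LumaJ(p[2], p[1], p[0]);
  }
}
```

A SIMD version of such a row function would apply the same weights to many pixels per instruction, which is where a vectorized first step gets its advantage over the per-pixel loop.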
Original comment by fbarch...@chromium.org
on 28 May 2015 at 11:43
- Changed state: Fixed