Update rtcd.pl
lu-zero opened this issue · 20 comments
Make rtcd.pl aware of POWER8 Altivec/VSX.
Hello @lu-zero,
I have created a proof-of-concept for this issue, for now the POC already detects when is a powerpc, if have altivec and vsx then apply the gcc flags, also detects at run-time if the Linux has altivec support.
So far so good
I was looking at the table AltiVec (GNU/GCC 4.8.2) data types from this wiki, and the GNU/Gcc flags will differ for -maltivec, -mvsx and -mcpu=powerpc8
So the PR must support all the extended datatypes from powerpc8?
Also from the wiki:
If the ISA 2.07 additions to the vector/scalar (power8-vector) instruction set are available, the following additional functions are available for both 32-bit and 64-bit targets.
32-bit support is needed?
Best regards
Personally I just care about VSX and I'll write the support only for VSX since it has current hardware for it and it makes the code much simpler regarding loads and store.
This bug list is more or less my todolist on what to do.
I doubt the extended datatypes would be be useful in vpx.
With that said the poc looks promising, will you complete it soon?
For sure!
I just need to test with some vsx specific code and with all things working well I will create a PR for this issue.
I had something halfway since I was planning to start implementing next week, but your version seems much more complete already.
My plan is to finish it before the end of the weekend
Hello again,
I removed the detection for AltiVec and did not add the specific flags for Power8, but I can do it if you think it is necessary.
Below is an example of how to add VSX functions in the build system, in case I have tested for vpx_dsp.
diff --git a/vpx_dsp/vpx_dsp.mk b/vpx_dsp/vpx_dsp.mk
index 9b62520..f343aca 100644
--- a/vpx_dsp/vpx_dsp.mk
+++ b/vpx_dsp/vpx_dsp.mk
@@ -297,6 +297,8 @@ DSP_SRCS-$(HAVE_SSE2) += x86/sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/sad_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/subtract_sse2.asm
+DSP_SRCS-$(HAVE_VSX) += powerpc/sum_squares_vsx.c
+
ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
diff --git a/vpx_dsp/vpx_dsp_rtcd_defs.pl b/vpx_dsp/vpx_dsp_rtcd_defs.pl
index fa62941..4115b9a 100644
--- a/vpx_dsp/vpx_dsp_rtcd_defs.pl
+++ b/vpx_dsp/vpx_dsp_rtcd_defs.pl
@@ -1014,7 +1014,7 @@ add_proto qw/void vpx_sad4x4x4d/, "const uint8_t *src_ptr, int src_stride, const
specialize qw/vpx_sad4x4x4d msa sse2/;
add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size";
-specialize qw/vpx_sum_squares_2d_i16 sse2/;
+specialize qw/vpx_sum_squares_2d_i16 sse2 vsx/;
#
# Structured Similarity (SSIM)
It is the same pattern to other parts of the project, I await your reply.
Best regards
Just to make sure, did you do all the CLA dance with upstream?
Doin it :)
Pushing to upstream https://chromium-review.googlesource.com/c/450877/
Hello @edelsohn, I've some questions from members of WebM community:
are there any cases of processors without vsx where you would want this to run? if not, it would be useful to set this up like x86_64 with sse2, where the sse2 function replaces the c version, allowing the linker to strip out the c version.
Also,
Do you anticipate options besides vsx? If not, you can leave out most of this. It is for cases where you had, for example, extension1 and extension2, so that --disable-extension1 also disabled extension2. Since you have only vsx, that is not necessary.
This patch adds support for Power8 VSX instructions, Is there an intention to support older versions?
Kind regards
Hi, @rafaeldelucena
What are the options? I don't know the details of the x86-64 setup with SSE2. Does it completely fail if SSE2 is not present?
The focus is systems that assume a minimum ISA with VSX. I don't expect the code to dynamically test for the feature at runtime. I expected that the library would detect or be configured for VSX when it is built. I sort of assumed that it would fall back to the current, simple C code if not configured for VSX, as opposed to completely failing to build. But a library built with the assumption of VSX is not expected to run on a hardware at an earlier ISA level.
For options other than VSX, I don't expect support for the original Altivec/VMX. Power9 now is public and adds a few more VSX instructions. I don't know if those could be useful for VPX. I would like the library to be able to leverage any future ISA improvements, when useful.
Does that answer your question? I don't understand exactly what you are asking.
Does that answer your question? I don't understand exactly what you are asking.
Yes, this answered
What are the options? I don't know the details of the x86-64 setup with SSE2. Does it completely fail if SSE2 is not present?
If for any reason doesn't have the VSX support will build with the C version.
The focus is systems that assume a minimum ISA with VSX. I don't expect the code to dynamically test for the feature at runtime.
Ok, thank you for the reply
They (WebM community members) have some questions about what versions of powerpc (ppc64, ppc64le and ppc (non-64)) they must support, do you mind if I add you on the discussion?
I already had subscribed to the patch review and responded on the thread.
Thanks! :)
Google and the WebM community have released some data and high-level benchmarks for VP9 (and H.264).
http://downloads.webmproject.org/yt_testclips
The USAGE file explains the high-level measurements, but doesn't
provide any new unit benchmarks. Profiling should highlight the
critical paths.
A previous report provided a profile stack:
18.25% ffmpeg.llvm ffmpeg.llvm [.] convolve_horiz
17.70% ffmpeg.llvm ffmpeg.llvm [.] convolve_vert
7.78% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad64x64x4d_c
4.99% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad32x32x4d_c
4.59% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad16x16x4d_c
2.98% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance64x64_c
2.97% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance32x32_c
2.88% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance16x16_c
2.76% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad8x8_c
2.73% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance32x32_c
2.70% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct32
2.12% ffmpeg.llvm ffmpeg.llvm [.] vpx_quantize_b_c
1.76% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance16x16_c
1.53% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance8x8_c
1.31% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct32x32_rd_c
1.16% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance8x8_c
1.14% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct16x16_c
1.08% ffmpeg.llvm ffmpeg.llvm [.] vpx_subtract_block_c
merged upstream eventually.