Update rtcd.pl

Question

lu-zero opened this issue 8 years ago · 20 comments

Make rtcd.pl aware of POWER8 Altivec/VSX.

Answer 1 · 2016-10-27T19:16:24.000Z

I have created a proof of concept for this issue. For now, the POC detects when the target is PowerPC and, if AltiVec and VSX are available, applies the corresponding GCC flags. It also detects at run time whether Linux reports AltiVec support.

So far, so good.

I was looking at the "AltiVec (GNU/GCC 4.8.2) data types" table on this wiki, and the available data types differ depending on the GCC flags -maltivec, -mvsx and -mcpu=power8.

So does the PR need to support all the extended data types from POWER8?

Also from the wiki:

If the ISA 2.07 additions to the vector/scalar (power8-vector) instruction set are available, the following additional functions are available for both 32-bit and 64-bit targets.

Is 32-bit support needed?

Best regards

Answer 2 · 2016-10-27T20:01:59.000Z

Personally, I only care about VSX, so I'll write support only for it: current hardware has it, and it makes loads and stores much simpler.

This bug list is more or less my to-do list.

I doubt the extended data types would be useful in vpx.

Answer 3 · 2016-10-28T11:41:20.000Z

That said, the POC looks promising. Will you complete it soon?

Answer 4 · 2016-10-28T17:55:09.000Z

For sure!

I just need to test it with some VSX-specific code; once everything works, I will create a PR for this issue.

Answer 5 · 2016-10-28T18:08:29.000Z

I had something half done, since I was planning to start implementing this next week, but your version already seems much more complete.

Answer 6 · 2016-10-28T19:33:44.000Z

My plan is to finish it before the end of the weekend.

Answer 7 · 2016-10-31T01:41:16.000Z

Hello again,

I removed the AltiVec detection and did not add the POWER8-specific flags, but I can add them if you think it is necessary.

Below is an example of how to add VSX functions to the build system; in this case I tested it with vpx_dsp.

diff --git a/vpx_dsp/vpx_dsp.mk b/vpx_dsp/vpx_dsp.mk
index 9b62520..f343aca 100644
--- a/vpx_dsp/vpx_dsp.mk
+++ b/vpx_dsp/vpx_dsp.mk
@@ -297,6 +297,8 @@ DSP_SRCS-$(HAVE_SSE2)   += x86/sad4d_sse2.asm
 DSP_SRCS-$(HAVE_SSE2)   += x86/sad_sse2.asm
 DSP_SRCS-$(HAVE_SSE2)   += x86/subtract_sse2.asm

+DSP_SRCS-$(HAVE_VSX)   += powerpc/sum_squares_vsx.c
+
 ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
 DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
 DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
diff --git a/vpx_dsp/vpx_dsp_rtcd_defs.pl b/vpx_dsp/vpx_dsp_rtcd_defs.pl
index fa62941..4115b9a 100644
--- a/vpx_dsp/vpx_dsp_rtcd_defs.pl
+++ b/vpx_dsp/vpx_dsp_rtcd_defs.pl
@@ -1014,7 +1014,7 @@ add_proto qw/void vpx_sad4x4x4d/, "const uint8_t *src_ptr, int src_stride, const
 specialize qw/vpx_sad4x4x4d msa sse2/;

 add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size";
-specialize qw/vpx_sum_squares_2d_i16 sse2/;
+specialize qw/vpx_sum_squares_2d_i16 sse2 vsx/;

 #
 # Structured Similarity (SSIM)

It follows the same pattern as other parts of the project. I await your reply.

Best regards

Answer 8 · 2016-11-07T15:45:46.000Z

Just to make sure, did you do all the CLA dance with upstream?

Answer 9 · 2016-11-07T20:36:32.000Z

Doin it :)

Answer 10 · 2017-03-07T00:32:29.000Z

Pushing to upstream https://chromium-review.googlesource.com/c/450877/

Answer 11 · 2017-03-07T01:45:12.000Z

Hello @edelsohn, I have some questions from members of the WebM community:

Are there any cases of processors without VSX where you would want this to run? If not, it would be useful to set this up like x86_64 with SSE2, where the SSE2 function replaces the C version, allowing the linker to strip out the C version.

Also,

Do you anticipate options besides VSX? If not, you can leave out most of this. It is for cases where you have, for example, extension1 and extension2, so that --disable-extension1 also disables extension2. Since you have only VSX, that is not necessary.

This patch adds support for POWER8 VSX instructions. Is there an intention to support older versions?

Kind regards

Answer 12 · 2017-03-07T02:23:16.000Z

Hi, @rafaeldelucena

What are the options? I don't know the details of the x86-64 setup with SSE2. Does it completely fail if SSE2 is not present?

The focus is systems that assume a minimum ISA with VSX. I don't expect the code to dynamically test for the feature at runtime. I expected that the library would detect or be configured for VSX when it is built. I sort of assumed that it would fall back to the current, simple C code if not configured for VSX, as opposed to completely failing to build. But a library built with the assumption of VSX is not expected to run on hardware at an earlier ISA level.

For options other than VSX, I don't expect support for the original AltiVec/VMX. POWER9 is now public and adds a few more VSX instructions. I don't know whether those could be useful for VPX. I would like the library to be able to leverage any future ISA improvements, when useful.

Does that answer your question? I don't understand exactly what you are asking.

Answer 13 · 2017-03-07T05:31:43.000Z

@edelsohn

Does that answer your question? I don't understand exactly what you are asking.

Yes, that answers it.

What are the options? I don't know the details of the x86-64 setup with SSE2. Does it completely fail if SSE2 is not present?

If VSX support is unavailable for any reason, it will build with the C version.

The focus is systems that assume a minimum ISA with VSX. I don't expect the code to dynamically test for the feature at runtime.

OK, thank you for the reply.

Answer 14 · 2017-03-08T23:13:47.000Z

@edelsohn

It's done :)

https://chromium.googlesource.com/webm/libvpx/+/51289302ab02d81c17d3f15bbfb9a22eef4a36c1

Answer 15 · 2017-03-08T23:17:49.000Z

Hi, Rafael,

Excellent! Glad to see the initial support committed. Good work.

Thanks,
David

On Wed, Mar 8, 2017 at 6:13 PM, Rafael de Lucena Valle wrote: @edelsohn It's done :) https://chromium.googlesource.com/webm/libvpx/+/5128930

Answer 16 · 2017-03-09T19:07:24.000Z

@edelsohn

They (WebM community members) have some questions about which PowerPC variants (ppc64, ppc64le and 32-bit ppc) they must support. Do you mind if I add you to the discussion?

Answer 17 · 2017-03-09T19:23:00.000Z

I had already subscribed to the patch review and responded in the thread.

Answer 18 · 2017-03-09T19:24:35.000Z

Thanks! :)

Answer 19 · 2017-03-09T20:06:53.000Z

Google and the WebM community have released some data and high-level benchmarks for VP9 (and H.264).

http://downloads.webmproject.org/yt_testclips

The USAGE file explains the high-level measurements, but doesn't provide any new unit benchmarks. Profiling should highlight the critical paths.

A previous report provided a profile stack:

18.25% ffmpeg.llvm ffmpeg.llvm [.] convolve_horiz
17.70% ffmpeg.llvm ffmpeg.llvm [.] convolve_vert
7.78% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad64x64x4d_c
4.99% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad32x32x4d_c
4.59% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad16x16x4d_c
2.98% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance64x64_c
2.97% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance32x32_c
2.88% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance16x16_c
2.76% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad8x8_c
2.73% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance32x32_c
2.70% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct32
2.12% ffmpeg.llvm ffmpeg.llvm [.] vpx_quantize_b_c
1.76% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance16x16_c
1.53% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance8x8_c
1.31% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct32x32_rd_c
1.16% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance8x8_c
1.14% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct16x16_c
1.08% ffmpeg.llvm ffmpeg.llvm [.] vpx_subtract_block_c

Answer 20 · 2017-03-18T11:09:30.000Z

Merged upstream eventually.