python/pyperformance

Use more than one core

Eclips4 opened this issue · 6 comments

At the moment, pyperformance only loads a single core.
Is there a reason why this is so?

I'd suggest implementing a `-j` jobs flag rather than running on all available cores. I'm using pyperformance to gather profiling data in my own weird Python PGO+LTO build.
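For concreteness, the proposed option might look like this (the `-j` flag below is the suggestion from this thread, not an existing pyperformance option; `-o` is pyperformance's regular output flag):

```sh
# Proposed, not yet implemented: spread benchmark runs across 4 cores.
pyperformance run -j 4 -o results.json
```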

@rockdrilla

I'm using pyperformance to gather profiling data in my own weird Python PGO+LTO build.

If you don't mind, could you describe your use case?
These days I'm interested in increasing profiling coverage for PGO.

@corona10 it's a pretty weird solution. 😄

In short: build Python with a shared library, install pyperformance "somewhere" using that "shared" Python, then reconfigure Python for a static binary and build it - the build will run pyperformance while gathering PGO data.
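A rough sketch of that flow (the prefix, paths, and the `PROFILE_TASK` override are my illustrative assumptions, not the exact build script):

```sh
# 1. Build Python with a shared library and install it to a scratch prefix.
./configure --enable-shared --prefix=/tmp/py-shared
make -j4 && make install

# 2. Install pyperformance "somewhere" using that shared-library Python.
LD_LIBRARY_PATH=/tmp/py-shared/lib \
    /tmp/py-shared/bin/python3 -m pip install pyperformance

# 3. Reconfigure for a static, optimized binary; the instrumented build
#    runs the profile task (here: pyperformance) while gathering PGO data.
make distclean
./configure --enable-optimizations --with-lto
make -j4 PROFILE_TASK='-m pyperformance run'
```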

applied patches (relevant to this case):

build script: debian/rules from the package template

upd: benchmark results.

Wow, super cool!

I am still conservative about directly supporting a profiling workload based on the pyperformance suite,
but I am open to improving the current PGO and LTO for better performance.
(For example, ThinLTO is fast, but full LTO with GCC is slow if you don't pass a core count or the auto flag.)
Alternatively, we could add a new configure option for designating a pre-generated profile directory, which could be used for externally profiled data.
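For illustration, GCC's `-flto` accepts either an explicit job count or `auto` (both are standard GCC options; the link line itself is a made-up example):

```sh
# Full LTO with an explicit parallelism cap: at most 8 LTO workers.
gcc -O2 -flto=8 -o python *.o

# "auto": GCC asks make's jobserver, or falls back to the CPU count.
gcc -O2 -flto=auto -o python *.o
```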

I'd suggest using a fixed core count rather than "auto" in -flto=X: I've often seen GNU make launch up to N jobs (with make -j N) running gcc, and each (!) gcc in turn spawn up to N processes - with make -j 8 that can mean up to 64 workers. That alone is confusing, but a container runtime with a limited core count confuses the whole build process even more.
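A sketch of the blow-up and the fix (the job counts are illustrative; whether `auto` actually over-spawns depends on whether make's jobserver reaches gcc):

```sh
# Worst case: make runs up to 8 parallel gcc jobs, and each gcc invoked
# with -flto=auto may itself spawn up to 8 LTO workers -> up to 64 processes.
make -j 8 CFLAGS="-flto=auto" LDFLAGS="-flto=auto"

# Pinning one shared fixed count keeps the ceiling predictable:
JOBS=4
make -j "$JOBS" CFLAGS="-flto=$JOBS" LDFLAGS="-flto=$JOBS"
```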

upd: you may use the -fprofile-dir=path flag with gcc to keep PGO data separate from the build directory.
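A minimal sketch using standard GCC profile flags (the file names and directory are illustrative):

```sh
# Instrumented build: write .gcda profile data under /tmp/pgo instead of
# next to the object files in the build tree.
gcc -O2 -fprofile-generate -fprofile-dir=/tmp/pgo -c foo.c

# ...run the training workload, then rebuild reading the same directory:
gcc -O2 -fprofile-use -fprofile-dir=/tmp/pgo -c foo.c
```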

I'd suggest using a fixed core count rather than "auto" in -flto=X: I've often seen GNU make launch up to N jobs (with make -j N) running gcc, and each (!) gcc in turn spawn up to N processes

Okay, I agree with you. Let's file the issue on CPython and discuss a better way to solve it.
I'd prefer that we support it in a seamless way.