Not actually using iGPU? [Perf]
wbrickner opened this issue · 2 comments
Description
Hello, I am using arrayfire-rust to speed up a large matrix multiplication. There seems to be zero performance difference between Backend::CPU and Backend::OPENCL (with set_device(0)), no matter how large the matrices are. I suspect that the Intel Iris GPU is not actually being used. Activity Monitor and iStats both basically confirm this: they report 100% CPU usage and approximately zero GPU usage.
Is this instead an issue of naming? My iGPU is technically part of my CPU, I suppose, although arrayfire::info() clearly lists it as a separate device:
ArrayFire v3.8.2 (OpenCL, 64-bit Mac OSX, build 5752f2dcc)
[0] APPLE: Intel(R) Iris(TM) Plus Graphics 655, 1536 MB
and checking the active backend and device gives the expected results:
Active backend: OPENCL
Active device: 0
I think the CPU and iGPU have a unified memory architecture, and I see no way to account for that in arrayfire-rust. Could this be the issue? For example, could a special memory allocation with a particular alignment be needed to compel OpenCL to actually use the GPU?
I am creating my matrices with randn::<f32>(...), and also constructing them from a slice of floats that exist in host (CPU) memory.
The dimensions of my matrix multiply are [4096, 4096, 1, 1] x [4096, 14336, 1, 1].
I downloaded ArrayFire from the official website. The OpenCL implementation is the one provided by Apple, I think.
Is there any other way to check if the GPU is actually being used?
The Rust code I'm using at startup:
arrayfire::info();
arrayfire::set_backend(arrayfire::Backend::OPENCL);
arrayfire::set_device(0);
println!("Active backend: {:?}", arrayfire::get_active_backend());
println!("Active device: {}", arrayfire::get_device());
and the code that is meant to leverage the GPU:
// covariance transform matrix (extract from old representation)
let buffer = {
    let ct = state.cov_transform();
    let mut buff = F32_BUFFER_POOL.pull(|| Vec::with_capacity(ct.len()));
    buff.resize(ct.len(), 0.0);
    // copy into buffer, converting each element to f32 (nalgebra promises
    // column-major storage, which is also ArrayFire's layout, so this is safe)
    ct.as_slice().iter()
        .zip(buff.iter_mut())
        .for_each(|(source, target)| *target = *source as f32);
    buff
};
// ArrayFire dims are (rows, columns), matching nalgebra's column-major layout
let dims = dim4!(state.cov_transform().nrows() as u64, state.cov_transform().ncols() as u64);
let covariance_transform = Array::new(
    &buffer[..],
    dims
);
// i.i.d. normally-distributed matrix (dim rows x population_size columns)
let random_matrix = randn::<f32>(dim4!(self.dim as u64, self.population_size as u64));
// matrix, the columns of which are our transformed vectors
let transformed_vectors = matmul(&covariance_transform, &random_matrix, MatProp::NONE, MatProp::NONE);
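The buffer-filling step in the snippet above works because nalgebra and ArrayFire both store matrices column-major, so the flat slice can be copied element by element as long as the first ArrayFire dimension is the row count. For the square 4096 x 4096 covariance transform in this issue the order of the two dims makes no difference, but for non-square input it does. A minimal std-only sketch of that conversion, assuming an f64 source matrix; `to_af_layout` and `at` are hypothetical helpers, and the returned [u64; 4] only mirrors what dim4! would receive:

```rust
/// Hypothetical helper: converts a column-major f64 matrix into an f32 buffer
/// plus ArrayFire-style dims [rows, cols, 1, 1].
fn to_af_layout(data: &[f64], nrows: usize, ncols: usize) -> (Vec<f32>, [u64; 4]) {
    assert_eq!(data.len(), nrows * ncols, "buffer length must equal rows * cols");
    // Column-major in, column-major out: element order is preserved as-is.
    let buf: Vec<f32> = data.iter().map(|&x| x as f32).collect();
    // ArrayFire's first dimension is the number of rows.
    (buf, [nrows as u64, ncols as u64, 1, 1])
}

/// Hypothetical helper: reads element (row, col) from a column-major buffer
/// that has `nrows` rows.
fn at(buf: &[f32], nrows: usize, row: usize, col: usize) -> f32 {
    buf[col * nrows + row]
}

fn main() {
    // 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
    let col_major = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    let (buf, dims) = to_af_layout(&col_major, 2, 3);
    println!("dims = {:?}, element (0, 2) = {}", dims, at(&buf, 2, 0, 2));
}
```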
I am a graphics programming novice, sorry if this is a silly problem!
Checklist
- I have read timing ArrayFire C++ API
Edit:
As a control, I've used gpu.rocks to confirm that GPU utilization is reported correctly: it spikes briefly to 100% while that benchmark runs. Through arrayfire I still get only ~2% utilization. The gpu.rocks benchmark reports 106 ms for a large matrix-matrix multiply (two 4096 x 4096 matrices), whereas my multiply through arrayfire takes about 3.7 s. My multiply through arrayfire is larger, however ([4096, 4096] x [4096, 14336]).
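Because the two multiplies differ in size, raw wall-clock times are hard to compare. A dense (m x k) by (k x n) product costs roughly 2mkn floating-point operations (about one multiply and one add per inner-product term), so both timings can be normalized to GFLOP/s. The sketch below plugs in the figures quoted here (106 ms for gpu.rocks, about 3.7 s for arrayfire); it assumes both benchmarks compute a plain f32 matrix product, which is not confirmed for gpu.rocks. `matmul_flops` and `gflops` are hypothetical helpers:

```rust
/// Approximate floating-point operations for an (m x k) * (k x n) product:
/// each of the m*n outputs needs about k multiplies and k adds.
fn matmul_flops(m: u64, k: u64, n: u64) -> u64 {
    2 * m * k * n
}

/// Throughput in GFLOP/s given an operation count and elapsed seconds.
fn gflops(flops: u64, seconds: f64) -> f64 {
    flops as f64 / seconds / 1e9
}

fn main() {
    // arrayfire run from this issue: [4096, 4096] x [4096, 14336], ~3.7 s
    let af = matmul_flops(4096, 4096, 14336);
    // gpu.rocks control: two 4096 x 4096 matrices, 106 ms
    let web = matmul_flops(4096, 4096, 4096);

    println!("work ratio: {:.1}x", af as f64 / web as f64);
    println!("arrayfire: {:.0} GFLOP/s", gflops(af, 3.7));
    println!("gpu.rocks: {:.0} GFLOP/s", gflops(web, 0.106));
}
```

With these numbers the arrayfire multiply does 3.5 times as much work, which explains only part of the gap: normalized, it runs at roughly 130 GFLOP/s against roughly 1,300 GFLOP/s for the gpu.rocks figure.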
$ AF_TRACE=all AF_SHOW_LOAD_PATH=1 cargo bench big_bench
yields:
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:144 ] Found: libafcpu.3.dylib
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:151 ] Device Count: 1.
Using libafcpu.3.dylib
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:144 ] Found: libafopencl.3.dylib
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/common/DependencyModule.cpp:99 ] Attempting to load: libforge.dylib
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/common/DependencyModule.cpp:104 ] Unable to open forge
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/device_manager.cpp:218 ] Found 1 OpenCL platforms
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/device_manager.cpp:230 ] Found 1 devices on platform Apple
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/device_manager.cpp:235 ] Found device Intel(R) Iris(TM) Plus Graphics 655 on platform Apple
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/device_manager.cpp:240 ] Found 1 OpenCL devices
[platform][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/device_manager.cpp:335 ] Default device: 0
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:151 ] Device Count: 1.
Using libafopencl.3.dylib
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(libafcuda.3.dylib, 0x0001): Library not loaded: /usr/local/cuda/lib/libcuda.dylib
Referenced from: /opt/arrayfire/lib/libafcuda.3.8.2.dylib
Reason: tried: '/usr/local/cuda/lib/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/deps/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/lib/libcuda.dylib' (no such file), '/usr/local/lib/libcuda.dylib' (no such file), '/usr/lib/libcuda.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: .
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(./libafcuda.3.dylib, 0x0001): Library not loaded: /usr/local/cuda/lib/libcuda.dylib
Referenced from: /opt/arrayfire/lib/libafcuda.3.8.2.dylib
Reason: tried: '/usr/local/cuda/lib/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/deps/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/lib/libcuda.dylib' (no such file), '/usr/local/lib/libcuda.dylib' (no such file), '/usr/lib/libcuda.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: ./src/backend/cuda
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(./src/backend/cuda/libafcuda.3.dylib, 0x0001): tried: '/opt/arrayfire/lib/./src/backend/cuda/libafcuda.3.dylib' (no such file), './src/backend/cuda/libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: ../src/backend/cuda
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(../src/backend/cuda/libafcuda.3.dylib, 0x0001): tried: '/opt/arrayfire/lib/../src/backend/cuda/libafcuda.3.dylib' (no such file), '../src/backend/cuda/libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: src/backend/cuda
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(src/backend/cuda/libafcuda.3.dylib, 0x0001): tried: '/opt/arrayfire/lib/src/backend/cuda/libafcuda.3.dylib' (no such file), 'src/backend/cuda/libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: lib
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(lib/libafcuda.3.dylib, 0x0001): tried: '/opt/arrayfire/lib/lib/libafcuda.3.dylib' (no such file), 'lib/libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: lib64
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(lib64/libafcuda.3.dylib, 0x0001): tried: '/opt/arrayfire/lib/lib64/libafcuda.3.dylib' (no such file), 'lib64/libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(libafcuda.3.dylib, 0x0001): Library not loaded: /usr/local/cuda/lib/libcuda.dylib
Referenced from: /opt/arrayfire/lib/libafcuda.3.8.2.dylib
Reason: tried: '/usr/local/cuda/lib/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/deps/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/lib/libcuda.dylib' (no such file), '/usr/local/lib/libcuda.dylib' (no such file), '/usr/lib/libcuda.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: /opt/arrayfire-3/lib/
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(/opt/arrayfire-3/lib//libafcuda.3.dylib, 0x0001): tried: '/opt/arrayfire-3/lib//libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: /opt/arrayfire/lib/
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(/opt/arrayfire/lib//libafcuda.3.dylib, 0x0001): Library not loaded: /usr/local/cuda/lib/libcuda.dylib
Referenced from: /opt/arrayfire/lib/libafcuda.3.8.2.dylib
Reason: tried: '/usr/local/cuda/lib/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/deps/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/lib/libcuda.dylib' (no such file), '/usr/local/lib/libcuda.dylib' (no such file), '/usr/lib/libcuda.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: /usr/local/lib/
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(/usr/local/lib//libafcuda.3.dylib, 0x0001): Library not loaded: /usr/local/cuda/lib/libcuda.dylib
Referenced from: /opt/arrayfire/lib/libafcuda.3.8.2.dylib
Reason: tried: '/usr/local/cuda/lib/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/deps/libcuda.dylib' (no such file), '/Users/wbrickner/Documents/Projects/cmaes/target/x86_64-apple-darwin/release/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/libcuda.dylib' (no such file), '/Users/wbrickner/lib/libcuda.dylib' (no such file), '/usr/local/lib/libcuda.dylib' (no such file), '/usr/lib/libcuda.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:141 ] Attempting: /usr/local/arrayfire/lib/
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:162 ] Failed to load dlopen(/usr/local/arrayfire/lib//libafcuda.3.dylib, 0x0001): tried: '/usr/local/arrayfire/lib//libafcuda.3.dylib' (no such file)
[unified][1655251166][14119300] [ /Users/umar/devel/arrayfire/src/api/unified/symbol_manager.cpp:206 ] AF_DEFAULT_BACKEND: opencl
ArrayFire v3.8.2 (OpenCL, 64-bit Mac OSX, build 5752f2dcc)
[0] APPLE: Intel(R) Iris(TM) Plus Graphics 655, 1536 MB
Active backend: OPENCL
Active device: 0
Benchmarking big_bench: Warming up for 3.0000 sHello there, it is me again!
[mem][1655251167][14119300] [ /Users/umar/devel/arrayfire/src/backend/common/DefaultMemoryManager.cpp:127 ] memory[0].max_bytes: 1.12 GB
[mem][1655251167][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/memory.cpp:200 ] nativeAlloc: 64 MB 0x6000021405a0
[mem][1655251167][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/memory.cpp:200 ] nativeAlloc: 224 MB 0x600002150090
[jit][1655251167][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/compile_module.cpp:254 ] {7487304373030023099 : loaded from /Users/wbrickner/.arrayfire/KER7487304373030023099_CL_16925952_INTEL(R)_IRIS(TM)_PLUS_GRAPHICS_655_AF_38.bin for Intel(R) Iris(TM) Plus Graphics 655 }
[kernel][1655251167][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/Kernel.hpp:33 ] Launching philoxGenerator
[mem][1655251167][14119300] [ /Users/umar/devel/arrayfire/src/backend/opencl/memory.cpp:200 ] nativeAlloc: 224 MB 0x60000215c000
All done! Took Ok(3.367842s)
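The three nativeAlloc sizes in this trace line up with the three f32 matrices in the multiply: a dense rows x cols f32 array takes rows * cols * 4 bytes. The sketch below checks that arithmetic. Which allocation belongs to which matrix is an assumption (the trace does not label them), and the trace's "MB" appears to mean 2^20 bytes, since 10^6-byte megabytes would give about 67 and 235 instead. `f32_matrix_bytes` and `mib` are hypothetical helpers:

```rust
/// Bytes needed for a dense rows x cols matrix of f32 (4 bytes per element).
fn f32_matrix_bytes(rows: u64, cols: u64) -> u64 {
    rows * cols * std::mem::size_of::<f32>() as u64
}

/// Converts bytes to mebibytes (2^20 bytes), truncating.
fn mib(bytes: u64) -> u64 {
    bytes / (1024 * 1024)
}

fn main() {
    let cov = f32_matrix_bytes(4096, 4096); // covariance transform (assumed)
    let rand = f32_matrix_bytes(4096, 14336); // random matrix (assumed)
    let product = f32_matrix_bytes(4096, 14336); // matmul result (assumed)
    println!(
        "{} MiB + {} MiB + {} MiB = {} MiB",
        mib(cov),
        mib(rand),
        mib(product),
        mib(cov + rand + product)
    );
}
```

Together the three buffers take 512 MiB, which is below the 1.12 GB max_bytes reported for memory[0].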
Interestingly, setting AF_OPENCL_CPU_OFFLOAD=0 makes GPU utilization spike as I wanted, but the benchmark actually runs slower (4.48 s instead of 3.43 s).
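As I understand ArrayFire's documentation for AF_OPENCL_CPU_OFFLOAD, on OpenCL devices that report memory unified with the host (as an Intel iGPU typically does), some functions, matmul among them, are offloaded to the CPU by default using mapped buffers, and setting the variable to 0 turns that off. That would be consistent with the 100% CPU usage described above and with the GPU spiking only when offload is disabled. To compare the two settings fairly, keep in mind that ArrayFire queues device work asynchronously, so a timer should stop only after a sync, and the first run (which may compile or load kernels, like the jit line in the trace) should be excluded. A minimal std-only harness, where `time_with_sync` is a hypothetical helper and the `sync` closure stands in for whatever call blocks until the device is idle:

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `work` once as a warm-up, then `iters` timed
/// times, calling `sync` after every run so queued asynchronous work has
/// finished before the clock is read. Returns the mean time per iteration.
fn time_with_sync<W, S>(iters: u32, mut work: W, mut sync: S) -> Duration
where
    W: FnMut(),
    S: FnMut(),
{
    assert!(iters > 0, "need at least one timed iteration");
    // Warm-up: the first call may pay one-off costs (kernel load, allocation).
    work();
    sync();
    let start = Instant::now();
    for _ in 0..iters {
        work();
        sync();
    }
    start.elapsed() / iters
}

fn main() {
    // Stand-in CPU workload; in a real benchmark this would be the matmul.
    let mut checksum = 0u64;
    let mean = time_with_sync(
        5,
        || checksum = (0..1_000_000u64).fold(checksum, |acc, x| acc.wrapping_add(x)),
        || {}, // nothing to wait for in this CPU-only stand-in
    );
    println!("mean per iteration: {:?} (checksum {})", mean, checksum);
}
```

With arrayfire-rust, `work` could run the matmul from the snippet above and `sync` could call arrayfire::sync(arrayfire::get_device()); timing the benchmark once with the default setting and once with AF_OPENCL_CPU_OFFLOAD=0 then compares like with like.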
What's going on here? Why does gpu.js, a web technology, end up beating the performance of arrayfire? Is this an apples-to-oranges comparison? Thank you.
@wbrickner I am sorry I couldn't get to this earlier. Has the problem been resolved, or did you find a workaround? Can you please share some details of your resolution? Thank you.
I think the matrix size was too small, so I couldn't see GPU usage rise. Regardless of the symptoms, my usage now clearly leverages the GPU. Thank you.