Cannot enable GPU implementation for `NearestNeighbors(algorithm="brute")`
ogrisel opened this issue · 3 comments
To Reproduce
I copied the snippet from the documentation for DBSCAN and it seems to work as expected:
>>> import numpy as np
>>> from sklearnex import patch_sklearn, config_context
>>> patch_sklearn()
>>> from sklearn.cluster import DBSCAN
>>> X = np.array([[1., 2.], [2., 2.], [2., 3.],
...               [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
>>> with config_context(target_offload="gpu:0"):
...     clustering = DBSCAN(eps=3, min_samples=2).fit(X)
...
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
INFO:sklearnex: sklearn.cluster.DBSCAN.fit: running accelerated version on GPU
but then, swapping DBSCAN for NearestNeighbors does not seem to work (it still uses the CPU despite the config_context):
>>> from sklearn.neighbors import NearestNeighbors
>>> with config_context(target_offload="gpu:0"):
...     NearestNeighbors(algorithm="brute", n_neighbors=2).fit(X).kneighbors(X)
...
INFO:sklearnex: sklearn.neighbors.NearestNeighbors.fit: running accelerated version on CPU
INFO:sklearnex: sklearn.neighbors.NearestNeighbors.kneighbors: running accelerated version on CPU
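(For reference, these INFO lines come through Python's standard logging machinery; a minimal sketch of how to make them visible, assuming the logger name matches the sklearnex prefix shown above:

import logging

# Attach a default handler and lower the threshold so sklearnex
# dispatch messages (the INFO lines above) are printed.
logging.basicConfig()
logging.getLogger("sklearnex").setLevel(logging.INFO)

)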
Expected behavior
I would have expected the message to show that the call is dispatched on GPU, because oneDAL seems to have a specialized GPU implementation for this algorithm.
Environment:
- OS: Linux idc-beta-batch-pvc-node-04 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux (Intel devcloud)
- Packages installed from the intel conda channel.
There is a GPU reachable by Python on this machine:
>>> import dpctl
>>> dpctl.get_devices()
[<dpctl.SyclDevice [backend_type.opencl, device_type.cpu, Intel(R) Xeon(R) Platinum 8480+] at 0x152b985cbe30>,
 <dpctl.SyclDevice [backend_type.opencl, device_type.accelerator, Intel(R) FPGA Emulation Device] at 0x152b985cb2b0>,
 <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Data Center GPU Max 1100] at 0x152b985c8cf0>]
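For completeness, a small sketch of picking that device explicitly with dpctl's filter-string syntax, assuming the Max 1100 is the only GPU on the node so it matches "gpu:0":

import dpctl

# "gpu:0" is a oneAPI filter selector: the first device of type gpu.
gpu = dpctl.SyclDevice("gpu:0")
print(gpu.name)     # Intel(R) Data Center GPU Max 1100
print(gpu.backend)  # backend_type.level_zero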
Thank you @ogrisel for the report! I have already reproduced it as well.
Will investigate and share an update ASAP.
@ogrisel thank you again for your report. The issue is actually on the verbose messenger side: the accelerated version does run on GPU. A fix for the messenger will be patched.
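One way to sanity-check this independently of the log messages is to hand the estimator an array that already lives on the GPU, relying on sklearnex's compute-follows-data convention. A sketch, assuming patch_sklearn() has already been applied and dpctl.tensor is available:

import dpctl.tensor as dpt
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

# Move the data to the GPU first; the patched estimator should then
# compute there, and the outputs should stay on the same device.
X_gpu = dpt.asarray(X, device="gpu:0")
nn = NearestNeighbors(algorithm="brute", n_neighbors=2).fit(X_gpu)
distances, indices = nn.kneighbors(X_gpu)
print(distances.device, indices.device)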
Thanks. I haven't timed it yet (on a medium-scale dataset). Do you know what kind of speedup one should typically observe, say on 1e6 data points at fit and 1e3 at predict, with 300 dimensions? Say a Max series GPU vs. a 64-physical-core CPU?
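In case it helps frame the comparison, a rough timing sketch at the sizes mentioned (1e6 x 300 float32 for fit, 1e3 queries), assuming target_offload also accepts a plain "cpu" filter string. This is just one way to measure, not an answer to the speedup question:

import time
import numpy as np
from sklearnex import patch_sklearn, config_context

patch_sklearn()
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_fit = rng.standard_normal((1_000_000, 300), dtype=np.float32)  # ~1.2 GB
X_query = rng.standard_normal((1_000, 300), dtype=np.float32)

for target in ("cpu", "gpu:0"):
    with config_context(target_offload=target):
        t0 = time.perf_counter()
        nn = NearestNeighbors(algorithm="brute", n_neighbors=2).fit(X_fit)
        t1 = time.perf_counter()
        nn.kneighbors(X_query)
        t2 = time.perf_counter()
    print(f"{target}: fit {t1 - t0:.2f}s, kneighbors {t2 - t1:.2f}s")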