Cannot enable GPU implementation for `NearestNeighbors(algorithm="brute")`
ogrisel opened this issue · 3 comments
To Reproduce
I copied the snippet from the documentation for DBSCAN and it seems to work as expected:
>>> import numpy as np
>>> from sklearnex import patch_sklearn, config_context
>>> patch_sklearn()
>>> from sklearn.cluster import DBSCAN
>>> X = np.array([[1., 2.], [2., 2.], [2., 3.],
...               [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
>>> with config_context(target_offload="gpu:0"):
...     clustering = DBSCAN(eps=3, min_samples=2).fit(X)
...
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
INFO:sklearnex: sklearn.cluster.DBSCAN.fit: running accelerated version on GPU
but then, swapping DBSCAN for NearestNeighbors does not seem to work (it still uses the CPU despite the config_context):
>>> from sklearn.neighbors import NearestNeighbors
>>> with config_context(target_offload="gpu:0"):
...     NearestNeighbors(algorithm="brute", n_neighbors=2).fit(X).kneighbors(X)
...
INFO:sklearnex: sklearn.neighbors.NearestNeighbors.fit: running accelerated version on CPU
INFO:sklearnex: sklearn.neighbors.NearestNeighbors.kneighbors: running accelerated version on CPU
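(For reference, these INFO lines come through Python's standard logging machinery; a minimal sketch of how to make them visible, assuming the logger name matches the sklearnex prefix shown above:

import logging

# Attach a default handler and lower the threshold so sklearnex
# dispatch messages (the INFO lines above) are printed.
logging.basicConfig()
logging.getLogger("sklearnex").setLevel(logging.INFO)

)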
Expected behavior
I would have expected the message to show that the call is dispatched on GPU, because oneDAL seems to have a specialized GPU implementation for this algorithm.
Environment:
- OS: Linux idc-beta-batch-pvc-node-04 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux (Intel devcloud)
- Packages installed from the intel conda channel.
There is a GPU reachable by Python on this machine:
>>> import dpctl
>>> dpctl.get_devices()
[<dpctl.SyclDevice [backend_type.opencl, device_type.cpu, Intel(R) Xeon(R) Platinum 8480+] at 0x152b985cbe30>,
 <dpctl.SyclDevice [backend_type.opencl, device_type.accelerator, Intel(R) FPGA Emulation Device] at 0x152b985cb2b0>,
 <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Data Center GPU Max 1100] at 0x152b985c8cf0>]
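For completeness, a small sketch of picking that device explicitly with dpctl's filter-string syntax, assuming the Max 1100 is the only GPU on the node so it matches "gpu:0":

import dpctl

# "gpu:0" is a oneAPI filter selector: the first device of type gpu.
gpu = dpctl.SyclDevice("gpu:0")
print(gpu.name)     # Intel(R) Data Center GPU Max 1100
print(gpu.backend)  # backend_type.level_zero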
Thank you @ogrisel for the report! I have already reproduced it as well.
Will investigate and share an update ASAP.
@ogrisel thank you again for your report. The issue is actually on the verbose messenger side: the accelerated version does run on GPU. A fix for the messenger will be patched.
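One way to sanity-check this independently of the log messages is to hand the estimator an array that already lives on the GPU, relying on sklearnex's compute-follows-data convention. A sketch, assuming patch_sklearn() has already been applied and dpctl.tensor is available:

import dpctl.tensor as dpt
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

# Move the data to the GPU first; the patched estimator should then
# compute there, and the outputs should stay on the same device.
X_gpu = dpt.asarray(X, device="gpu:0")
nn = NearestNeighbors(algorithm="brute", n_neighbors=2).fit(X_gpu)
distances, indices = nn.kneighbors(X_gpu)
print(distances.device, indices.device)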
Thanks. I haven't timed it yet (on a medium-scale dataset). Do you know what kind of speedup one should typically observe, say on 1e6 data points at fit and 1e3 at predict, with 300 dimensions? Say a Max series GPU vs. a 64-physical-core CPU?
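In case it helps frame the comparison, a rough timing sketch at the sizes mentioned (1e6 x 300 float32 for fit, 1e3 queries), assuming target_offload also accepts a plain "cpu" filter string. This is just one way to measure, not an answer to the speedup question:

import time
import numpy as np
from sklearnex import patch_sklearn, config_context

patch_sklearn()
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_fit = rng.standard_normal((1_000_000, 300), dtype=np.float32)  # ~1.2 GB
X_query = rng.standard_normal((1_000, 300), dtype=np.float32)

for target in ("cpu", "gpu:0"):
    with config_context(target_offload=target):
        t0 = time.perf_counter()
        nn = NearestNeighbors(algorithm="brute", n_neighbors=2).fit(X_fit)
        t1 = time.perf_counter()
        nn.kneighbors(X_query)
        t2 = time.perf_counter()
    print(f"{target}: fit {t1 - t0:.2f}s, kneighbors {t2 - t1:.2f}s")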