Device detected by clinfo but not by tensorflow
InonS opened this issue · 7 comments
I'm not sure if this is a "supported architectures" issue, or if there are more details I should give. What do you think?
$ ./run_tests.sh
+ cd examples
+ pushd 2_BasicModels
/.../TensorFlow-Examples/examples/2_BasicModels /.../TensorFlow-Examples/examples
+ python linear_regression.py
gpu_manager->VisibleDeviceCount is 0
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 972, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 950, in _run_fn
self._extend_graph()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 999, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:0"]()]]
$ clinfo
Number of platforms 1
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
Platform Extensions function suffix INTEL
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 (Build 25)
Driver Version 1.2.0.25
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Max compute units 2
Max clock frequency 2500MHz
Device Partition (core)
Max number of sub-devices 2
Supported partition types by counts, equally, by names (Intel)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple 128
Preferred / native vector sizes
char 1 / 16
short 1 / 8
int 1 / 4
long 1 / 2
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 4142542848 (3.858GiB)
Error Correction support No
Max memory allocation 1035635712 (987.7MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 262144
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 64727232 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 131072 (128KiB)
Max number of constant args 480
Max size of kernel argument 3840 (3.75KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [INTEL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
You need a device with Device Type set to GPU. Yours says CPU?
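A quick way to check this without reading the whole listing is to scan the clinfo text for its "Device Name" and "Device Type" lines. The following is a minimal sketch, assuming the output format shown in the original post; the helper names `device_types` and `has_gpu` are made up for illustration and are not part of clinfo or TensorFlow.

```python
import re


def device_types(clinfo_text):
    """Return (device name, device type) pairs found in clinfo output.

    Assumes each device prints a "Device Name" line before its
    "Device Type" line, as in the listing in the original post.
    """
    devices = []
    name = None
    for line in clinfo_text.splitlines():
        stripped = line.strip()
        match = re.match(r"Device Name\s+(.*)", stripped)
        if match:
            name = match.group(1).strip()
            continue
        match = re.match(r"Device Type\s+(.*)", stripped)
        if match:
            devices.append((name, match.group(1).strip()))
            name = None
    return devices


def has_gpu(clinfo_text):
    """True if at least one listed device reports a type containing GPU."""
    return any("GPU" in dtype for _, dtype in device_types(clinfo_text))


# Two lines copied from the clinfo output in the original post.
sample = """\
  Device Name    Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
  Device Type    CPU
"""

for dev_name, dev_type in device_types(sample):
    print(dev_name, "->", dev_type)
print("GPU present:", has_gpu(sample))
```

In practice the text could come from running `clinfo` and capturing its standard output. With the listing above, the only device reported is of type CPU, which matches the "available devices: .../cpu:0" part of the error.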
Oh. You're the guy that posted a LinkedIn message that was "like"d by thousands :)
Welcome :)
Ok, so, modern Intel CPUs often contain GPUs. The GPU is not the 'CPU' bit of the CPU itself, but an extra component inside the CPU package. It's a bit confusing :-P
So, it turns out that your CPU does in fact have a GPU inside it. It's an HD4600: https://ark.intel.com/products/78930/Intel-Core-i7-4710HQ-Processor-6M-Cache-up-to-3_50-GHz
Then the next question is: does your HD4600 GPU, inside your 4710HQ CPU, support OpenCL? The page above doesn't say. But it sounds modernish: I used to have an HD4000, and that supported OpenCL, so let's see...
Googling for 'wikipedia hd', we get https://en.wikipedia.org/wiki/Intel_HD_and_Iris_Graphics#Capabilities . This shows that Haswell CPUs have OpenCL 1.2 GPUs.
At this point, I conclude: you're missing the driver :) . You probably need to install a driver from the Intel website, e.g. something like ...hmmm... I can only find one for Windows: https://downloadcenter.intel.com/product/97501/Graphics-for-5th-Generation-Intel-Processors . Oh right: on Linux, you need Beignet :)
(Note that you're very late to the party though; I'm working on other things at the moment. Meanwhile, tf-coriander is missing a cuDNN replacement. I started working on one at https://github.com/hughperkins/coriander-dnn , but never quite got round to plugging it into tf-coriander. I don't think it's tons of work. If I did have a moment, it'd probably take me ~40-80 hours. But of course if someone else does it, they have to learn everything from scratch, so it could easily be 4-8 times that.
If you can find someone who could be interested in helping with that, I'm happy to assist them with knowledge acquisition, meet them in Hangouts etc.
The impact of not having cuDNN currently is that convolutions run on the CPU (not the GPU part of the CPU, the CPU bit).
)
Yeah, the response to that post surprised me too. I was so upset that TensorFlow doesn't support an open standard for GPGPU! :)
Yes, impressive to write something that went so viral :)
I guess I need to re-check my driver installation. I tried using Ubuntu-based Docker containers, which is where I got the stack trace and clinfo output in my original post.
Docker is not very OpenCL/GPU friendly. Docker does work ... with NVIDIA GPUs :-P . I'm not saying Docker can't be tweaked to work with AMD GPUs, but I've never heard of that being possible. For NVIDIA GPUs, you need some special additional drivers, e.g. https://github.com/NVIDIA/nvidia-docker , or at least to pass the device nodes through using the --device option to Docker, like https://hub.docker.com/r/hughperkins/cltorch-nvidia/
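The --device passthrough mentioned above can be sketched as a small helper that assembles a `docker run` command line. This is only an illustration under stated assumptions: the helper name `docker_run_command` is hypothetical, and the image name, the device node `/dev/dri`, and the `clinfo` command inside the container are placeholders rather than a tested setup. Whether an OpenCL driver inside the container can actually use the passed-through node is not something this thread confirms.

```python
import shlex


def docker_run_command(image, device_nodes, command):
    """Build an argv list for `docker run` with one --device flag per node.

    The device nodes are passed through unchanged; the caller decides which
    host nodes to expose (all values used below are placeholders).
    """
    argv = ["docker", "run", "--rm"]
    for node in device_nodes:
        argv += ["--device", node]
    argv.append(image)
    argv += list(command)
    return argv


# Placeholder values: an Ubuntu image, the DRM device directory that Linux
# graphics drivers expose, and clinfo as the command to run in the container.
argv = docker_run_command("ubuntu:16.04", ["/dev/dri"], ["clinfo"])

# Print a shell-safe version of the command instead of executing it.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list keeps each argument separate, so the same list could be handed to `subprocess.run` without shell quoting issues.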
Your easiest options for AMD GPUs will probably be to use the AMD GPU directly from your OS, so one of:
- install an OS that supports the AMD GPU drivers, and/or
- find AMD GPU drivers for your current main OS
If it was me, well .... so... I got into OpenCL because I had a laptop with an Intel CPU with an HD4000 inside. I thought it was so cool that the CPU had a GPU inside, and wanted to play, and of course it won't work with CUDA. So I wrote https://github.com/hughperkins/DeepCL from scratch, incrementally, over ~6 months, so that I could play with using the HD4000 GPU :)
Later on though, I found that the Intel GPU, whilst fun, is not something I'd ever train an ML model on: AWS works well for that, or at least, an NVIDIA GPU. There are no AMD cloud-enabled GPUs around that I can find.
Currently, I think that whilst it'd be good to have competition for NVIDIA GPUs, to keep them on their toes, I'm not sure that AMD will be that competition, at least not in a big way. I think that something like the Nervana TPUs might possibly be more realistic competition? https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu