More QPU seems slower

Question

Closed this issue 6 years ago · 7 comments

Hi, I tried your Tri and MultiTri examples, where one uses 1 QPU and the other uses 4 QPUs.
I measured the time taken to run the kernel and found that the one using 4 QPUs took longer. Is this expected?

Answer 1 · 2018-04-14T12:05:00.000Z

One more thing: when I run make with QPU set to a number larger than 1, it does not seem to reach the function enableQPUs. It should enable the QPUs and run the program on the GPU, right?

Answer 2 · 2018-04-16T03:15:29.000Z

@mn416 Hi, I may need your help understanding how I can get the most performance out of QPULib.

Answer 3 · 2018-04-16T09:17:45.000Z

Hi @darklord1310,

Thanks for your message. A few things come to mind:

Just adding more QPUs in the Tri and MultiTri examples will simply mean that the same program gets executed more times. So yes, you are increasing the number of QPUs but you are also increasing the amount of work by the same amount. See the README for examples of how to spread work over multiple QPUs.
There is almost no compute in the Tri and MultiTri examples, so memory access will probably be the bottleneck.
These examples are very basic, just computing 64 numbers, so you are only really measuring the overhead of downloading a kernel and waiting for it to complete.
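
To illustrate the first point, here is a minimal host-side sketch in plain C++ (not QPULib code) of how work could be split so that each QPU handles a different slice of the data instead of repeating the same work. The function name elementsForQPU and its parameters are hypothetical; the sketch assumes each QPU processes 16-element vectors and strides by 16 times the number of QPUs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model (not QPULib code): returns the element
// indices that QPU `me` would handle when `numQPUs` QPUs share `n` elements.
std::vector<int> elementsForQPU(int me, int numQPUs, int n) {
    assert(numQPUs > 0 && me >= 0 && me < numQPUs);
    const int lanes = 16;                // assumed vector width of one QPU
    const int stride = lanes * numQPUs;  // gap between one QPU's consecutive vectors
    std::vector<int> out;
    // Start at this QPU's first vector, then skip the vectors owned by the others.
    for (int base = me * lanes; base < n; base += stride) {
        for (int i = base; i < base + lanes && i < n; ++i) {
            out.push_back(i);
        }
    }
    return out;
}
```

Under this split, 4 QPUs sharing 64 elements each handle 16 of them, whereas running the unchanged single-QPU program on 4 QPUs makes every QPU handle all 64.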

Answer 4 · 2018-04-20T13:48:11.000Z

@mn416 I still don't quite understand. With 1 QPU we can process 16 data elements at once, right? With 12 QPUs we can process 192 at once. Shouldn't the time taken to process 192 elements with 12 QPUs be the same as the time taken to process 16 elements with 1 QPU, since the 12 QPUs work in parallel? It should not increase the time taken to run the kernel.

Answer 5 · 2018-04-20T14:19:04.000Z

Hi @darklord1310,

In an ideal world they would take the same time to compute but:

Although there are many QPUs there is only one memory, which they are all accessing.
For such a small kernel (computing 16 or 192 values), the compute time will be insignificant compared to the overhead of loading the kernel onto the QPUs and synchronising on termination. You didn't mention what kind of time difference you are seeing, so I don't know if this explains it or not.
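
As a rough illustration of the second point, the sketch below models run time as a fixed launch and synchronisation overhead plus compute time divided evenly across QPUs. The function names and every number passed to them are hypothetical placeholders, not measurements, and the model ignores the shared-memory contention from the first point.

```cpp
#include <cassert>

// Hypothetical cost model: fixed overhead plus compute time split across QPUs.
// Units are microseconds, but only the ratio between the two inputs matters.
double estimatedTimeUs(double overheadUs, double computeUs, int numQPUs) {
    assert(numQPUs > 0);
    return overheadUs + computeUs / numQPUs;
}

// Speedup of `numQPUs` QPUs over a single QPU under the model above.
double estimatedSpeedup(double overheadUs, double computeUs, int numQPUs) {
    return estimatedTimeUs(overheadUs, computeUs, 1) /
           estimatedTimeUs(overheadUs, computeUs, numQPUs);
}
```

With a made-up overhead of 100 and compute time of 1, estimatedSpeedup(100.0, 1.0, 12) comes out below 1.01, so twelve QPUs would be under 1% faster than one. Only when compute dominates the overhead does the speedup approach the number of QPUs.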

Hope this helps.

Answer 6 · 2018-04-20T14:45:01.000Z

@mn416 Thanks a lot for the prompt response 😄 I have one more question, though. I cloned your code and ran it on my Raspberry Pi 2 Model B. Whenever I compile the code with more than 1 QPU, it does not seem to run on the Raspberry Pi's GPU. If I run the program compiled with 1 QPU without sudo, I get this error message:

can't open /dev/mem
This program should be run as root. Try prefixing command with: sudo

But if I compile the code with more than 1 QPU, the program runs without sudo, which is not what I expected. I compile the code with the command "make QPU=<number of QPUs> Rot3D".

Answer 7 · 2018-04-20T14:51:05.000Z

Hi @darklord1310,

Ah, yes. Compiling with QPU=1 means "target the QPUs", not "use 1 QPU". Any other value of QPU means the code is compiled to run in emulation mode. To use N QPUs, you need to call k.setNumQPUs(N); before invoking the kernel k. See the README for an example of this.

Hope this helps.