
How to run Brain Simulator on an 'older' PC


I received the following error after installing Brain Simulator with the installer and running an example:
Observer update failed: ErrorOperatingSystem: This indicates that an OS call failed.

Therefore, I cloned the repo (and ran git submodule update --init) to obtain the sources. Below are my notes on getting Brain Simulator up and running on my PC, plus a bug that I found.

My PC: Win 10 64-bit Pro with VS Pro 2013 Update 5,
NVIDIA GeForce 610M, Compute Capability 2.1, Driver Model WDDM,
CUDA Toolkit 7.5

After I opened the main solution file BrainSimulator\Sources\BrainSimulator.sln, the BasicNodesCuda project failed to load in VS because of a CUDA Toolkit version mismatch (7.5 installed vs. the 7.0 the project was developed against).

Therefore,

  1. Edit BrainSimulator\Sources\Modules\BasicNodes\Cuda\BasicNodesCuda.vcxproj (right click on the unloaded project in VS) and update the CUDA Toolkit version referenced on the following two lines, as sketched after this list:
    (line 163)
    (line 299)
  2. Reload and build the project. It installs the libraries.
  3. Install ManagedCuda 7.5 - x64 via Manage NuGet Packages.
    Now, Brain Simulator should run.
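
For reference, a guess at what those two lines contain: in a stock CUDA-enabled .vcxproj the toolkit version appears only in the two build-customization imports, so the edit presumably looks like this (reconstructed from the standard CUDA project layout, not copied from the repo):

```xml
<!-- Assumed content of the two version-specific lines; change 7.0 to the
     installed toolkit version, 7.5. -->

<!-- around line 163, before: -->
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 7.0.props" />
<!-- after: -->
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 7.5.props" />

<!-- around line 299, before: -->
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 7.0.targets" />
<!-- after: -->
<Import Project="$(VCTargetsPath)\BuildCustomizations\CUDA 7.5.targets" />
```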

However, I needed to solve another issue: it seems that CUDA/OpenGL interoperability does not work in my setup. It is needed for data visualization; e.g., MatrixObserver uses a CUDA kernel to 'draw' a matrix into a buffer that is then rendered by OpenGL.
Interestingly, the interoperability works fine in the CUDA C/C++ samples, which use GLUT. Nevertheless, ManagedCuda fails on the nvcuda.dll call that tries to register a GL buffer in a CUDA context.
My fix is simple (but a little inefficient): a new CUDA buffer is created instead of registering the GL buffer, and its contents are copied into the GL buffer after the kernel finishes.
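
A minimal sketch of the workaround, assuming OpenTK on the GL side and staging through host memory (the class and member names are illustrative, not the actual Brain Simulator code; only the ManagedCuda and OpenTK calls are real APIs):

```csharp
using System;
using ManagedCuda;
using OpenTK.Graphics.OpenGL;

public class ObserverBufferWorkaround
{
    private readonly CudaDeviceVariable<uint> m_cudaBuffer; // kernel writes pixels here
    private readonly uint[] m_hostStaging;                  // host staging area
    private readonly int m_glBufferId;                      // existing GL pixel buffer

    public ObserverBufferWorkaround(int pixelCount, int glBufferId)
    {
        // Plain device allocation instead of CUDA/GL interop registration.
        m_cudaBuffer = new CudaDeviceVariable<uint>(pixelCount);
        m_hostStaging = new uint[pixelCount];
        m_glBufferId = glBufferId;
    }

    // Call after the observer kernel has finished writing into m_cudaBuffer.
    public void CopyToGlBuffer()
    {
        m_cudaBuffer.CopyToHost(m_hostStaging);
        GL.BindBuffer(BufferTarget.PixelUnpackBuffer, m_glBufferId);
        GL.BufferSubData(BufferTarget.PixelUnpackBuffer, IntPtr.Zero,
            (IntPtr)(m_hostStaging.Length * sizeof(uint)), m_hostStaging);
        GL.BindBuffer(BufferTarget.PixelUnpackBuffer, 0);
    }
}
```

The round trip through host memory is where the inefficiency comes from; a device-to-device copy is not possible here because it would require the very GL registration that fails.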

Even so, I was still not able to run some samples: launching a kernel (e.g. DrawMatrixKernel) failed with an Out Of Resources error. According to CUDA_Occupancy_calculator.xls, the CUDA compiler attempts to minimize register usage to maximize the number of thread blocks that can be active on the machine simultaneously; if a program tries to launch a kernel for which the registers used per thread times the thread block size is greater than N, the launch will fail.
N is 8192 32-bit registers per multiprocessor on GPUs with compute capability 1.0-1.1, 16384 on 1.2-1.3, 32768 on 2.0-2.1, and 65536 on 3.0.
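
Plugging in the numbers from my setup (compute capability 2.1, so N = 32768; DrawMatrixKernel compiled to 44 registers per thread) shows why the launch fails:

```csharp
int regsPerThread = 44;      // registers DrawMatrixKernel uses per thread
int registerFile = 32768;    // N for compute capability 2.0-2.1
int deviceMaxBlock = 1024;   // device-wide MaxThreadsPerBlock

// 44 * 1024 = 45056 > 32768, so a launch with the device maximum fails:
bool launchFails = regsPerThread * deviceMaxBlock > registerFile; // true

// Raw upper bound before allocation granularity: 32768 / 44 = 744 threads.
int rawLimit = registerFile / regsPerThread;
```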

This means that the maximum threads per block drops to 704 on my GPU, because DrawMatrixKernel uses 44 registers per thread. However, Brain Simulator tries to run every kernel with the constant MaxThreadsPerBlock of the current GPU. The bug is in MyKernelFactory.cs, right after a kernel is loaded (line 47).
The fix: MAX_THREADS = m_kernel.MaxThreadsPerBlock;
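
In context, the fix caps the block size by what the just-loaded kernel itself can support instead of the device-wide maximum; a sketch of the relevant part (LoadKernel and MaxThreadsPerBlock are real ManagedCuda APIs, the surrounding names are illustrative):

```csharp
using ManagedCuda;

public class MyKernel
{
    private readonly CudaKernel m_kernel;

    // Upper bound used when choosing block dimensions for launches.
    public int MAX_THREADS { get; private set; }

    public MyKernel(CudaContext context, string ptxFile, string kernelName)
    {
        m_kernel = context.LoadKernel(ptxFile, kernelName);

        // The device-wide limit (1024 here) ignores per-kernel register
        // pressure; CudaKernel.MaxThreadsPerBlock reflects
        // CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, which accounts for the
        // registers this particular kernel actually uses (704 for
        // DrawMatrixKernel on my GPU).
        MAX_THREADS = m_kernel.MaxThreadsPerBlock;
    }
}
```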

The same issue affects MyLSTMFeedForwardTask and the GetNetInput kernel in LSTMFeedForwardKernel.cu, which implicitly assume maxThreadsCount = 1024.
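
The same per-kernel cap can replace the hard-coded 1024 when sizing a launch; roughly (again with illustrative names):

```csharp
// Size the launch from the kernel's own limit instead of assuming 1024.
int maxThreads = m_kernel.MaxThreadsPerBlock;            // e.g. 704, not 1024
int blockSize = Math.Min(maxThreads, totalCount);
int gridSize = (totalCount + blockSize - 1) / blockSize; // ceiling division

m_kernel.BlockDimensions = new ManagedCuda.VectorTypes.dim3((uint)blockSize, 1, 1);
m_kernel.GridDimensions = new ManagedCuda.VectorTypes.dim3((uint)gridSize, 1, 1);
```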

I hope this helps someone.