GPU Support
kojix2 opened this issue · 10 comments
Hi @unagiootoro
I ran the XOR sample and found that Cumo was slower than Numo.
If you don't mind my asking, do you have a GPU + CUDA environment?
If you don't have a GPU, someone in the Ruby community (including me) would be happy to support you with a donation...
I'm not good at English, so it takes time for me to write long sentences.
May I speak in Japanese with you?
Okay.
My environment can build CUDA code.
But since the OS is Windows, I can't build Cumo.
So I want to be able to do training with ruby-dnn and a GPU on Windows.
(I still don't know how to do it.)
My guess is that the XOR example involves very little parallel computation, so Cumo ends up slower than Numo.
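The intuition above can be sketched with a toy cost model. All constants below are illustrative assumptions, not measurements: each GPU operation pays a roughly fixed kernel-launch/transfer overhead, which only pays off once there is enough per-element work to amortize it. XOR trains on just 4 samples per step, so the overhead dominates.

```ruby
# Toy cost model: why a tiny workload like XOR can be slower on the GPU.
# The constants are illustrative assumptions, not measured values.
LAUNCH_OVERHEAD_US = 10.0    # assumed fixed cost per GPU operation (us)
GPU_PER_ELEMENT_US = 0.0001  # assumed GPU cost per array element (us)
CPU_PER_ELEMENT_US = 0.001   # assumed CPU cost per array element (us)

def gpu_time(n)
  LAUNCH_OVERHEAD_US + n * GPU_PER_ELEMENT_US
end

def cpu_time(n)
  n * CPU_PER_ELEMENT_US
end

[4, 1_000_000].each do |n|
  winner = gpu_time(n) < cpu_time(n) ? "GPU" : "CPU"
  puts format("n=%-9d GPU %9.2fus  CPU %10.2fus  -> %s wins",
              n, gpu_time(n), cpu_time(n), winner)
end
```

Under these assumptions the CPU wins at n=4 and the GPU wins at n=1,000,000, which matches the behavior seen with the XOR sample versus larger workloads.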
OK. I see.
Maybe you're right about why XOR is slow.
One more question. How do you install Ruby on Windows?
- RubyInstaller + DevKit + MSYS2?
- Windows Subsystem for Linux?
- VirtualBox/VMware + Linux?
- Docker for Windows?
- None of the above
This may be important for building Cumo.
I'm using WSL, but WSL doesn't support GPU access, so I tried installing Cumo with RubyInstaller.
But the install was not successful.
I think nvcc on Windows doesn't support gcc as a host compiler, so it's hard to get Cumo running under RubyInstaller.
I compared the time using the mnist sample.
mnist_example_for_profiler.rb
require "dnn"
include DNN::Layers
include DNN::Activations
include DNN::Optimizers
include DNN::Losses
include DNN::Models
# SFloat is defined by the profiling harness (main.rb below) as Numo::SFloat or Cumo::SFloat
x_train = SFloat.cast Marshal.load(File.binread("x_train.dat"))
x_test = SFloat.cast Marshal.load(File.binread("x_test.dat"))
y_train = SFloat.cast Marshal.load(File.binread("y_train.dat"))
y_test = SFloat.cast Marshal.load(File.binread("y_test.dat"))
model = Sequential.new
model << InputLayer.new(784)
model << Dense.new(256)
model << ReLU.new
model << Dense.new(256)
model << ReLU.new
model << Dense.new(10)
model.setup(RMSProp.new, SoftmaxCrossEntropy.new)
model.train(x_train, y_train, 20, batch_size: 100, test: [x_test, y_test], verbose: true)
Time
Numo
# real 1m44.682s
# user 7m4.630s
# sys 5m51.241s
Cumo
# real 1m35.018s
# user 1m17.208s
# sys 0m22.956s
stackprof
main.rb
require 'stackprof'
require 'optparse'
opt = ARGV.getopts("g", "gpu", "out:")
if opt['g'] or opt['gpu']
puts "Use Cumo"
require 'cumo'
# https://github.com/sonots/cumo/issues/143
SFloat = Cumo::SFloat
class SFloat
alias mean_original mean
def mean(*args)
if size == 1
self[0]
else
mean_original(*args)
end
end
end
else
puts "Use Numo"
require "numo/linalg"
SFloat = Numo::SFloat
end
StackProf.run(mode: :cpu, out: opt["out"], raw: true) do
load "./mnist_example_for_profiler.rb"
end
ruby main.rb --out profile/numo-mnist.dump
ruby main.rb -g --out profile/cumo-mnist.dump
stackprof profile/numo-mnist.dump
stackprof profile/cumo-mnist.dump
Mode: cpu(1000)
Samples: 25320 (11.51% miss rate)
GC: 1298 (5.13%)
TOTAL (pct) SAMPLES (pct) FRAME
9808 (38.7%) 9808 (38.7%) DNN::Optimizers::RMSProp#update_params
7195 (28.4%) 7075 (27.9%) #<Module:0x0000556476340dc0>.call
2094 (8.3%) 2094 (8.3%) DNN::Activations::ReLU#backward
1298 (5.1%) 1298 (5.1%) (garbage collection)
1010 (4.0%) 1010 (4.0%) DNN::Activations::ReLU#forward
8062 (31.8%) 802 (3.2%) #<Module:0x0000556476340e88>.dot
5717 (22.6%) 768 (3.0%) DNN::Layers::Dense#backward
10461 (41.3%) 646 (2.6%) DNN::Optimizers::Optimizer#update
411 (1.6%) 411 (1.6%) DNN::Models::Model#evaluate
24004 (94.8%) 355 (1.4%) DNN::Models::Model#train
183 (0.7%) 183 (0.7%) DNN::Losses::SoftmaxCrossEntropy.softmax
350 (1.4%) 167 (0.7%) DNN::Losses::SoftmaxCrossEntropy#forward_loss
120 (0.5%) 120 (0.5%) #<Module:0x0000556476340e88>.blas_char
3307 (13.1%) 101 (0.4%) DNN::Layers::Dense#forward
8155 (32.2%) 93 (0.4%) Numo::NArray#dot
65 (0.3%) 65 (0.3%) Numo::NArray.asarray
59 (0.2%) 56 (0.2%) DNN::Models::Model#layers
52 (0.2%) 33 (0.1%) DNN::Iterator#next_batch
29 (0.1%) 29 (0.1%) DNN::Layers::Layer#built?
26 (0.1%) 26 (0.1%) DNN::Link#initialize
381 (1.5%) 25 (0.1%) DNN::Losses::Loss#forward
23 (0.1%) 23 (0.1%) DNN::Losses::SoftmaxCrossEntropy#backward_loss
22 (0.1%) 22 (0.1%) DNN::Iterator#reset_indexs
22604 (89.3%) 18 (0.1%) DNN::Models::Model#train_on_batch
15 (0.1%) 15 (0.1%) DNN::Layers::InputLayer#forward
7827 (30.9%) 13 (0.1%) DNN::Models::Model#backward
10 (0.0%) 10 (0.0%) DNN::Layers::Connection#regularizers
24022 (94.9%) 9 (0.0%) <top (required)>
34 (0.1%) 7 (0.0%) DNN::Losses::Loss#backward
7 (0.0%) 7 (0.0%) DNN::Layers::Connection#get_params
Mode: cpu(1000)
Samples: 22564 (9.41% miss rate)
GC: 164 (0.73%)
TOTAL (pct) SAMPLES (pct) FRAME
8007 (35.5%) 8007 (35.5%) DNN::Models::Model#evaluate
7692 (34.1%) 7673 (34.0%) Cumo::NArray#dot
3259 (14.4%) 3259 (14.4%) DNN::Activations::ReLU#backward
1163 (5.2%) 1163 (5.2%) DNN::Optimizers::RMSProp#update_params
1017 (4.5%) 1017 (4.5%) Cumo::NArray#to_f
352 (1.6%) 340 (1.5%) DNN::Iterator#next_batch
280 (1.2%) 177 (0.8%) DNN::Losses::SoftmaxCrossEntropy#forward_loss
164 (0.7%) 164 (0.7%) (garbage collection)
1281 (5.7%) 115 (0.5%) DNN::Optimizers::Optimizer#update
4232 (18.8%) 105 (0.5%) DNN::Layers::Dense#backward
103 (0.5%) 103 (0.5%) DNN::Losses::SoftmaxCrossEntropy.softmax
22385 (99.2%) 85 (0.4%) DNN::Models::Model#train
3631 (16.1%) 66 (0.3%) DNN::Layers::Dense#forward
62 (0.3%) 62 (0.3%) DNN::Activations::ReLU#forward
32 (0.1%) 32 (0.1%) DNN::Losses::SoftmaxCrossEntropy#backward_loss
23 (0.1%) 22 (0.1%) DNN::Models::Model#layers
19 (0.1%) 19 (0.1%) Cumo::NArray.asarray
301 (1.3%) 17 (0.1%) DNN::Losses::Loss#forward
16 (0.1%) 16 (0.1%) DNN::Layers::Layer#built?
16 (0.1%) 16 (0.1%) DNN::Link#initialize
15 (0.1%) 15 (0.1%) DNN::Iterator#reset_indexs
13 (0.1%) 13 (0.1%) Cumo::SFloat#mean
22400 (99.3%) 11 (0.0%) <top (required)>
11 (0.0%) 11 (0.0%) DNN::Layers::InputLayer#forward
7502 (33.2%) 10 (0.0%) DNN::Models::Model#backward
12288 (54.5%) 9 (0.0%) DNN::Models::Model#train_on_batch
9 (0.0%) 9 (0.0%) DNN::Layers::Connection#regularizers
40 (0.2%) 3 (0.0%) DNN::Losses::Loss#backward
23 (0.1%) 3 (0.0%) DNN::Layers::InputLayer#call
8667 (38.4%) 3 (0.0%) DNN::Models::Model#accurate
Maybe Cumo will be a little faster with some tuning.
Thank you for benchmarking Numo and Cumo.
When using Cumo, it is necessary to reduce data transfer between the CPU and GPU, so I modified the 'evaluate' method as follows. (Only the multi-class classification branch is changed.)
private def evaluate(y, t)
if y.shape[1..-1] == [1]
correct = 0
y.shape[0].times do |i|
if @loss_func.is_a?(Losses::SigmoidCrossEntropy)
correct += 1 if (y[i, 0] < 0 && t[i, 0] < 0.5) || (y[i, 0] >= 0 && t[i, 0] >= 0.5)
else
correct += 1 if (y[i, 0] < 0 && t[i, 0] < 0) || (y[i, 0] >= 0 && t[i, 0] >= 0)
end
end
else
correct = y.max_index(axis: 1).eq(t.max_index(axis: 1)).count
end
correct
end
I think this may make Cumo faster.
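The key change in the multi-class branch above is doing one vectorized comparison (`y.max_index(axis: 1).eq(t.max_index(axis: 1)).count`) instead of reading elements back from the GPU one at a time. The same idea can be restated with plain Ruby arrays; `accuracy_count` is a hypothetical helper for illustration, not part of ruby-dnn's API:

```ruby
# Accuracy counting in the style of the vectorized multi-class branch:
# a row is correct when the predicted class (argmax of the prediction row)
# matches the true class (argmax of the one-hot label row).
# Hypothetical helper, for illustration only.
def accuracy_count(y, t)
  y.zip(t).count do |y_row, t_row|
    y_row.index(y_row.max) == t_row.index(t_row.max)
  end
end

y = [[0.1, 0.8, 0.1],   # predicted class 1
     [0.7, 0.2, 0.1]]   # predicted class 0
t = [[0.0, 1.0, 0.0],   # true class 1  -> correct
     [0.0, 0.0, 1.0]]   # true class 2  -> wrong
puts accuracy_count(y, t) # => 1
```

With Cumo the whole comparison stays on the GPU and only the final count comes back to the host, which is why this version avoids the per-element transfer cost.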
ruby-dnn & Cumo got faster with version 0.13.0!
Same benchmark code as above.
Numo
real 1m47.222s
user 7m29.562s <- Numo::Linalg!! blazing performance
sys 6m4.522s
TOTAL (pct) SAMPLES (pct) FRAME
12618 (51.0%) 12618 (51.0%) DNN::Optimizers::RMSProp#update_params
4510 (18.2%) 4414 (17.8%) #<Module:0x000056193c6c03a8>.call
1997 (8.1%) 1997 (8.1%) DNN::Activations::ReLU#backward
1279 (5.2%) 1279 (5.2%) (garbage collection)
895 (3.6%) 895 (3.6%) DNN::Activations::ReLU#forward
5410 (21.9%) 845 (3.4%) #<Module:0x000056193c6c0718>.dot
4490 (18.1%) 710 (2.9%) DNN::Layers::Dense#backward
13250 (53.6%) 625 (2.5%) DNN::Optimizers::Optimizer#update
23446 (94.8%) 316 (1.3%) DNN::Models::Model#train
260 (1.1%) 260 (1.1%) DNN::Losses::SoftmaxCrossEntropy.softmax
467 (1.9%) 207 (0.8%) DNN::Losses::SoftmaxCrossEntropy#forward
1800 (7.3%) 114 (0.5%) DNN::Layers::Dense#forward
96 (0.4%) 96 (0.4%) #<Module:0x000056193c6c0718>.blas_char
5466 (22.1%) 56 (0.2%) Numo::NArray#dot
55 (0.2%) 55 (0.2%) Numo::NArray.asarray
24 (0.1%) 24 (0.1%) DNN::Losses::SoftmaxCrossEntropy#backward
25 (0.1%) 22 (0.1%) DNN::Iterator#next_batch
21 (0.1%) 21 (0.1%) DNN::Iterator#reset
21 (0.1%) 21 (0.1%) DNN::Layers::Layer#built?
21 (0.1%) 21 (0.1%) DNN::Link#initialize
18 (0.1%) 17 (0.1%) DNN::Models::Model#layers
19 (0.1%) 17 (0.1%) DNN::Losses::Loss#regularizers_backward
15 (0.1%) 15 (0.1%) DNN::Layers::InputLayer#forward
22880 (92.5%) 13 (0.1%) DNN::Models::Model#train_on_batch
15 (0.1%) 10 (0.0%) DNN::Losses::Loss#regularizers_forward
23464 (94.8%) 9 (0.0%) <top (required)>
8 (0.0%) 8 (0.0%) #<Module:0x000056193c890480>.learning_phase=
7 (0.0%) 7 (0.0%) DNN::Models::Model#evaluate
7 (0.0%) 7 (0.0%) DNN::Layers::Connection#get_params
7 (0.0%) 7 (0.0%) DNN::Layers::Connection#regularizers
Cumo
real 1m6.295s <- was 1m35.018s
user 0m58.364s
sys 0m12.208s
TOTAL (pct) SAMPLES (pct) FRAME
5954 (50.7%) 5940 (50.6%) Cumo::NArray#dot
3250 (27.7%) 3250 (27.7%) DNN::Activations::ReLU#backward
898 (7.6%) 898 (7.6%) DNN::Optimizers::RMSProp#update_params
526 (4.5%) 526 (4.5%) Cumo::NArray#to_f
200 (1.7%) 200 (1.7%) DNN::Iterator#next_batch
264 (2.2%) 163 (1.4%) DNN::Losses::SoftmaxCrossEntropy#forward
3858 (32.9%) 131 (1.1%) DNN::Layers::Dense#backward
101 (0.9%) 101 (0.9%) DNN::Losses::SoftmaxCrossEntropy.softmax
2321 (19.8%) 94 (0.8%) DNN::Layers::Dense#forward
71 (0.6%) 71 (0.6%) DNN::Activations::ReLU#forward
69 (0.6%) 69 (0.6%) (garbage collection)
966 (8.2%) 67 (0.6%) DNN::Optimizers::Optimizer#update
11660 (99.3%) 41 (0.3%) DNN::Models::Model#train
309 (2.6%) 32 (0.3%) DNN::Losses::Loss#loss
29 (0.2%) 29 (0.2%) DNN::Losses::SoftmaxCrossEntropy#backward
21 (0.2%) 21 (0.2%) DNN::Models::Model#evaluate
14 (0.1%) 14 (0.1%) Cumo::NArray.asarray
11 (0.1%) 11 (0.1%) DNN::Iterator#reset
11 (0.1%) 11 (0.1%) DNN::Link#initialize
11674 (99.4%) 10 (0.1%) <top (required)>
13 (0.1%) 9 (0.1%) DNN::Losses::Loss#regularizers_forward
10 (0.1%) 9 (0.1%) DNN::Models::Model#layers
9 (0.1%) 9 (0.1%) DNN::Layers::Layer#built?
7 (0.1%) 7 (0.1%) Cumo::SFloat#mean
2413 (20.5%) 4 (0.0%) DNN::Layers::Layer#call
10832 (92.2%) 4 (0.0%) DNN::Models::Model#train_on_batch
4 (0.0%) 4 (0.0%) DNN::Layers::InputLayer#forward
4 (0.0%) 4 (0.0%) DNN::Layers::Connection#regularizers
2 (0.0%) 2 (0.0%) DNN::Losses::Loss#regularizers_backward
3 (0.0%) 1 (0.0%) <top (required)>
With Google Colab, you can benchmark ruby-dnn in your browser.
Please try it yourself!
https://colab.research.google.com/drive/1RJ8HTNI6akqBYZgZWzFve9c6GTz_Tava
- Doing matrix computation from Ruby with a GPU on Google Colab [Japanese]