imai-laboratory/nec

resource exhausted error


ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,24487,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: deepq/DND/lookup/Tile = Tile[T=DT_FLOAT, Tmultiples=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](deepq/DND/lookup/Tile/input, deepq/DND/lookup/Tile/multiples)]]

2018-02-21 01:03:53.047199: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.49GiB. Current allocation summary follows.
2018-02-21 01:03:53.047267: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (256): Total Chunks: 64, Chunks in use: 63. 16.0KiB allocated for chunks. 15.8KiB in use in bin. 3.5KiB client-requested in use in bin.
2018-02-21 01:03:53.047305: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (512): Total Chunks: 1, Chunks in use: 1. 512B allocated for chunks. 512B in use in bin. 384B client-requested in use in bin.
2018-02-21 01:03:53.047327: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2018-02-21 01:03:53.047347: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (2048): Total Chunks: 5, Chunks in use: 5. 10.0KiB allocated for chunks. 10.0KiB in use in bin. 10.0KiB client-requested in use in bin.
2018-02-21 01:03:53.047366: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047385: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047403: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (16384): Total Chunks: 1, Chunks in use: 0. 22.2KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047425: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (32768): Total Chunks: 5, Chunks in use: 5. 160.0KiB allocated for chunks. 160.0KiB in use in bin. 160.0KiB client-requested in use in bin.
2018-02-21 01:03:53.047447: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (65536): Total Chunks: 3, Chunks in use: 2. 288.0KiB allocated for chunks. 192.0KiB in use in bin. 191.7KiB client-requested in use in bin.
2018-02-21 01:03:53.047471: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (131072): Total Chunks: 11, Chunks in use: 11. 1.47MiB allocated for chunks. 1.47MiB in use in bin. 1.42MiB client-requested in use in bin.
2018-02-21 01:03:53.047492: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (262144): Total Chunks: 7, Chunks in use: 7. 2.67MiB allocated for chunks. 2.67MiB in use in bin. 2.67MiB client-requested in use in bin.
2018-02-21 01:03:53.047511: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (524288): Total Chunks: 2, Chunks in use: 2. 1.89MiB allocated for chunks. 1.89MiB in use in bin. 1.89MiB client-requested in use in bin.
2018-02-21 01:03:53.047535: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (1048576): Total Chunks: 1, Chunks in use: 1. 1.72MiB allocated for chunks. 1.72MiB in use in bin. 1.72MiB client-requested in use in bin.
2018-02-21 01:03:53.047556: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (2097152): Total Chunks: 2, Chunks in use: 1. 5.89MiB allocated for chunks. 3.45MiB in use in bin. 3.45MiB client-requested in use in bin.
2018-02-21 01:03:53.047576: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (4194304): Total Chunks: 1, Chunks in use: 1. 5.17MiB allocated for chunks. 5.17MiB in use in bin. 3.12MiB client-requested in use in bin.
2018-02-21 01:03:53.047601: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (8388608): Total Chunks: 5, Chunks in use: 5. 75.62MiB allocated for chunks. 75.62MiB in use in bin. 75.62MiB client-requested in use in bin.
2018-02-21 01:03:53.047620: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047638: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047657: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047679: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (134217728): Total Chunks: 4, Chunks in use: 4. 781.25MiB allocated for chunks. 781.25MiB in use in bin. 781.25MiB client-requested in use in bin.
2018-02-21 01:03:53.047700: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (268435456): Total Chunks: 6, Chunks in use: 5. 8.98GiB allocated for chunks. 7.48GiB in use in bin. 7.48GiB client-requested in use in bin.
2018-02-21 01:03:53.047721: I tensorflow/core/common_runtime/bfc_allocator.cc:644] Bin for 1.49GiB was 256.00MiB, Chunk State:
2018-02-21 01:03:53.047746: I tensorflow/core/common_runtime/bfc_allocator.cc:650] Size: 1.49GiB | Requested Size: 6.2KiB | in_use: 0, prev: Size: 1.51GiB | Requested Size: 1.51GiB | in_use: 1

Try running with config.gpu_options.allow_growth = True.
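For reference, that setting goes into the TF 1.x session config, e.g.:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)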

nec/dnd.py

Line 44 in 4f2e3d9

tiled_keys = tf.tile([keys], [tf.shape(h)[0], 1, 1])

The h that comes in here has shape (keysize, batchsize), right? And the batch gets expanded along the last axis.
If so, tiling by h.shape[0] seems wrong to me.
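For what it's worth, plugging the shapes from the OOM message above into that line (a sketch, assuming keys is [capacity, key_size]) reproduces the failing tensor, which would mean h arrives batch-major as [batch, key_size]:

import tensorflow as tf

keys = tf.zeros([24487, 512])  # current DND contents: [capacity, key_size]
h = tf.zeros([32, 512])        # encoded states: [batch, key_size]

# tf.tile([keys], ...) adds a leading axis, then repeats it batch times:
tiled_keys = tf.tile([keys], [tf.shape(h)[0], 1, 1])
# -> [32, 24487, 512] at runtime: the exact shape in the OOM error above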

Considering the memory size:
keysize * batchsize * float32 * dndsize * capacity
512 * 32 * 4 * 4 * 5 * 10 ** 5 ≈ 131 GB
so in principle the capacity exponent is what bites. Is parallelizing it well the quickest fix?
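Double-checking that product with plain arithmetic (factor names follow the line above):

key_size, batch, bytes_f32, dnd_size, capacity = 512, 32, 4, 4, 5 * 10**5
total_bytes = key_size * batch * bytes_f32 * dnd_size * capacity
print(total_bytes / 1e9)  # ~131 GB if every DND's keys are fully tiled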

while_loop is called a loop, but it should parallelize iterations automatically.

I see. So what should while_loop iterate over?
The key index, or the batch?
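One option, iterating over the batch: a minimal sketch, assuming keys is [capacity, key_size] and h is [batch, key_size] (not the repo's actual lookup code). The peak intermediate is then [capacity, key_size] per iteration instead of the tiled [batch, capacity, key_size]:

import tensorflow as tf

def lookup_distances(keys, h):
    # keys: [capacity, key_size], h: [batch, key_size]
    batch = tf.shape(h)[0]
    out = tf.TensorArray(tf.float32, size=batch)

    def body(i, out):
        diff = keys - h[i]                           # broadcasts over capacity
        dist = tf.reduce_sum(tf.square(diff), axis=1)
        return i + 1, out.write(i, dist)

    _, out = tf.while_loop(lambda i, _: i < batch, body, [0, out],
                           parallel_iterations=10)   # iterations can overlap
    return out.stack()                               # [batch, capacity]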

Also, broadcasting apparently saves memory compared to explicitly tiling, so why did we write it the latter way?
tensorflow/tensorflow#1934
I'll give this a quick try as well.
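For reference, the broadcast version of the distance computation might look like the sketch below (same shape assumptions as above). Note that the subtraction still materializes a [batch, capacity, key_size] intermediate, so broadcasting mainly saves the extra tiled copy of keys:

import tensorflow as tf

def distances_broadcast(keys, h):
    # keys: [capacity, key_size] -> [1, capacity, key_size]
    # h:    [batch, key_size]    -> [batch, 1, key_size]
    diff = tf.expand_dims(keys, 0) - tf.expand_dims(h, 1)
    return tf.reduce_sum(tf.square(diff), axis=2)  # [batch, capacity]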

Got "Dst tensor is not initialized."
It seems this is raised when GPU memory is full.
aymericdamien/TensorFlow-Examples#38

Can this be broadcast?

Perhaps. I'll try it tonight.
I wonder if assigning the DNDs to separate GPUs with tf.device would solve it.

I hope transferring data between GPUs doesn't take too long.

Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

References

https://stackoverflow.com/questions/35892412/tensorflow-dense-gradient-explanation
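For context, that warning typically shows up when a gradient that comes back as IndexedSlices (e.g. from tf.gather on a large table) gets converted to a dense tensor somewhere downstream. A minimal reproduction sketch:

import tensorflow as tf

params = tf.Variable(tf.random_normal([500000, 512]))  # DND-sized table
loss = tf.reduce_sum(tf.gather(params, [0, 1, 2]))     # read a few rows

grad, = tf.gradients(loss, [params])  # arrives as sparse IndexedSlices
dense = tf.convert_to_tensor(grad)    # densifies to [500000, 512] and
                                      # emits the warning quoted above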

Now running with a half-tiled, half-broadcast version.

  • check if it works

I'm unable to run the model on any environment but CartPole due to ResourceExhausted errors. Any tips?

@jlindsey15 Thank you for the comment! If you have multiple GPUs, splitting the DNDs across devices will solve the issue.
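A sketch of that split; build_dnd here is a hypothetical stand-in for the repo's DND constructor, and the device strings assume two visible GPUs:

import tensorflow as tf

def build_dnd(name, capacity=500000, key_size=512):
    # hypothetical stand-in for the repo's DND construction
    with tf.variable_scope(name):
        keys = tf.get_variable('keys', [capacity, key_size], trainable=False)
        values = tf.get_variable('values', [capacity], trainable=False)
    return keys, values

num_actions = 4
dnds = []
for a in range(num_actions):
    with tf.device('/gpu:%d' % (a % 2)):  # round-robin DNDs over the GPUs
        dnds.append(build_dnd('dnd_%d' % a))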