proteneer/khan

MultiGPU: Check failed: IsAligned()


When I train with 4 GPUs (Volta or Pascal), training fails after 1 to 4 epochs with the error message below.

I am running:
python -u $KHAN_DIR/gdb8.py --train-dir $ANI_DATA_DIR --save-dir . --gpus $nGPU --ani-lib $KHAN_DIR/gpu_featurizer/ani.so

The only code change I made was to enable all training files (data_loaders.py, line 41 onward).

I am using:
CUDA/9.0.176
cuDNN/7.1.4
TensorFlow/1.8.0

Error:

2018-10-05 15:49:36.716642: F /gstore/apps/TensorFlow/1.8.0-foss-2017a-CUDA-9.0.176-Python-3.6.3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/framework/tensor.h:694] Check failed: IsAligned()
2018-10-05 15:49:36.716488: E tensorflow/core/common_runtime/bfc_allocator.cc:381] tried to deallocate nullptr
2018-10-05 15:49:36.716482: E tensorflow/core/common_runtime/bfc_allocator.cc:381] tried to deallocate nullptr