syang1993/gst-tacotron

Check failed: dnnReLUCreateBackward_F32

miyoungvkim opened this issue · 1 comment

Hello :D

I'm trying to use gst-tacotron with the Blizzard Challenge 2013 dataset.

When I start training, I hit a check-failed error. (The same thing happens with the gst true option.)

So I'd like to ask about the logs below.

I'm just trying to train, so I haven't changed the base code.
Could I get some ideas on how to solve this problem?

Here is my log:


1. When I use the gst_false option:
gst-tacotron_gst_false# python train.py
/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:493: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:494: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:495: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:496: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:497: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/root/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:502: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Checkpoint path: /data/workspace/blizzardchllenge/gst-tacotron_gst_false/logs-tacotron/model.ckpt
Loading training data from: /data/workspace/blizzardchllenge/gst-tacotron_gst_false/training/train.txt
Using model: tacotron
Hyperparameters:
adam_beta1: 0.9
adam_beta2: 0.999
attention_depth: 256
batch_size: 32
cleaners: english_cleaners
decay_learning_rate: True
embed_depth: 256
encoder_depth: 256
frame_length_ms: 50
frame_shift_ms: 12.5
griffin_lim_iters: 60
initial_learning_rate: 0.002
max_iters: 1000
min_level_db: -100
num_freq: 1025
num_gst: 10
num_heads: 4
num_mels: 80
outputs_per_step: 2
power: 1.5
preemphasis: 0.97
prenet_depths: [256, 128]
ref_level_db: 20
reference_depth: 128
reference_filters: [32, 32, 64, 64, 128, 128]
rnn_depth: 256
sample_rate: 16000
style_att_dim: 128
style_att_type: mlp_attention
style_embed_depth: 256
use_cmudict: False
use_gst: False
Loaded metadata for 9725 examples (20.13 hours)
Initialized Tacotron model. Dimensions:
text embedding: 256
style embedding: 128
prenet out: 128
encoder out: 384
attention out: 256
concat attn & out: 640
decoder cell out: 256
decoder out (2 frames): 160
decoder out (1 frame): 80
postnet out: 256
linear out: 1025
Starting new training run at commit: None
Generated 32 batches of size 32 in 90.126 sec
Step 1 [139.284 sec/step, loss=0.87672, avg_loss=0.87672]
Step 2 [130.008 sec/step, loss=0.97632, avg_loss=0.92652]
Step 3 [141.618 sec/step, loss=0.98165, avg_loss=0.94490]
Step 4 [194.484 sec/step, loss=0.99856, avg_loss=0.95831]
Step 5 [177.694 sec/step, loss=0.95613, avg_loss=0.95788]
2019-12-03 09:52:09.674825: F tensorflow/core/kernels/mkl_relu_op.cc:328] Check failed: dnnReLUCreateBackward_F32(&mkl_context.prim_relu_bwd, __null, mkl_context.lt_grad, mkl_context.lt_grad, negative_slope) == E_SUCCESS (-1 vs. 0)
Aborted (core dumped)


2. When I use the gst_true option:

gst-tacotron_gst_true/training/train.txt
Using model: tacotron
Hyperparameters:
adam_beta1: 0.9
adam_beta2: 0.999
attention_depth: 256
batch_size: 32
cleaners: english_cleaners
decay_learning_rate: True
embed_depth: 256
encoder_depth: 256
frame_length_ms: 50
frame_shift_ms: 12.5
griffin_lim_iters: 60
initial_learning_rate: 0.002
max_iters: 1000
min_level_db: -100
num_freq: 1025
num_gst: 10
num_heads: 4
num_mels: 80
outputs_per_step: 2
power: 1.5
preemphasis: 0.97
prenet_depths: [256, 128]
ref_level_db: 20
reference_depth: 128
reference_filters: [32, 32, 64, 64, 128, 128]
rnn_depth: 256
sample_rate: 16000
style_att_dim: 128
style_att_type: mlp_attention
style_embed_depth: 256
use_cmudict: False
use_gst: True
Loaded metadata for 9725 examples (20.13 hours)
WARNING:tensorflow:From /data/workspace/blizzardchllenge/gst-tacotron_gst_true/models/multihead_attention.py:114: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Initialized Tacotron model. Dimensions:
text embedding: 256
style embedding: 256
prenet out: 128
encoder out: 512
attention out: 256
concat attn & out: 768
decoder cell out: 256
decoder out (2 frames): 160
decoder out (1 frame): 80
postnet out: 256
linear out: 1025
Starting new training run at commit: None
Generated 32 batches of size 32 in 2.063 sec
Step 1 [190.721 sec/step, loss=0.87613, avg_loss=0.87613]
Step 2 [105.134 sec/step, loss=0.78472, avg_loss=0.83042]
Step 3 [87.687 sec/step, loss=0.86729, avg_loss=0.84271]
Step 4 [81.866 sec/step, loss=0.88327, avg_loss=0.85285]
Step 5 [73.656 sec/step, loss=0.85281, avg_loss=0.85284]
Step 6 [76.789 sec/step, loss=0.87447, avg_loss=0.85645]

2019-12-03 10:45:01.889230: F tensorflow/core/kernels/mkl_relu_op.cc:328] Check failed: dnnReLUCreateBackward_F32(&mkl_context.prim_relu_bwd, __null, mkl_context.lt_grad, mkl_context.lt_grad, negative_slope) == E_SUCCESS (-1 vs. 0)
Aborted (core dumped)


Thank you :D

Hi, I guess this is caused by your environment: the failure comes from MKL. This code targets TensorFlow 1.6, which is a bit out of date and quite different from the current version of TensorFlow.
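One thing that may be worth trying (a sketch on my part, not something verified against this repo): the failing symbol `dnnReLUCreateBackward_F32` only exists in MKL-enabled TensorFlow builds, so installing a non-MKL build of the same TensorFlow version might avoid the crash. The official pip wheels for TF 1.x were built against Eigen rather than MKL, while the Anaconda `defaults` package is typically MKL-backed. The env name `tf16-nomkl` below is just a placeholder:

```shell
# Create a fresh conda env and install the pip wheel of TensorFlow 1.6,
# which uses Eigen kernels instead of MKL. This assumes the crash is
# specific to the MKL ReLU backward kernel seen in the log above.
conda create -n tf16-nomkl python=3.6 -y
conda activate tf16-nomkl
pip install tensorflow==1.6.0   # or tensorflow-gpu==1.6.0 if you have CUDA
```

If the pip wheel runs past step 5–6 without the `Aborted (core dumped)`, that would confirm the MKL build is the problem rather than the model code.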