Execution ends early with no error output on my GTX 1050 Ti 4 GB
agznawi opened this issue · 1 comments
agznawi commented
I am new to TensorFlow so it is likely an easy problem.
The code works on my CPU0 (took around 3 days)
However, it terminates early on GPU with no error output.
In [1]: runfile('/media/.../face-aging-caae/Face-Aging-CAAE-master/main.py', wdir='/media/.../face-aging-caae/Face-Aging-CAAE-master')
Namespace(dataset='UTKFace', epoch=50, is_train=True, savedir='save', testdir='None', use_init_model=True, use_trained_model=True)
Building graph ...
WARNING:tensorflow:From /home/abdullah/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Training Mode
Loading pre-trained model ...
FAILED >_<!
Loading init model ...
INFO:tensorflow:Restoring parameters from init_model/model-init
In [1]:
I have tested other code on the GPU and it works.
The code seems to exit while executing this block in "FaceAging.py":
    # update
    _, _, _, EG_err, Ez_err, Dz_err, Dzp_err, Gi_err, DiG_err, Di_err, TV = self.session.run(
        fetches=[
            self.EG_optimizer,
            self.D_z_optimizer,
            self.D_img_optimizer,
            self.EG_loss,
            self.E_z_loss,
            self.D_z_loss_z,
            self.D_z_loss_prior,
            self.G_img_loss,
            self.D_img_loss_G,
            self.D_img_loss_input,
            self.tv_loss
        ],
        feed_dict={
            self.input_image: batch_images,
            self.age: batch_label_age,
            self.gender: batch_label_gender,
            self.z_prior: batch_z_prior
        }
    )
My GPU is a GTX 1050 Ti with 4 GB of memory.
How do I fix this problem?
Thank you.
agznawi commented
Using VSCode, this message showed up:
Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) Aborted (core dumped)
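The "Check failed" message comes from native code and is written to stderr just before the process aborts, which may be why one console showed nothing and another showed it. A minimal sketch of one way to surface it: run the script as a child process and inspect its stderr and exit status. The helper name `run_and_report` is hypothetical, and the signal check assumes a POSIX system such as Linux.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd as a child process.

    Returns (returncode, stderr_text, signal_number_or_None).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    # On POSIX, a negative return code -N means the child was killed by signal N
    # (for example, N == signal.SIGABRT after "Aborted (core dumped)").
    killed_by = -proc.returncode if proc.returncode < 0 else None
    return proc.returncode, err.decode("utf-8", "replace"), killed_by


# Hypothetical usage; replace "main.py" with the real script path:
#   code, err, sig = run_and_report([sys.executable, "main.py"])
#   print(code, sig == signal.SIGABRT)
#   print(err)
```

If the third value equals `signal.SIGABRT`, the process was aborted by a failed native check rather than exiting normally, and the captured stderr text should contain the message.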
I checked for compatibility and found that I have cuDNN 7.3, while this version of TensorFlow needs cuDNN 7 according to this table:
https://www.tensorflow.org/install/source#tested_build_configurations
I downgraded cuDNN to 7.0.5, and the code now runs without issues (it took 7 h 7 min).
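For anyone checking the same thing, here is a rough sketch of reading the installed cuDNN version from its header so it can be compared by hand against the table linked above. It assumes a cuDNN 7.x `cudnn.h`, which defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. The function name `parse_cudnn_version` is hypothetical, and the header location varies by install (the path in the usage comment is only a common default, not something taken from this thread).

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN 7.x cudnn.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        m = re.search(r"^\s*#define\s+" + key + r"\s+(\d+)", header_text, re.MULTILINE)
        if m is None:
            raise ValueError("could not find " + key)
        parts.append(int(m.group(1)))
    return tuple(parts)


# Hypothetical usage; the header path is an assumption and may differ:
#   with open("/usr/local/cuda/include/cudnn.h") as f:
#       print(parse_cudnn_version(f.read()))
```

The sketch only reports the version; deciding whether it matches the installed TensorFlow build still means looking it up in the tested build configurations table.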