Execution ends early with no error output on my GTX 1050 Ti (4 GB)

Question

agznawi opened this issue 5 years ago · 1 comment

I am new to TensorFlow, so this is likely an easy problem.
The code works on my CPU (training took around 3 days).
However, on the GPU it terminates early with no error output.
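A quick way to confirm that the failure is GPU-specific is to run the same script twice, once with the GPU hidden. The sketch below is a minimal Python 3 version of that idea. It assumes a CUDA build of TensorFlow, which honors the standard CUDA_VISIBLE_DEVICES environment variable (an empty value hides every GPU, so TensorFlow falls back to the CPU). The helpers build_env and run_script are hypothetical names, not part of the Face-Aging-CAAE code.

```python
import os
import subprocess
import sys


def build_env(use_gpu):
    """Return a copy of the current environment for a TensorFlow child process."""
    env = dict(os.environ)
    if not use_gpu:
        # An empty CUDA_VISIBLE_DEVICES hides all GPUs from the CUDA runtime,
        # so TensorFlow places every op on the CPU.
        env["CUDA_VISIBLE_DEVICES"] = ""
    # With use_gpu=True the environment is inherited unchanged.
    return env


def run_script(path, use_gpu):
    """Run `path` with the current interpreter and return its exit code."""
    result = subprocess.run([sys.executable, path], env=build_env(use_gpu))
    return result.returncode
```

For example, compare run_script("main.py", use_gpu=False) with run_script("main.py", use_gpu=True). If only the GPU run ends with a non-zero or negative exit code, the problem most likely lies in the GPU stack (driver, CUDA, cuDNN) rather than in the model code.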

In [1]: runfile('/media/.../face-aging-caae/Face-Aging-CAAE-master/main.py', wdir='/media/.../face-aging-caae/Face-Aging-CAAE-master')
Namespace(dataset='UTKFace', epoch=50, is_train=True, savedir='save', testdir='None', use_init_model=True, use_trained_model=True)

        Building graph ...
WARNING:tensorflow:From /home/abdullah/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.

        Training Mode

        Loading pre-trained model ...
        FAILED >_<!

        Loading init model ...
INFO:tensorflow:Restoring parameters from init_model/model-init

In [1]:

I have tested other code on the GPU, and it works.

The code seems to exit while executing this block in "FaceAging.py":

                # update
                _, _, _, EG_err, Ez_err, Dz_err, Dzp_err, Gi_err, DiG_err, Di_err, TV = self.session.run(
                    fetches = [
                        self.EG_optimizer,
                        self.D_z_optimizer,
                        self.D_img_optimizer,
                        self.EG_loss,
                        self.E_z_loss,
                        self.D_z_loss_z,
                        self.D_z_loss_prior,
                        self.G_img_loss,
                        self.D_img_loss_G,
                        self.D_img_loss_input,
                        self.tv_loss
                    ],
                    feed_dict={
                        self.input_image: batch_images,
                        self.age: batch_label_age,
                        self.gender: batch_label_gender,
                        self.z_prior: batch_z_prior
                    }
                )

My GPU is a GTX 1050 Ti (4 GB).
How do I fix this problem?
Thank you.

Answer 1 · 2019-05-16T16:50:52.000Z

When I ran the script from VSCode, this message showed up:

Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)
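The "Check failed" line is an internal TensorFlow assertion in native (C++) code. When such a check fails, the process calls abort(), and the shell reports the resulting SIGABRT as "Aborted (core dumped)". Because the process dies outside Python, no Python traceback is produced, which would explain why the IPython console used with runfile showed nothing. One way to surface such crashes is to launch the script as a child process and capture its stderr. The following is a minimal Python 3 sketch of that approach, assuming a POSIX system. run_and_report and SIMULATED_CRASH are hypothetical names, and the simulated child only imitates the crash with os.abort().

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` as a child process and return (returncode, signal_name, stderr).

    On POSIX, subprocess reports a child killed by a signal as a negative
    return code (on Linux, -6 for SIGABRT, which is what abort() raises).
    """
    # Only stderr is captured; stdout stays attached to the terminal so
    # training progress remains visible during a long run.
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    signal_name = None
    if result.returncode < 0:
        signal_name = signal.Signals(-result.returncode).name
    return result.returncode, signal_name, result.stderr


# Simulate a native CHECK failure: write to stderr, then abort the process.
SIMULATED_CRASH = [
    sys.executable,
    "-c",
    "import os, sys; sys.stderr.write('Check failed: simulated'); "
    "sys.stderr.flush(); os.abort()",
]
```

run_and_report(SIMULATED_CRASH) reports SIGABRT along with the captured message. For the real case, the command would be [sys.executable, "main.py"].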

I checked compatibility and found that I had cuDNN 7.3, while this version of TensorFlow was built and tested against cuDNN 7 according to this table:
https://www.tensorflow.org/install/source#tested_build_configurations
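To see which cuDNN version is actually installed, one option is to read the version macros from the cuDNN header. The sketch below assumes the header lives at /usr/local/cuda/include/cudnn.h (the usual location for a manual install; conda or distro packages may put it elsewhere). It also assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cuDNN 7 headers do. parse_cudnn_version, read_cudnn_version and version_matches are hypothetical helpers.

```python
import re

# Assumed default location of the cuDNN 7 header; adjust for your install.
DEFAULT_CUDNN_HEADER = "/usr/local/cuda/include/cudnn.h"


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if not found."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match e.g. "#define CUDNN_MAJOR 7" but not CUDNN_VERSION formulas.
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def read_cudnn_version(path=DEFAULT_CUDNN_HEADER):
    """Read the header at `path` and parse its version macros."""
    with open(path) as header:
        return parse_cudnn_version(header.read())


def version_matches(installed, expected_prefix):
    """True if `installed` starts with `expected_prefix`, e.g. (7, 0)."""
    return tuple(installed[: len(expected_prefix)]) == tuple(expected_prefix)
```

With the versions from this thread, version_matches((7, 3), (7, 0)) is False and version_matches((7, 0, 5), (7, 0)) is True. Newer cuDNN releases (8.x) moved these macros to cudnn_version.h, so the header path would need to change there.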
I downgraded cuDNN to 7.0.5, and the code now runs without issues (training took 7 h 7 min).
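For a rough sense of the speed-up, treat "around 3 days" as 72 h and assume both runs trained the same configuration:

```latex
\[
\frac{72\ \text{h}}{7\ \text{h}\ 7\ \text{min}}
  = \frac{72 \times 60\ \text{min}}{(7 \times 60 + 7)\ \text{min}}
  = \frac{4320}{427} \approx 10.1
\]
```

So once cuDNN matched, the GPU run was roughly ten times faster than the CPU run.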