TPU Estimator Crashing
Tensorflow version: tensorflow==2.0.0b0
Tensorflow Datasets Version: tfds-nightly==1.0.2.dev201906090105
Tensorflow Hub Version: tf-hub-nightly==0.5.0.dev201905270046
Issue
The code raises
End of sequence [[node input_pipeline_task0/while/IteratorGetNext (defined at image_retraining_tpu.py:139) ]]
for all values of max_steps in TPUEstimator.train(...).
Reproduce the issue
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=8
The same error is raised for:
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=4
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=100
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=500
$ python3 image_retraining_tpu.py --tpu [TPU_NAME] \
--use_tpu --use_compat --data_dir gs://[BUCKET_NAME]/data_dir \
--model_dir gs://[BUCKET_NAME]/model_dir --batch_size=32 \
--iterations=8 --max_steps=1000
Line 139
GSOC/E1_TPU_Sample/image_retraining_tpu.py
Lines 135 to 139 in 513a0ec
Log file
The error starts at line 230 of output.log.
output.log
This looks like a bug in TPUEstimator. As far as I understand this part of the docs, the Estimator API handles the OutOfRange error from the input function by stopping iteration (and not raising an exception). TPUEstimator doesn't seem to behave that way yet.
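To make the expected behavior concrete, here is a rough plain-Python analogy, not TensorFlow code. Everything in it is hypothetical: `finite_batches` stands in for an input function backed by a finite dataset, `train` stands in for the training loop, and `StopIteration` plays the role of the OutOfRange error. It only illustrates the "stop quietly when the input ends" behavior described above and says nothing about how Estimator or TPUEstimator are actually implemented.

```python
def finite_batches(num_batches):
    """Hypothetical stand-in for an input function over a finite dataset."""
    for i in range(num_batches):
        yield i


def train(batches, max_steps):
    """Run up to max_steps steps; stop quietly if the input ends first."""
    steps_done = 0
    iterator = iter(batches)
    while steps_done < max_steps:
        try:
            batch = next(iterator)
        except StopIteration:
            # Analogous to the OutOfRange error: end training, do not re-raise.
            break
        # A real loop would run one training step on `batch` here.
        steps_done += 1
    return steps_done


print(train(finite_batches(5), max_steps=8))  # input ends first: prints 5
print(train(finite_batches(5), max_steps=4))  # max_steps reached first: prints 4
```

In this sketch, running out of input is a normal way for training to end. The crash reported above suggests the TPU code path surfaces the end-of-input error instead.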
Can you open an issue on TF to cross-check?
Also, does the script work with the try...except block?
Nope, it doesn't. Weirdly enough, the script doesn't stop running. It keeps reporting that the TPU is healthy and tries to refresh the token, and it never exits, even when there's no more code to execute.