Same code and configuration: nvidia-tensorflow OOMs with reuse=True on an A30, but TensorFlow 1.14 works OK on a T4
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04.2 LTS
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
- TensorFlow installed from (source or binary): pip
- TensorFlow version (use command below): 1.15.5+nv22.8
- Python version: 3.8.3
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA V11.1.105 / cuDNN version not specified
- GPU model and memory: A30
Describe the current behavior
I train my model on two datasets alternately; the two branches share common weights, like this:
with tf.variable_scope('mymodel', reuse=False):
    pred1 = model(dataset1)  # first branch creates the variables
with tf.variable_scope('mymodel', reuse=True):
    pred2 = model(dataset2)  # second branch reuses the same weights
Describe the expected behavior
I train with a batch size of 12. Stock TensorFlow 1.14 trains fine on a T4 (16 GB), but nvidia-tensorflow 1.15.5+nv22.8 hits a GPU OOM on an A30 (24 GB). I would expect the same configuration to fit on the A30 as well, since it has more memory than the T4.
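One check worth noting (a sketch, not part of the original report): by default the TF 1.x allocator reserves nearly all device memory up front, so enabling allow_growth makes the session take only what the graph actually needs. If the run still OOMs on the A30 with this set, the working set itself is larger under nvidia-tensorflow; this uses only standard TF 1.x session options and assumes the graph from the snippet above has already been built:

import tensorflow as tf  # 1.15.x API

# Create the session with on-demand memory growth instead of the
# default up-front reservation of (almost) the whole GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the alternating training steps here ...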
Code to reproduce the issue
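A minimal sketch of the reuse pattern, assuming a small dense network as a stand-in for the real model() (which the issue does not show) and random tensors as stand-ins for the two input pipelines; layer sizes and the loss are illustrative only, and batch size 12 matches the report:

import tensorflow as tf  # 1.15.5+nv22.8

def model(x):
    # Stand-in for the real network; variables are created via get_variable,
    # so tf.variable_scope reuse applies to them.
    h = tf.layers.dense(x, 1024, activation=tf.nn.relu, name='fc1')
    return tf.layers.dense(h, 10, name='out')

# Stand-ins for dataset1/dataset2, batch size 12 as in the report.
dataset1 = tf.random_uniform([12, 2048])
dataset2 = tf.random_uniform([12, 2048])

with tf.variable_scope('mymodel', reuse=False):
    pred1 = model(dataset1)  # creates the fc1/out variables
with tf.variable_scope('mymodel', reuse=True):
    pred2 = model(dataset2)  # reuses the same variables

loss = tf.reduce_mean(tf.square(pred1)) + tf.reduce_mean(tf.square(pred2))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)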
Other info / logs