google-deepmind/graph_nets

Kernel Restart - Incompatibility between nx.draw and utils_tf.data_dicts_to_graphs_tuple

mshearer0 opened this issue · 7 comments

Hi.

I'm trying to use nx.draw and utils_tf.data_dicts_to_graphs_tuple in the same TF2 notebook.

Whichever is executed second causes a kernel restart in the notebook, which I can't explain. Importing networkx is fine as long as nx.draw is not run.

@Mistobaan - I get this behaviour on your very helpful TF2 version of graph_nets_basic tutorial.

Michael.

I have not observed this, not sure if @Mistobaan did.

Are you running on your own kernel, or on Google Colaboratory?

In my experience that is usually an out of memory case. Check the system logs if you are running on Colab.

Hi, thanks. I'm running on GCP Notebook with 15GB RAM. GCP logs show:

Aug 12 21:12:02 ... bash[1278]: OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
Aug 12 21:12:02 ... bash[1278]: OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Aug 12 21:12:03 ... bash[1278]: [I 21:12:03.530 LabApp] KernelRestarter: restarting kernel (1/5), keep random ports
Aug 12 21:12:03 ... bash[1278]: kernel ... restarted

I think the answer is printed by your logs:

set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results
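For reference, a minimal sketch of applying that workaround from inside a notebook. It assumes the variable needs to be in the process environment before any library that loads an OpenMP runtime is imported, so it belongs in the first cell after a fresh kernel start. The log itself calls this unsafe and unsupported, so treat it as a stopgap only.

```python
import os

# Assumption: this must run before importing anything that loads an
# OpenMP runtime, so it goes in the very first cell of the notebook.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Imports that may each bring their own OpenMP runtime would follow, e.g.:
# import tensorflow as tf
# import networkx as nx
```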

@Mistobaan - yes, I’ve used that as a workaround but wondered if there was a better option?
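One way to narrow this down might be to check which OpenMP runtimes actually end up loaded in the kernel process. The sketch below is a Linux-only heuristic: it scans /proc/self/maps for shared libraries named like libiomp5, libomp, or libgomp. The helper names `find_openmp_libs` and `loaded_openmp_libs` are made up for illustration, and the filename pattern is an assumption about how these libraries are typically named.

```python
import re
from pathlib import Path

# Heuristic pattern for Intel (libiomp5), LLVM (libomp) and GNU (libgomp)
# OpenMP runtimes, including suffixed names such as libgomp-a34b3233.so.1.
OPENMP_PATTERN = re.compile(r"\S*lib(?:iomp5|omp|gomp)(?:-[\w.]+)?\.so[\w.]*")


def find_openmp_libs(maps_text):
    """Return sorted, de-duplicated OpenMP library paths found in maps_text."""
    return sorted(set(OPENMP_PATTERN.findall(maps_text)))


def loaded_openmp_libs():
    """Return OpenMP libraries mapped into the current process (Linux only)."""
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return []  # Not Linux, or /proc is unavailable.
    return find_openmp_libs(maps.read_text())


if __name__ == "__main__":
    for path in loaded_openmp_libs():
        print(path)
```

Calling `loaded_openmp_libs()` after each import or after the first of the two calls should show which step pulls in the second runtime. For the crash itself this is only practical with the workaround set; otherwise the kernel restarts before the cell finishes.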

Get a bigger machine with more memory? Can you replicate the problem in a Colab and post the link? Make sure you set the share permissions.

Upgrading to GCP Notebook Tensorflow 2.3 (from 2.2.0) resolved the issue.
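For anyone comparing environments, here is a small sketch that prints the installed versions of the packages involved. The package list is an assumption: matplotlib is included because nx.draw relies on it for rendering. `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumed set of relevant packages; adjust to match the environment.
for name in ("tensorflow", "networkx", "matplotlib", "graph_nets"):
    print(name, installed_version(name))
```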