Multi-channel audio doesn't work with --data_num_channels 2 in Jupyter Lab
Andy671 opened this issue · 1 comment
Hello @chrisdonahue, thanks for the great paper and for sharing the WaveGAN code! You slay! I've managed to run it on Google Colab without any problems. BUT...
The problem is I can't make it work on a paid Google AI Platform Notebooks instance in Jupyter Lab. I spent a few days on it and found out that the problem is `--data_num_channels 2`. I've tried different setups, including CUDA 10, CUDA 11, tensorflow-gpu==1.15.2, tensorflow-gpu==1.14.0, and a few more, but in every case 2-channel audio just doesn't work and gives me this log (the last line seems very promising, as it's how I figured out the problem was with 2-channel audio):
```
Traceback (most recent call last):
  File "train_wavegan.py", line 654, in <module>
    train(fps, args)
  File "train_wavegan.py", line 93, in train
    D_G_z = WaveGANDiscriminator(G_z, **args.wavegan_d_kwargs)
  File "/home/jupyter/DavidBlaine-Project/wavegan/wavegan.py", line 194, in WaveGANDiscriminator
    output = tf.layers.conv1d(output, dim, kernel_len, 4, padding='SAME')
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/layers/convolutional.py", line 218, in conv1d
    return layer.apply(inputs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 591, in __call__
    self._maybe_build(inputs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1881, in _maybe_build
    self.build(input_shapes)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py", line 165, in build
    dtype=self.dtype)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 450, in add_weight
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 384, in add_weight
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 663, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py", line 869, in _get_single_variable
    (name, shape, found_var.get_shape()))
ValueError: Trying to share variable D/downconv_0/conv1d/kernel, but specified shape (25, 1, 64) and found shape (25, 2, 64).
```
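For context, the error comes from TF1's variable sharing: `tf.get_variable` returns an existing variable only if the requested shape matches the shape it was first created with, and a conv1d kernel's shape is `(kernel_len, in_channels, filters)`. Here the kernel was first built with 2 input channels (the real, stereo audio) and then requested with 1 (so `G_z` looks mono). A minimal sketch of that sharing rule, using a hypothetical `VariableScope` class rather than real TensorFlow:

```python
# Toy model of TF1-style variable sharing: get_variable() reuses an
# existing variable only when the requested shape matches the shape
# it was first created with; otherwise it raises, which is exactly
# the ValueError in the traceback above.

class VariableScope:
    def __init__(self):
        self._vars = {}

    def get_variable(self, name, shape):
        if name in self._vars:
            found = self._vars[name]
            if found != shape:
                raise ValueError(
                    "Trying to share variable %s, but specified shape %s "
                    "and found shape %s" % (name, shape, found))
            return found
        self._vars[name] = shape
        return shape

scope = VariableScope()
# Discriminator applied to real (stereo) audio: kernel created with
# 2 input channels.
scope.get_variable("D/downconv_0/conv1d/kernel", (25, 2, 64))
# Discriminator reused on G_z: if G_z is mono, the requested kernel
# has 1 input channel and sharing fails.
try:
    scope.get_variable("D/downconv_0/conv1d/kernel", (25, 1, 64))
except ValueError as e:
    print(e)
```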
--data_num_channels 1 works okay, though...
Here is the full command:
```sh
!export CUDA_VISIBLE_DEVICES="0"
!python train_wavegan.py train "../models/model_test" \
  --data_dir "../datasets/dataset_test" \
  --data_num_channels 2 \
  --data_sample_rate 44100 \
  --data_first_slice \
  --data_slice_len 32768 \
  --data_pad_end \
  --data_fast_wav \
  --wavegan_genr_pp
```
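For what it's worth, since `--data_num_channels 2` only makes sense if the files in `--data_dir` really are stereo, here is a quick stdlib sanity check of the dataset's channel counts (a hypothetical helper, not part of the WaveGAN scripts):

```python
# Report the channel count of every .wav file under a directory,
# using only the standard library.
import wave
from pathlib import Path

def wav_channel_counts(data_dir):
    """Return {filename: nchannels} for every .wav under data_dir."""
    counts = {}
    for fp in sorted(Path(data_dir).glob("*.wav")):
        with wave.open(str(fp), "rb") as w:
            counts[fp.name] = w.getnchannels()
    return counts

# Demo against a scratch stereo file rather than a real dataset:
import os, struct, tempfile

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "test.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 16-bit PCM
    w.setframerate(44100)
    w.writeframes(struct.pack("<4h", 0, 0, 0, 0))  # 2 stereo frames

print(wav_channel_counts(tmp))  # {'test.wav': 2}
```

Every entry should be 2 before training with `--data_num_channels 2`.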
I've also tried tensorflow==1.12.0, but it is so outdated that it requires CUDA 9...
Operating system: Debian 10
Current tensorflow-gpu: 1.14.0
Requirements.txt of my current pip list:
https://drive.google.com/file/d/1irXiAZyYHeUkNH-PYDCHjwbeaDfenCTv/view?usp=sharing
Please help me out! How can I fix this issue?
I'll be extremely thankful for any hint!
Hi Andy. Appreciate the kind words and sorry for the delay.
So are you saying that the exact same configuration (2ch) works on Google Colab, but not on another environment? That is indeed strange.
It looks like this is happening in the discriminator for the generated audio, while the placeholder for the real audio does appear to be stereo. Can you check the shape of the G_z tensor? Is it mono? If so, maybe there's some issue with this line of code due to changes in the TensorFlow API since I wrote it: https://github.com/chrisdonahue/wavegan/blob/master/wavegan.py#L132
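To make that shape check concrete, here is a shape-only trace (plain Python, with illustrative layer widths, not the actual wavegan.py code) of what a stride-4 transposed-conv generator stack should emit for `--data_slice_len 32768` with 2 channels. If the real G_z tensor disagrees on the last dimension, the generator is producing mono audio and the discriminator's shared conv kernel fails exactly as in the traceback:

```python
# Shape-only trace of a stride-4 transposed-conv stack; the layer
# widths here are illustrative, not copied from wavegan.py.

def conv1d_transpose_shape(shape, filters, stride=4):
    # Static output shape of a 'SAME'-padded transposed 1-D conv:
    # length is multiplied by the stride, channels become `filters`.
    batch, length, _ = shape
    return (batch, length * stride, filters)

batch, dim, nch = 64, 64, 2
shape = (batch, 32, dim * 16)            # after the initial dense + reshape
for filters in (dim * 8, dim * 4, dim * 2, dim, nch):
    shape = conv1d_transpose_shape(shape, filters)

# Five stride-4 layers: 32 * 4**5 = 32768 samples, nch channels.
print(shape)  # (64, 32768, 2)
```

In the training script itself, `print(G_z.get_shape().as_list())` right after the generator is built would show whether the last dimension is 2 or 1.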