YapengTian/AVE-ECCV18

ValueError: could not broadcast input array from shape (10,6,4,512) into shape (10,128)

anandhupvr opened this issue · 4 comments

Hi, thanks for your great work.
While generating audio embeddings, the feature array in the code is allocated as np.zeros([len_data, 10, 128]), but the result from the VGGish network (i.e. the shape of embedding_tensor) is (10, 6, 4, 512).

For the input audio I converted the mp4 file into .wav, and the input_batch shape is (10, 96, 64).

Could you help me run the script correctly to generate results for my own video?
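For reference, here is a minimal NumPy reproduction of the error above (assumption: len_data = 1, i.e. a single video). The script pre-allocates one (10, 128) slot per video, but the unflattened conv-stack output has shape (10, 6, 4, 512), so the assignment fails:

```python
import numpy as np

# Pre-allocated audio feature buffer, as in the extraction script
# (assumption: len_data = 1 for a single video).
audio_features = np.zeros([1, 10, 128])

# Without a flatten before the fully-connected layers, the network
# emits the raw conv-stack output instead of a 128-d embedding.
embedding_tensor = np.zeros([10, 6, 4, 512])

try:
    audio_features[0] = embedding_tensor
except ValueError as err:
    print(err)  # could not broadcast input array from shape (10,6,4,512) into shape (10,128)
```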

Hi!

Please change the network in vggish_slim.py to:

# The VGG stack of alternating convolutions and max-pools.
net = slim.conv2d(net, 64, scope='conv1')
net = slim.max_pool2d(net, scope='pool1')
net = slim.conv2d(net, 128, scope='conv2')
net = slim.max_pool2d(net, scope='pool2')
net = slim.repeat(net, 2, slim.conv2d, 256, scope='conv3')
net = slim.max_pool2d(net, scope='pool3')
net = slim.repeat(net, 2, slim.conv2d, 512, scope='conv4')
net = slim.max_pool2d(net, scope='pool4')
# Flatten before entering fully-connected layers
net = slim.flatten(net)
net = slim.repeat(net, 2, slim.fully_connected, 4096, scope='fc1')
# The embedding layer.
net = slim.fully_connected(net, params.EMBEDDING_SIZE, scope='fc2')
return tf.identity(net, name='embedding')
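The slim.flatten call is what resolves the shape mismatch: each of the four max-pools halves the 96x64 log-mel input, leaving a (10, 6, 4, 512) tensor that must be flattened to (10, 12288) before the fully-connected layers can produce the (10, 128) embedding. A quick sanity check of the arithmetic:

```python
# Sanity check of the shapes: four stride-2 max-pools shrink the
# (96, 64) log-mel patch by a factor of 16 in each dimension.
h, w, channels = 96, 64, 512
for _ in range(4):          # pool1 .. pool4
    h, w = h // 2, w // 2
print((h, w, channels))     # (6, 4, 512) -> the conv output reported above
print(h * w * channels)     # 12288 units feeding the first fc layer
```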

thanks

No problem. I do not have any other scripts for testing, but it should be easy to modify my code to test other videos.