ValueError: could not broadcast input array from shape (10,6,4,512) into shape (10,128)
anandhupvr opened this issue · 4 comments
anandhupvr commented
Hi, thanks for your great work.
While generating audio embeddings, the code allocates a feature array of shape np.zeros([len_data, 10, 128]), but the output of the VGGish network (i.e., the shape of embedding_tensor) is (10, 6, 4, 512).
For the input audio, I converted the mp4 file to .wav, and the resulting input_batch shape is (10, 96, 64).
Could you help me run the script correctly so I can generate results for my own video?
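For context on where the (10, 6, 4, 512) shape comes from: the VGG-style stack below has four stride-2 max-pools, each halving both spatial dimensions, and its last conv layer has 512 channels. A minimal sketch of that shape arithmetic (illustrative only, not code from this repo):

```python
def conv_stack_output_shape(batch, height, width):
    """Trace the spatial shape through 4 stride-2 max-pools (VGG-style)."""
    for _ in range(4):              # pool1 .. pool4, each halves H and W
        height //= 2
        width //= 2
    return (batch, height, width, 512)   # conv4 outputs 512 channels

print(conv_stack_output_shape(10, 96, 64))  # (10, 6, 4, 512)
```

So an input_batch of (10, 96, 64) yields a 4-D conv feature map, which cannot be broadcast into the (10, 128) slots the feature array expects; the network needs to flatten and run the fully-connected layers to reach 128 dimensions.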
YapengTian commented
Hi!
Please change the network definition in vggish_slim.py to:
# The VGG stack of alternating convolutions and max-pools.
net = slim.conv2d(net, 64, scope='conv1')
net = slim.max_pool2d(net, scope='pool1')
net = slim.conv2d(net, 128, scope='conv2')
net = slim.max_pool2d(net, scope='pool2')
net = slim.repeat(net, 2, slim.conv2d, 256, scope='conv3')
net = slim.max_pool2d(net, scope='pool3')
net = slim.repeat(net, 2, slim.conv2d, 512, scope='conv4')
net = slim.max_pool2d(net, scope='pool4')
# Flatten before entering fully-connected layers
net = slim.flatten(net)
net = slim.repeat(net, 2, slim.fully_connected, 4096, scope='fc1')
# The embedding layer.
net = slim.fully_connected(net, params.EMBEDDING_SIZE, scope='fc2')
return tf.identity(net, name='embedding')
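To see why this fixes the error, here is an illustrative plain-NumPy sketch (not the repo's TF-slim code): slim.flatten collapses each (6, 4, 512) feature map into a 6*4*512 = 12288-dim vector, and the fully-connected layers then project it down to params.EMBEDDING_SIZE (128). The single weight matrix below is a stand-in for the fc1/fc2 stack.

```python
import numpy as np

rng = np.random.default_rng(0)
conv_out = rng.standard_normal((10, 6, 4, 512))    # conv-stack output

flat = conv_out.reshape(conv_out.shape[0], -1)     # flatten -> (10, 12288)
w_fc = rng.standard_normal((flat.shape[1], 128))   # stand-in for FC weights
embedding = flat @ w_fc                            # -> (10, 128)

print(embedding.shape)  # (10, 128)
```

With the flatten and fully-connected layers in place, embedding_tensor has shape (10, 128) and fits the np.zeros([len_data, 10, 128]) feature array.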
anandhupvr commented
Thanks!
anandhupvr commented
Thanks for the help and the quick reply. If you have an inference script for custom videos, could you share it?
YapengTian commented
No problem. I do not have other scripts for testing. It should be easy to modify my code to test other videos.