Request for an extra noisy visual feature package

Question

tymanman opened this issue 5 years ago · 19 comments

I have downloaded the audio and visual feature packages, but I can't find visual_feature_noisy.h5, which is used in your script for weakly_supervised training. How can I produce it myself? Thanks.

Answer 1 · 2019-11-14T08:12:02.000Z

Hi!

Please check the README, where you can find the link to the feature: https://drive.google.com/file/d/1I3OtOHJ8G1-v5G2dHIGCfevHQPn-QyLh/view

Answer 2 · 2019-11-14T09:49:12.000Z

Thank you. I was careless to overlook that.

Answer 3 · 2020-08-07T08:40:42.000Z

@YapengTian Hi Yapeng, I hit an index out-of-range error when I try to run visual_feature_extractor.py.
I think it relates to this line: frame_interval = int(vid_len / t).
The frame_interval is huge, which is weird.
This causes the error in the following loop, for n in frame_num. Could you please check this?

Answer 4 · 2020-08-08T00:18:03.000Z

Hi! The code is written for the AVE dataset. We want to process 10-second videos and extract 16 frames for each second.

You mention that you got a very large frame_interval. I guess you are not using the AVE dataset and are processing another dataset. If so, you need to change line 36, "t = 10 # length of video", since your videos are longer than 10 s. If the videos in your dataset have varying lengths, you should set t = video_len inside the for loop.
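For videos of varying length, the suggestion above might look roughly like the sketch below. It is not code from the repo: the function name `per_video_sampling`, the `frame_counts` mapping, and the fixed `fps` value are all assumptions made for illustration.

```python
def per_video_sampling(frame_counts, fps):
    """Compute t (video length in seconds) and frame_interval per video.

    frame_counts: hypothetical dict mapping video name -> number of decoded frames.
    fps: assumed frame rate shared by all videos.
    """
    plans = {}
    for name, n_frames in frame_counts.items():
        # Set t inside the loop instead of hard-coding t = 10.
        t = max(1, int(n_frames / fps))
        # Same formula as in the script: frames per one-second segment.
        frame_interval = int(n_frames / t)
        plans[name] = (t, frame_interval)
    return plans


plans = per_video_sampling({"short_clip": 251, "long_clip": 500}, fps=25)
print(plans)
```

With t computed per video, a 500-frame clip at 25 fps is treated as 20 one-second segments rather than 10 stretched ones.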

Answer 5 · 2020-08-08T00:40:48.000Z

@YapengTian Actually, I did use the code to process the AVE dataset. vid_len is a very large value, and so is frame_interval,

while len(imgs) is just 160,

so the following for loop raises an error.

Answer 6 · 2020-08-08T01:21:47.000Z

Could you print frame_num and len(imgs) to see why it happens? It is pretty weird.

Answer 7 · 2020-08-08T01:32:21.000Z

For the last sample (l = 9, i = 15, sample_num = 16), with frame_interval ≈ vid_len / 10: frame_num[-1] = int(l * frame_interval + (i * 1.0 / sample_num) * frame_interval) ≈ (9/10 + (15/16) * (1/10)) * vid_len < vid_len, and len(imgs) = vid_len. This means that n < vid_len, so there should be no out-of-range issue.
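The index formula above can be checked with a small sketch. The function below is a reconstruction from that formula, not a copy of the repo's video_frame_sample, so its body is an assumption; the argument values are the ones reported later in this thread for an AVE video.

```python
def video_frame_sample(frame_interval, t, sample_num):
    """Return sample_num frame indices for each of t one-second segments."""
    frame_num = []
    for l in range(t):               # l: index of the one-second segment
        for i in range(sample_num):  # i: index of the sample within the segment
            frame_num.append(
                int(l * frame_interval + (i * 1.0 / sample_num) * frame_interval)
            )
    return frame_num


frame_num = video_frame_sample(frame_interval=25, t=10, sample_num=16)
print(len(frame_num), frame_num[:5], frame_num[-1])
# prints: 160 [0, 1, 3, 4, 6] 248
```

The largest index, 248, stays below a video length of 251 frames, which is why the out-of-range error only appears when vid_len itself is wrong.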

Answer 8 · 2020-08-08T01:36:53.000Z

@YapengTian
On my machine, it looks like this:

vid_len: 9223372036854775807

frame_interval: 922337203685477632

frame_num: [0, 57646075230342352, 115292150460684704, 172938225691027072, 230584300921369408, 288230376151711744, 345876451382054144, 403522526612396480, 461168601842738816, 518814677073081152, 576460752303423488, 634106827533765888, 691752902764108288, 749398977994450560, 807045053224792960, 864691128455135232, 922337203685477632, 979983278915820032, 1037629354146162304, 1095275429376504704, 1152921504606846976, 1210567579837189376, 1268213655067531776, 1325859730297874176, 1383505805528216576, 1441151880758558720, 1498797955988901120, 1556444031219243520, 1614090106449585920, 1671736181679928320, 1729382256910270464, 1787028332140612864, 1844674407370955264, 1902320482601297664, 1959966557831640064, 2017612633061982208, 2075258708292324608, 2132904783522667008, 2190550858753009408, 2248196933983351808, 2305843009213693952, 2363489084444036608, 2421135159674378752, 2478781234904721408, 2536427310135063552, 2594073385365405696, 2651719460595748352, 2709365535826090496, 2767011611056433152, 2824657686286775296, 2882303761517117952, 2939949836747460096, 2997595911977802752, 3055241987208144896, 3112888062438487040, 3170534137668829696, 3228180212899171840, 3285826288129514496, 3343472363359856640, 3401118438590198784, 3458764513820541440, 3516410589050883584, 3574056664281226240, 3631702739511568384, 3689348814741910528, 3746994889972252672, 3804640965202595328, 3862287040432937472, 3919933115663280128, 3977579190893622272, 4035225266123964416, 4092871341354307072, 4150517416584649216, 4208163491814991872, 4265809567045334016, 4323455642275676160, 4381101717506018816, 4438747792736360960, 4496393867966703616, 4554039943197045760, 4611686018427387904, 4669332093657730048, 4726978168888072192, 4784624244118415360, 4842270319348757504, 4899916394579099648, 4957562469809441792, 5015208545039783936, 5072854620270127104, 5130500695500469248, 5188146770730811392, 5245792845961153536, 5303438921191495680, 5361084996421838848, 5418731071652180992, 5476377146882523136, 
5534023222112866304, 5591669297343208448, 5649315372573550592, 5706961447803893760, 5764607523034235904, 5822253598264578048, 5879899673494920192, 5937545748725262336, 5995191823955605504, 6052837899185947648, 6110483974416289792, 6168130049646631936, 6225776124876974080, 6283422200107317248, 6341068275337659392, 6398714350568001536, 6456360425798343680, 6514006501028685824, 6571652576259027968, 6629298651489371136, 6686944726719713280, 6744590801950055424, 6802236877180397568, 6859882952410739712, 6917529027641082880, 6975175102871425024, 7032821178101767168, 7090467253332109312, 7148113328562452480, 7205759403792794624, 7263405479023136768, 7321051554253478912, 7378697629483821056, 7436343704714163200, 7493989779944505344, 7551635855174848512, 7609281930405190656, 7666928005635532800, 7724574080865874944, 7782220156096217088, 7839866231326560256, 7897512306556902400, 7955158381787244544, 8012804457017586688, 8070450532247928832, 8128096607478272000, 8185742682708614144, 8243388757938956288, 8301034833169298432, 8358680908399640576, 8416326983629982720, 8473973058860325888, 8531619134090668032, 8589265209321010176, 8646911284551352320, 8704557359781694464, 8762203435012037632, 8819849510242379776, 8877495585472721920, 8935141660703064064, 8992787735933407232, 9050433811163749376, 9108079886394091520, 9165725961624433664]

len(imgs) 251

Answer 9 · 2020-08-08T01:38:13.000Z

It seems that vid = imageio.get_reader(video_index, 'ffmpeg') followed by vid_len = len(vid) gives wrong values.

Answer 10 · 2020-08-08T01:39:20.000Z

The issue comes from vid_len = len(vid); len(imgs) looks correct. Could you set vid_len = len(imgs) and run the code?

Answer 11 · 2020-08-08T01:40:47.000Z

vid_len = len(imgs)
frame_interval = int(vid_len / t)
frame_num = video_frame_sample(frame_interval, t, sample_num)
First get imgs, then compute frame_num after that.
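A minimal sketch of that ordering follows. The reported vid_len of 9223372036854775807 equals 2**63 - 1 (sys.maxsize on 64-bit Python), which suggests the reader did not know the real frame count, so the sketch counts the decoded frames instead. The function name `frames_and_indices` is hypothetical, and the sampling formula is reconstructed from this thread rather than copied from the repo.

```python
def frames_and_indices(reader, t=10, sample_num=16):
    """Decode all frames first, then derive the sampling indices from them.

    reader: any iterable of frames, for example the object returned by
    imageio.get_reader(video_index, 'ffmpeg').
    """
    imgs = [im for im in reader]  # decode every frame up front
    vid_len = len(imgs)           # trust the decoded count, not len(reader)
    if vid_len < t:
        raise ValueError("video has fewer frames than seconds requested")
    frame_interval = int(vid_len / t)
    frame_num = [
        int(l * frame_interval + (i * 1.0 / sample_num) * frame_interval)
        for l in range(t)
        for i in range(sample_num)
    ]
    return imgs, frame_num


# Stand-in for a real reader: 251 dummy frames, as in the log in this thread.
imgs, frame_num = frames_and_indices(list(range(251)))
print(len(imgs), len(frame_num), frame_num[-1])
```

Because every index is derived from len(imgs), imgs[n] is valid for every n in frame_num regardless of what the reader reports as its length.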

Answer 12 · 2020-08-08T01:41:20.000Z

Probably an imageio version issue.

Answer 13 · 2020-08-08T01:46:53.000Z

Okay, maybe. Thanks a lot, I think it's correct now.

vid_len: 251
frame_interval: 25
frame_num: [0, 1, 3, 4, 6, 7, 9, 10, 12, 14, 15, 17, 18, 20, 21, 23, 25, 26, 28, 29, 31, 32, 34, 35, 37, 39, 40, 42, 43, 45, 46, 48, 50, 51, 53, 54, 56, 57, 59, 60, 62, 64, 65, 67, 68, 70, 71, 73, 75, 76, 78, 79, 81, 82, 84, 85, 87, 89, 90, 92, 93, 95, 96, 98, 100, 101, 103, 104, 106, 107, 109, 110, 112, 114, 115, 117, 118, 120, 121, 123, 125, 126, 128, 129, 131, 132, 134, 135, 137, 139, 140, 142, 143, 145, 146, 148, 150, 151, 153, 154, 156, 157, 159, 160, 162, 164, 165, 167, 168, 170, 171, 173, 175, 176, 178, 179, 181, 182, 184, 185, 187, 189, 190, 192, 193, 195, 196, 198, 200, 201, 203, 204, 206, 207, 209, 210, 212, 214, 215, 217, 218, 220, 221, 223, 225, 226, 228, 229, 231, 232, 234, 235, 237, 239, 240, 242, 243, 245, 246, 248]
len(frame_num): 160
len(imgs) 251

Answer 14 · 2020-08-08T01:56:25.000Z

@YapengTian Hi Yapeng, by the way, have you tried using ResNet to extract the visual features? I saw a paper, "Dual-modality seq2seq network for audio-visual event localization", which adopts ResNet-152 to extract features and otherwise uses the method from your paper, and the final accuracy improves a lot: about 3 points in the supervised setting and about 6 points in the weakly supervised setting.
That's impressive, but when I used this code to extract features with ResNet152 in place of VGG19, the results seem worse. Could you please give some suggestions?

Answer 15 · 2020-08-08T05:15:18.000Z

I only used VGG features as shared in the repo. It is pretty weird. Res152 features should be better.

base_model = VGG19(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('block5_pool').output)

When using ResNet152, you need to make sure that the layer you take the output from is still the final layer before global pooling and the FC layer (block5_pool plays that role in VGG19), and video_features = np.zeros([len_data, 10, 7, 7, 512]) might need to change to video_features = np.zeros([len_data, 10, 7, 7, 2048]).
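The shape change can be captured in a small helper, sketched here under the assumption of 224x224 inputs, for which both backbones produce a 7x7 final feature map (512 channels for VGG19 block5_pool, 2048 for the last ResNet-152 convolutional stage). The names `BACKBONE_CHANNELS` and `video_feature_shape` are hypothetical and not part of the repo.

```python
# Channel count of the last spatial feature map, assuming 224x224 inputs.
BACKBONE_CHANNELS = {
    "vgg19": 512,       # output of block5_pool
    "resnet152": 2048,  # output of the final convolutional stage
}


def video_feature_shape(len_data, backbone, t=10, spatial=7):
    """Shape of the buffer that holds one feature map per second of video."""
    channels = BACKBONE_CHANNELS[backbone]
    return (len_data, t, spatial, spatial, channels)


print(video_feature_shape(3, "vgg19"))      # (3, 10, 7, 7, 512)
print(video_feature_shape(3, "resnet152"))  # (3, 10, 7, 7, 2048)
```

The returned tuple can be passed directly to np.zeros, so that the allocation follows the chosen backbone instead of a hard-coded 512.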

Answer 16 · 2020-08-08T06:50:16.000Z

Okay, I really appreciate the discussion.

Answer 17 · 2021-12-20T03:38:01.000Z

Hello, how did you solve this problem? I encountered the same problem. Is it caused by the wrong version of imageio?

Answer 18 · 2022-01-23T08:13:07.000Z

Hi, how can this problem be fixed? Is it the version of imageio?

Answer 19 · 2022-01-23T08:13:33.000Z

Hello, how did you solve this problem? I encountered the same problem. Is it caused by the wrong version of imageio?

Have you solved it?