Hello, can you tell me the order of use for the vqa_v2_pretrain files?
Thank you very much for publishing such a great work!
I see that in the directory, I just need tsv2feature.py or tsv2feature_objects.py to get vqa_img_feature_train.pickle.
Is there any other code in the vqa_v2_pretrain folder that needs to be run after tsv2feature.py or tsv2feature_objects.py, and in what order?
Wish you all the best!
Just tsv2feature.py or tsv2feature_objects.py is enough; the other files are used to generate the vocabulary or processed annotations, and I have provided the vocabulary and other needed files on google drive. I recommend tsv2feature_objects.py, since its features contain an object label for each image region, which the features in the original bottom-up attention repo do not.
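For reference, a minimal sketch of reading one row of the object-level tsv (train2014_obj36.tsv) that tsv2feature_objects.py consumes. The field names and dtypes follow the common bottom-up-attention/LXMERT convention and are assumptions here; tsv2feature_objects.py in this repo is the authoritative reader:

import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)

# Assumed column layout of train2014_obj36.tsv (LXMERT-style). The objects_id
# column carries a detected-object label per box, which the original
# bottom-up-attention tsv does not provide.
FIELDNAMES = ['img_id', 'img_h', 'img_w', 'objects_id', 'objects_conf',
              'attrs_id', 'attrs_conf', 'num_boxes', 'boxes', 'features']

with open('train2014_obj36.tsv') as f:
    reader = csv.DictReader(f, FIELDNAMES, delimiter='\t')
    for item in reader:
        num_boxes = int(item['num_boxes'])
        feats = np.frombuffer(base64.b64decode(item['features']),
                              dtype=np.float32).reshape(num_boxes, -1)
        labels = np.frombuffer(base64.b64decode(item['objects_id']),
                               dtype=np.int64).reshape(num_boxes)
        print(item['img_id'], feats.shape, labels.shape)
        break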
Thank you very much for your reply and wish you all the best
I'm very sorry to disturb you.
In model.py there is model = LxmertModel.from_pretrained('./premodel', config=config). If I run python train.py --embedding --model_dir model_save_dir --dataset okvqa --pretrain --accumulate --validate, what files should I put in the "./premodel" folder?
Wish you all the best!
If that line gives an error, model = LxmertModel.from_pretrained('unc-nlp/lxmert-base-uncased', config=config) should work.
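A minimal sketch of the two ways to load the LXMERT weights (assuming the transformers library used by model.py; the exact files written by save_pretrained, typically config.json plus the weight file, depend on the transformers version):

from transformers import LxmertConfig, LxmertModel

config = LxmertConfig.from_pretrained('unc-nlp/lxmert-base-uncased')

# Option 1: download the weights from the Hugging Face hub directly.
model = LxmertModel.from_pretrained('unc-nlp/lxmert-base-uncased', config=config)

# Option 2: save a local copy once and point './premodel' at it, so that
# model.py's from_pretrained('./premodel', config=config) finds the files.
model.save_pretrained('./premodel')
model = LxmertModel.from_pretrained('./premodel', config=config)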
I'm so sorry to disturb you.
I'm running on two 3090 GPUs and I got this error; do you know why?
2022-04-12 23:24:07.311082: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
len(qids): 209126
5046
Let's use 2 GPUs!
0%| | 0/817 [00:02<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 334, in
train()
File "train.py", line 178, in train
for batch_data in tqdm(train_dataloader):
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/tqdm/std.py", line 1195, in iter
for obj in iterable:
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in next
data = self._next_data()
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/anaconda3/envs/mmukea/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/MMuKEA/MuKEA/dataset.py", line 177, in getitem
image_feature = pretrain_feature[self.image_ids[index]]['feats']
KeyError: '445305'
It seems you got the wrong image features. Run the following code in a console:

import pickle

with open('/data2/yjgroup/dy/kb-vqa/data/vqa_img_feature_train.pickle', 'rb') as f:
    pretrain_feature = pickle.load(f)

pretrain_feature should contain features for 82,783 images; check whether you ran the right tsv2feature.py.
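A quick sanity check on the generated pickle (a sketch; adjust the path to wherever your vqa_img_feature_train.pickle is, and note the 'feats' key is taken from the traceback above):

import pickle

with open('vqa_img_feature_train.pickle', 'rb') as f:
    pretrain_feature = pickle.load(f)

print(len(pretrain_feature))           # expected: 82783 train2014 images
print('445305' in pretrain_feature)    # the image id from the KeyError above
if '445305' in pretrain_feature:
    print(pretrain_feature['445305']['feats'].shape)  # per-image region features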
I used tsv2feature_objects.py, because I got "trainval2014_resnet101_faster_rcnn_genome_36.tsv" from "The image features are provided by and downloaded from the original bottom-up attention repo". So I used tsv2feature_objects.py to get "vqa_img_feature_train.pickle". But now, when I run train.py, it raises that error.
Thank you very much for your reply and wish you all the best!
tsv2feature_objects.py does not correspond to trainval2014_resnet101_faster_rcnn_genome_36.tsv. I'm a little confused; do you mean https://github.com/airsplay/lxmert#google-drive ?
I used tsv2feature_objects.py to get "vqa_img_feature_train.pickle"; when I run train.py, it raises that error.
tsv2feature_objects.py corresponds to "train2014_obj36.tsv", and tsv2feature.py corresponds to "trainval2014_resnet101_faster_rcnn_genome_36.tsv".
So I wanted to use tsv2feature.py and downloaded "trainval2014_resnet101_faster_rcnn_genome_36.tsv", which is not the same as "train2014_resnet101_faster_rcnn_genome_36.tsv" and "val2014_resnet101_faster_rcnn_genome_36.tsv".
You may need to run git pull to update the code; I have fixed this bug.
Thank you very much for your reply.
Wish you all the best!
I'm very sorry to disturb you.
There was a bug in tsv2feature_objects.py when processing the image id; I have fixed it, thanks for your reply. It should work normally now.
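For anyone hitting the same KeyError: the mismatch is between the image id stored in the tsv and the plain numeric string used as the pickle key. A hypothetical sketch of the kind of normalization involved (the COCO-style id format is an assumption for illustration, not the repo's exact code):

# Hypothetical: turn an id such as 'COCO_train2014_000000445305' into the
# plain key '445305' expected by dataset.py (assumed format, for illustration).
def normalize_image_id(img_id: str) -> str:
    return str(int(img_id.split('_')[-1]))  # keep trailing number, drop leading zeros

assert normalize_image_id('COCO_train2014_000000445305') == '445305'
assert normalize_image_id('445305') == '445305'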
Thank you very much for your help! Wishing you success in your future work and all the best!
I'm very sorry to disturb you.
For krvqa, in dataset.py (line 14), under "if args.pretrain:", are "vqa_train_filter.json" and "vqa_img_feature_train.pickle" the same files as for okvqa?
"""
if args.dataset == 'krvqa':
    if args.pretrain:
        with open('data/vqa_train_filter.json','r') as f:
            vqa2 = json.load(f)
        train_row = vqa2
        with open('data/vqa_img_feature_train.pickle', 'rb') as f:
            pretrain_feature = pickle.load(f)
    else:
        with open('data/kr-vqa/krvqa_img_feature_train.pickle', 'rb') as f:
            pretrain_feature = pickle.load(f)
        with open('data/kr-vqa/krvqa_train.json','r') as f:
            train_row = json.load(f)
    if args.accumulate:
        with open('data/krvqa-pretrain_dic_all_filter.pickle', 'rb') as f:
            a_dic = pickle.load(f)
    else:
        with open('data/kr-vqa/krvqa-ans_dic.pickle', 'rb') as f:
            a_dic = pickle.load(f)
"""
Wish you all the best!
Right, it's the same.
I'm so sorry to disturb you.
For krvqa, in dataset_val.py (line 16), under "if args.dataset == 'krvqa':", I did not find "krvqa_img_feature_test.pickle" on google drive. Does it need to be generated with "prepare_img.py"?
"""
if args.dataset == 'krvqa':
    with open('data/kr-vqa/krvqa_test.json','r') as f:
        val_row = json.load(f)
    with open('data/kr-vqa/krvqa_img_feature_test.pickle', 'rb') as f:
        pretrain_feature = pickle.load(f)
"""
Wish you all the best!
I will also try to upload the feature to google drive, it may take a few hours.
Thank you so much for your help, I'd almost ask for an autograph. Thanks!
No problem. Were you born in '96? Then I'm actually younger than you, so I really don't count as an expert :joy:
Yes, I'm quite embarrassed; expertise doesn't depend on age!
I have just uploaded the krvqa image features.
Thank you for your help! Wishing you smooth work and all the best!
There was a bug in tsv2feature_objects.py when processing the image id; I have fixed it, thanks for your reply. It should work normally now.
Hello, I modified the files following your instructions above, but running train.py still gives a KeyError. The error is as follows:
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/stormai/anaconda3/envs/Mukea/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/stormai/anaconda3/envs/Mukea/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/stormai/anaconda3/envs/Mukea/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/stormai/userfile/wangfengjuan/MuKEA-main/dataset.py", line 137, in getitem
image_feature = pretrain_feature[self.image_ids[index]]['feats']
KeyError: '291764'
import pickle

with open('vqa_img_feature_train.pickle', 'rb') as f:
    pretrain_feature = pickle.load(f)
print(pretrain_feature.keys())

Could you run this code on the command line and check the result? I'm not sure why there is still a KeyError.
Hello, after running this code, the result contains many IDs, and 291764 is among them. For example:
dict_keys(['132417', '573195', '578174', '494628', '207647', '84002', '306661', '387267', '131486', '201446', '384612', '152557', '186626', '381400', '366470', '233520', '216856', '573456', '406366', '27826', '441449', '356972', '40201', '54696', '464593', '438297', '492078', '356755', '397219', '554168', '297664', '182406', '288984', '355321', '77099', '412468', '189882', '186844', '272647', '400477', '485306', '215763', '220688', '425135', '426470', '288119', '128599', '182436', '158786', '200010', '474545', '202447', '120615', '50863', '2093', '60035', '105817', '131952', '114333', '526794', '147178', '37015', '276610', '577107', '356011', '36090', '394480', '562498', '205887', '398649', '198561
Then it shouldn't raise that error.
Hello,
For the okvqa dataset, how many epochs does pre-training take to converge? And for krvqa?
For fine-tuning, is setting the epochs to 200 enough for both datasets to converge?
Thank you very much! Wishing you all the best!
Pre-training usually converges in about 20 epochs. In my experience, fine-tuning on the OKVQA dataset needs 200 epochs, and KRVQA also converges in about 20 epochs; check how your run behaves.
Thank you very much for giving us such a great paper and code!
😂 Isn't the loss going down and the accuracy going up normal behavior? I don't quite understand why that would be a problem.
Does that mean that, under my current experimental setup, I may still need to keep pre-training? I'm confused, sorry.
As I understand it, it has already converged to around 27%, so there's not much need to keep training. I usually treat training as finished after about 30 epochs. Of course, I didn't fix the random seed, so some fluctuation can't be ruled out.
Thanks for the author's prompt reply!
Dear author:
In model.py there is head_300 = self.linear_300(head), with self.linear_300 = nn.Sequential(nn.Linear(768, 1024), nn.ReLU(), nn.Linear(1024, 300)). Why is the last output dimension set to 300?
Thank you, and all the best!
Based on my experience reading papers, the final dimension in representation learning is usually between 50 and 300, so I set it to 300 here. The choice of dimension should not have much impact.
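For reference, a minimal PyTorch sketch of the head quoted above (768 is LXMERT's hidden size):

import torch
import torch.nn as nn

# Projection head from model.py: 768-d LXMERT hidden state -> 300-d representation.
linear_300 = nn.Sequential(
    nn.Linear(768, 1024),
    nn.ReLU(),
    nn.Linear(1024, 300),
)

head = torch.randn(2, 768)   # e.g. a batch of two pooled features
head_300 = linear_300(head)
print(head_300.shape)        # torch.Size([2, 300])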
Dear author:
On the krvqa dataset, is the learning rate also 1e-5 for pre-training and 1e-4 for fine-tuning? Around how many epochs does krvqa pre-training take to converge?
Thank you, and all the best!
For the krvqa dataset I used a learning rate of 1e-4 for both pre-training and fine-tuning; it converges in 4-5 epochs.
When krvqa pre-training converges, roughly what is the validation accuracy? My result seems a bit low: in the pre-training stage on krvqa, the validation accuracy is only about 7%. I'm wondering whether I made a mistake, so I wanted to ask you. Sorry for the trouble, and thank you very much!
That's fine. Because the KRVQA dataset is quite different from VQAv2 and is also harder, the pre-training accuracy is only around 7%.
Thank you very much for your prompt reply!
For krvqa, can you still get a result of about 27% without pre-training? Have you tried that? I tried krvqa without pre-training and only got about 26% accuracy. Is that normal? Thank you, and all the best!
I haven't tried it, because the ablation experiments were done on OKVQA, but without pre-training the accuracy will normally be a bit lower; after all, more data gives better generalization.
For krvqa, how many epochs does the fine-tuning stage need? Thank you very much!
Around 4-10 epochs.
I couldn't reproduce the 27% result. Is there anything I should pay attention to?
Normally the training procedure is the same as for OKVQA. If the pre-training accuracy is the same, the fine-tuning result shouldn't differ much. What result did you reproduce?
Which pre-training epoch's checkpoint did you use for fine-tuning? I can't think of the specific cause for now.
I fine-tuned from the 5th pre-training epoch.
Let me check the vocabulary first, but it may take a while. Also, the accuracy curve looks a bit strange: in my training the accuracy rises steadily, but in the figure you posted it keeps fluctuating up and down.
Thank you! What do you think it could be related to? I've been trying to find the cause.
You could check whether the vocabulary file contains all the answers in the training set; I may have uploaded the wrong vocabulary. If it looks fine, try a learning rate of 1e-4 for pre-training.
OK, thank you. I used 1e-4 for both pre-training and fine-tuning.
If the vocabulary is fine, then try 1e-5. On the okvqa dataset a learning rate that is too large can also cause fluctuation. Or fine-tune from one of the earlier epochs; pre-training for too many epochs seems to hurt fine-tuning.
For krvqa, without pre-training I got 26% accuracy, and with pre-training about 25%. I don't know where I went wrong. I'll check the total number of words in the vocabulary.
For krvqa: with if args.accumulate:, krvqa-pretrain_dic_all_filter.pickle has 22875 words; in the else branch, krvqa-ans_dic.pickle has 5246 words. Do these counts look right? Thanks.
Those are correct. Try pre-training with the smaller learning rate.
Dear author: For OKVQA, pretrain_dic_all_filter.pickle has 146348 vocabulary entries (if args.accumulate) and ans_dic.pickle has 11508 (not accumulate). Does the difference (146348 - 11508 = 134,840) come from words added to pretrain_dic_all_filter.pickle from the VQAv2 pre-training dataset?
vqav2_dic_all.pickle has 219875 entries (VQAv2). The intersection of ans_dic.pickle and vqav2_dic_all.pickle that I computed has 7247 entries, and 219875 - 7247 = 212,628, which does not equal 146348. So how is pretrain_dic_all_filter.pickle composed?
Thank you!! Wishing you all the best!
vqav2_dic_all.pickle contains all the answers in the VQA training and validation sets and is used for single-dataset experiments on VQAv2. The words added in pretrain_dic_all_filter.pickle are the answers in vqav2_filter.json, which is the filtered VQA training set (with yes/no and how-many questions removed).
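A sketch of how to reproduce the counts being discussed, assuming each pickle is a dict keyed by answer string (adjust the paths to wherever the files live):

import pickle

with open('ans_dic.pickle', 'rb') as f:
    ans_dic = pickle.load(f)
with open('pretrain_dic_all_filter.pickle', 'rb') as f:
    pretrain_dic = pickle.load(f)

print(len(ans_dic), len(pretrain_dic))          # vocabulary sizes
print(len(set(ans_dic) & set(pretrain_dic)))    # overlap between the two
print(len(set(pretrain_dic) - set(ans_dic)))    # answers added for pre-training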
I found that vqa_train_filter.json contains 209126 samples, vqa_train.json contains 443757 samples, and vqa_val.json contains 214354 samples. vqa_train.json + vqa_val.json gives 658111 samples; after removing 'yes', 'no', and 'How many', I get 310366 samples, which is more than the 209126 samples in vqa_train_filter.json. Were other questions also removed, or did I make a mistake in my count?
Thank you very much!
vqa_train_filter.json does not include the validation set.
But after filtering vqa_train.json (removing 'yes', 'no', 'How many') I get 209264 samples, while vqa_train_filter.json contains 209126 samples. How were those 138 samples removed?
Thanks!
When I originally filtered I also added a 'where' condition, but later removed it. I probably also used some old, incorrect code while processing the data, which filtered out a little more. But adding or removing 138 samples (on the order of one in a thousand) should not affect the results much. I will check and correct it when I have time.
I just filtered out 'where' as well and got 209126 samples, the same as vqa_train_filter.json. Thank you very much!
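For reference, a rough sketch of the filter just described (dropping yes/no answers and 'How many'/'Where' questions). The 'question' and 'answer' field names, and whether yes/no is matched against the answer, are assumptions about the json layout, not taken from the repo:

import json

with open('vqa_train.json') as f:
    train = json.load(f)

def keep(sample):
    q = sample['question'].lower()      # assumed field name
    a = str(sample['answer']).lower()   # assumed field name
    if a in ('yes', 'no'):
        return False
    if q.startswith('how many') or q.startswith('where'):
        return False
    return True

filtered = [s for s in train if keep(s)]
print(len(filtered))   # should be close to 209126 if the layout matches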
For krvqa: with if args.accumulate:, krvqa-pretrain_dic_all_filter.pickle has 22875 words; in the else branch krvqa-ans_dic.pickle has 5246. Does the difference also come from vqa_train_filter.json (with yes/no, where, how many removed)? I just tried merging the vqa_train_filter.json vocabulary with the krvqa-ans_dic.pickle vocabulary and got 143519, not 22875. Were some other types of samples removed as well?
Thank you very much!
What does krvqa-pretrain_dic_all_filter.pickle consist of, besides the answers from KRVQA's own training data?
Thank you very much!
It should be because when generating the dictionary I did not use multi-answer, only the single answer; since KRVQA does not provide multi-answer, I kept the processing consistent.
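A minimal sketch of building such an answer dictionary from single answers only (the 'answer' field name is an assumption for illustration):

# Map each distinct answer string to an index, using only the single
# 'answer' field (KRVQA provides no multi-answer annotations).
def build_answer_dict(samples):
    a_dic = {}
    for s in samples:
        ans = s['answer']
        if ans not in a_dic:
            a_dic[ans] = len(a_dic)
    return a_dic

print(build_answer_dict([{'answer': 'dog'}, {'answer': 'cat'}, {'answer': 'dog'}]))
# -> {'dog': 0, 'cat': 1}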