The ASG is not continuous, is this correct?
Closed this issue · 9 comments
When I analyzed the dataset, I found that the ASGs you provide in the json files are not continuous. I want to know whether this is correct.
Looking forward to your reply
I trained it on MSCOCO, and I found that the json files cover all MSCOCO image files, but with a different train/val/test split. You can count the entries in the json files to verify this.
Are you sure? The paper reports 112,742/4,970/4,979 train/val/test images, while the standard split gives 113,287/5,000/5,000.
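Counting the images per split is straightforward. A minimal sketch, assuming each json file is a list of entries keyed by `image_id` (the field name and file layout are assumptions, not confirmed by the repo):

```python
def count_images(entries):
    """Count unique images in a split; one image may carry several
    ASG annotations, so deduplicate by image_id before counting."""
    return len({e["image_id"] for e in entries})

# Tiny inline example standing in for a real split file
# (in practice: entries = json.load(open("train.json"))).
sample = [
    {"image_id": 1, "asg": {}},
    {"image_id": 1, "asg": {}},
    {"image_id": 2, "asg": {}},
]
print(count_images(sample))  # → 2
```

Running this over the three split files would reveal whether the counts match the paper's 112,742/4,970/4,979 or the standard 113,287/5,000/5,000.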
I am not sure now, I didn't count it carefully (Orz). You are right; I'm sorry for my inaccuracy.
We remove noisy ASGs from the automatically constructed dataset, so the dataset is smaller than the standard split. You can check the supplementary material for details on how the removal is performed.
Thanks. But are the discontinuous ASGs correct?
As the ASGs are automatically generated, they contain some noise, as shown in our supplementary material.
Improving the quality of the ASGs would benefit the controllable captioning model, which could be our future work.
Hello, I read the paper again, and I found two ways to generate ASGs in your supplementary material.
One uses an off-the-shelf object proposal model to detect candidate regions as object nodes; attribute nodes can then be added arbitrarily, and relationship nodes come from a binary relationship classifier. But the next section says to use the Stanford scene graph parser to obtain a scene graph and then remove all semantic labels from the nodes.
I want to know which of these two ways should be used, or what the relationship between them is.
Looking forward to your reply!
We use the Stanford scene graph parser to obtain a scene graph from the caption (which contains object, relationship, and attribute nodes). Then we align the object nodes to the detected bounding boxes in the image.
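The alignment step above could be sketched as follows. This is only an illustrative stand-in: it matches each parsed object node to a detected box by class label, whereas the paper's actual alignment procedure (described in the supplementary material) may differ; all names here are hypothetical.

```python
def align_objects(graph_objects, detections):
    """Align each parsed object node (node id -> object word) to the
    first detected box whose class label matches the node's word.
    A simplified stand-in for the paper's alignment step."""
    aligned = {}
    for node_id, word in graph_objects.items():
        for det in detections:
            if det["label"] == word:
                aligned[node_id] = det["box"]
                break
    return aligned

# Toy example: object nodes parsed from a caption, plus detector output.
graph = {"o1": "dog", "o2": "frisbee"}
dets = [
    {"label": "dog", "box": [10, 20, 120, 200]},
    {"label": "frisbee", "box": [130, 40, 180, 90]},
]
print(align_objects(graph, dets))
```

After alignment, the semantic labels can be dropped from the nodes, leaving the label-free ASG structure described in the paper.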
Thank you, I got it.