mismatch of index between the files img_urls_all.json and captioning_dataset.json
pgaylumen commented
Hi and thanks for providing this repo,
This may be related to #6: the two files do not seem to use the same image indices for a given article:
import json
urls = json.load(open('img_urls_all.json', 'r'))
capt = json.load(open('../../data/captioning_dataset.json', 'r'))
print(capt['58cedcaa95d0e02489b91f23']['images'])
{'0': '\nChuck Berry, shown in 1980, created a sound and style that made him rock ’n’ roll’s first true superstar.\n\n',
'4': '\nChuck Berry, shown in 2008, provided rock ’n’ roll with swagger, guitar chops and meticulous songwriting.\n\n'}
print(urls['58cedcaa95d0e02489b91f23'])
{'0': 'https://static01.nyt.com/images/2017/03/20/arts/20BERRY-APPRAISAL/20BERRY-APPRAISAL-articleInline.jpg?quality=90&auto=webp',
'3': 'https://static01.nyt.com/images/2017/03/20/arts/20BERRY-APPRAISALJP/20BERRY-APPRAISALJP-articleInline.jpg?quality=90&auto=webp'}
It seems to be only an index issue: the images do correspond to the captions, but the second image is keyed '4' in captioning_dataset.json and '3' in img_urls_all.json.
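In case it helps, here is a rough sketch of how I would check how many articles are affected and pair captions with URLs in the meantime. The helper names `find_index_mismatches` and `pair_by_position` are my own and not part of this repo. It assumes the image keys are always numeric strings and that both files list an article's images in the same order, which holds for the example above but which I have not verified for the whole dataset.

```python
def find_index_mismatches(captions, urls):
    """Return {article_id: (caption_keys, url_keys)} for articles whose
    image keys differ between the two files.

    `captions` is the parsed captioning_dataset.json and `urls` is the
    parsed img_urls_all.json. Assumption: keys are numeric strings.
    """
    mismatches = {}
    for article_id, entry in captions.items():
        caption_keys = sorted(entry.get("images", {}), key=int)
        url_keys = sorted(urls.get(article_id, {}), key=int)
        if caption_keys != url_keys:
            mismatches[article_id] = (caption_keys, url_keys)
    return mismatches


def pair_by_position(caption_images, url_images):
    """Pair captions with URLs by sorted position instead of by key.

    Assumption (not confirmed by the repo): both files list an article's
    images in the same order, so the n-th caption matches the n-th URL.
    """
    caption_keys = sorted(caption_images, key=int)
    url_keys = sorted(url_images, key=int)
    if len(caption_keys) != len(url_keys):
        raise ValueError("different number of images; cannot pair by position")
    return [
        (caption_images[c].strip(), url_images[u])
        for c, u in zip(caption_keys, url_keys)
    ]
```

With `capt` and `urls` loaded as above, `find_index_mismatches(capt, urls)` lists the affected articles, and `pair_by_position(capt[k]['images'], urls[k])` gives (caption, URL) pairs for article `k`. Pairing by position is only a workaround; it raises an error when the two files disagree on the number of images.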