Preprocessing code for VIOLIN
Worm4047 opened this issue · 5 comments
Hi,
In the repo, create_txtdb.sh is provided to create the text DB for TVR. Could you please provide the script you used to create the text DB for VIOLIN?
Thanks
Hi there,
Thanks for your interest in this project. I have added the prepro function:
Line 96 in e644834
To successfully run prepro and create txt_db:
1. You will need to split the released VIOLIN annotation into train/val/test first and save each split as a jsonl file, similar to the TVQA and TVR annotations. Below is an example of an entry in the resulting jsonl files:
{"vid_name": "BWMFLJwEVyQ_clip_000_040", "desc_id": "BWMFLJwEVyQ_clip_000_040-0-0", "desc": "The vampire grabbed the woman in the fur coat and bit her on the neck.", "label": true}
2. Subtitles in VIOLIN also need to be formatted into a similar jsonl file:
{"vid_name": "gt3ntYidpvs_clip_000_040", "sub": [{"text": "one board one minute home free okay make", "start": 0.12, "end": 11.629}, {"text": "it quick", "start": 11.639000000000001, "end": 21.97}, {"text": "you ready yeah wait what are you doing", "start": 21.98, "end": 26.23}, {"text": "show me superiority the senator dead may", "start": 26.240000000000002, "end": 28.990000000000002}, {"text": "drive them back sure sound like a call", "start": 29.0, "end": 40.0}]}
3. You also need a vid2nframes.json file. I believe the id2nframe.json in the VIOLIN video_db can be used directly here. An example entry in the file: {"dh_s02e23_clip_1451_1476": 17, ...}
I believe steps 1-3 can be done with just a little work. More details about the raw VIOLIN annotations are provided here, which may help you with formatting.
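For anyone following along, here is a rough sketch of how the formatting steps might look in Python. It is not the script used for the released txt_db. The layout of the raw annotation (a dict keyed by clip name with "split", "statement" and "sub" fields), the output file names, and the desc_id pattern are all assumptions made for illustration; the desc_id pattern is only guessed from the single example entry above. Please check them against the raw VIOLIN annotation description before relying on this.

```python
import json
import shutil
from collections import defaultdict
from pathlib import Path


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def convert(raw, out_dir):
    """Write per-split statement files and one subtitle file.

    ASSUMED raw layout (hypothetical, verify against the real annotation):
      raw[vid_name] = {
          "split": "train" | "val" | "test",
          "statement": [[true_text, false_text], ...],
          "sub": [[start_sec, end_sec, text], ...],
      }
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    by_split = defaultdict(list)
    subs = []
    for vid_name, entry in raw.items():
        for pair_idx, (pos, neg) in enumerate(entry["statement"]):
            # ASSUMPTION: desc_id is "<vid_name>-<pair index>-<0|1>".
            for k, (desc, label) in enumerate(((pos, True), (neg, False))):
                by_split[entry["split"]].append({
                    "vid_name": vid_name,
                    "desc_id": f"{vid_name}-{pair_idx}-{k}",
                    "desc": desc,
                    "label": label,
                })
        subs.append({
            "vid_name": vid_name,
            "sub": [{"text": t, "start": s, "end": e}
                    for s, e, t in entry["sub"]],
        })
    # Output file names below are placeholders, not names the repo requires.
    for split, rows in by_split.items():
        write_jsonl(out_dir / f"violin_{split}.jsonl", rows)
    write_jsonl(out_dir / "violin_subtitles.jsonl", subs)
    return dict(by_split), subs


def copy_nframes(id2nframe_path, out_dir):
    """Step 3: reuse id2nframe.json from the video_db as vid2nframes.json."""
    target = Path(out_dir) / "vid2nframes.json"
    shutil.copyfile(id2nframe_path, target)
    return target
```

The resulting jsonl files and vid2nframes.json would then be the inputs to the prepro function mentioned above.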
Let me know if you have any additional questions.
Thanks.
Hi,
I was able to create the text DB, but while running the training code I'm getting an error.
File "/src/data/data.py", line 59, in __init__
f'id2nframe.json', "r"))
FileNotFoundError: [Errno 2] No such file or directory: '/video/violin/id2nframe.json'
It seems that the video_db (downloaded) is missing this file.
There is an id2nframe.json in the downloaded video_db (the image above shows the output from extracting violin.tar). Most probably, the extracted files are stored under /video/VIDEO_DB/violin due to a mistake in the download script.
Did you pull the latest code? We have fixed the decompress command in the download script.
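If it helps, a quick way to confirm whether the file was extracted to the wrong place is to search for it under the mount point. This is only a small diagnostic sketch; the helper names are made up for illustration and are not part of the repo, and the paths in the usage note are the ones quoted in this thread.

```python
from pathlib import Path


def find_db_file(root, filename="id2nframe.json"):
    """Return every path under `root` whose file name matches `filename`."""
    return sorted(Path(root).rglob(filename))


def check_video_db(expected_dir, search_root, filename="id2nframe.json"):
    """Report whether the file is where the loader looks, or somewhere else."""
    expected = Path(expected_dir) / filename
    if expected.is_file():
        return f"OK: {expected}"
    hits = find_db_file(search_root, filename)
    if hits:
        found = ", ".join(str(p) for p in hits)
        return f"Missing at {expected}; found instead at: {found}"
    return f"Not found anywhere under {search_root}"
```

For example, `print(check_video_db("/video/violin", "/video"))` would show whether the file ended up under a nested directory such as /video/VIDEO_DB/violin.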
Okay, let me check and get back to you.
Thanks
I was able to find the file after downloading the DB again.
Thanks