j-min/HiREST

The inference doesn't work if I don't set the annotation clip to true

Closed this issue · 1 comment

I assume that in some cases the whole video is relevant, so you may not want to clip it and would rather caption and segment the whole thing. But when I run inference on this JSON, it fails:
{
    "How to clean a carpet": {
        "Carpet.mp4": {
            "relevant": true,
            "clip": false,
            "v_duration": 216.0,
            "bounds": [0, 1],
            "steps": []
        }
    }
}

aszala commented

Thanks for pointing this out!

To accomplish this, you can set clip to true and then set your bounds to be the full length of the video ([0, 216] in your case).

Then when you are running inference, you can remove the --task_moment_retrieval flag, and then it will segment/caption the whole video.
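For anyone applying the annotation change above to many videos, here is a minimal sketch in Python. It assumes the annotation layout shown in the question (prompt, then video name, then the fields `relevant`, `clip`, `v_duration`, `bounds`). The helper `use_full_video` is a hypothetical name for illustration and is not part of the HiREST code. It writes `v_duration` into `bounds` as-is (216.0 here); whether the loader needs an integer is not something this thread confirms.

```python
import json


def use_full_video(annotations):
    """Return a copy in which every relevant video has clip=True and
    bounds spanning the whole video. Hypothetical helper, not HiREST API."""
    updated = {}
    for prompt, videos in annotations.items():
        updated[prompt] = {}
        for name, info in videos.items():
            info = dict(info)  # shallow copy so the input is left untouched
            if info.get("relevant"):
                info["clip"] = True
                info["bounds"] = [0, info["v_duration"]]
            updated[prompt][name] = info
    return updated


# The annotation from the question above.
example = {
    "How to clean a carpet": {
        "Carpet.mp4": {
            "relevant": True,
            "clip": False,
            "v_duration": 216.0,
            "bounds": [0, 1],
            "steps": [],
        }
    }
}

fixed = use_full_video(example)
print(json.dumps(fixed, indent=4))
```

The rewritten JSON can then be saved and passed to inference without the --task_moment_retrieval flag, as described above.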

Please let me know if you have any other issues.