The inference doesn't work if I don't set the annotation clip to true

Question

The inference doesn't work if I don't set the annotation clip to true

Closed this issue a year ago · 1 comment

I assume that in some cases the whole video is relevant, so you may not want to clip it and would rather caption and segment the whole thing. But when I run inference with this JSON, it fails:
{
  "How to clean a carpet": {
    "Carpet.mp4": {
      "relevant": true,
      "clip": false,
      "v_duration": 216.0,
      "bounds": [0, 1],
      "steps": []
    }
  }
}

Answer 1 · 2023-08-25T23:39:40.000Z

Thanks for pointing this out!

To caption and segment the whole video, set clip to true and set bounds to cover the full length of the video ([0, 216] in your case).

Then, when you run inference, remove the --task_moment_retrieval flag, and it will segment and caption the whole video.
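
If you have many annotation entries, the change to clip and bounds described above could be scripted. Here is a minimal sketch, assuming the annotation file has the same nesting as the JSON in the question (task name, then video file, then fields). The helper name use_full_video is hypothetical and not part of the repository:

```python
import copy
import json


def use_full_video(annotations: dict) -> dict:
    """Return a copy where every entry with clip == false is clipped to its full length.

    Hypothetical helper: assumes each video entry carries "clip", "bounds",
    and "v_duration" fields, as in the JSON from the question.
    """
    updated = copy.deepcopy(annotations)  # leave the caller's data untouched
    for videos in updated.values():
        for entry in videos.values():
            if not entry.get("clip", False):
                entry["clip"] = True
                # Bounds spanning the whole video: [0, duration in seconds].
                entry["bounds"] = [0, entry["v_duration"]]
    return updated


# The annotation from the question, as a Python dict.
annotations = {
    "How to clean a carpet": {
        "Carpet.mp4": {
            "relevant": True,
            "clip": False,
            "v_duration": 216.0,
            "bounds": [0, 1],
            "steps": [],
        }
    }
}

fixed = use_full_video(annotations)
print(json.dumps(fixed, indent=2))
```

Entries that already have clip set to true are left as they are, so any bounds chosen by hand are preserved.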

Please let me know if you have any other issues.