The inference doesn't work if I don't set the annotation clip to true
Closed · 1 comment
IgnacioSan22 commented
I assume that in some cases the whole video is relevant, so you may not want to clip it and would rather caption and segment the whole thing. But when I run inference with this JSON, it fails:
{
  "How to clean a carpet": {
    "Carpet.mp4": {
      "relevant": true,
      "clip": false,
      "v_duration": 216.0,
      "bounds": [0, 1],
      "steps": []
    }
  }
}
aszala commented
Thanks for pointing this out!
To accomplish this, you can set clip to true and then set your bounds to the full length of the video ([0, 216] in your case).
Then, when you run inference, remove the --task_moment_retrieval flag, and it will segment/caption the whole video.
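For anyone with several videos to fix, the annotation change above can be scripted. This is only a minimal sketch and not part of the repository: the helper name use_full_video is hypothetical, and it assumes the annotation layout shown in the question (task name, then video file, then an entry with clip, bounds, and v_duration). It takes the upper bound straight from v_duration, so the result is 216.0 rather than 216; whether the tool needs integer bounds is not stated here.

```python
import copy
import json


def use_full_video(annotations: dict) -> dict:
    """Return a copy where every unclipped entry spans the whole video.

    Hypothetical helper: entries that already have "clip": true are left
    untouched so that hand-set bounds are not overwritten.
    """
    patched = copy.deepcopy(annotations)
    for videos in patched.values():
        for entry in videos.values():
            if not entry.get("clip", False):
                entry["clip"] = True
                # Assumption: bounds are [start, end] in seconds.
                entry["bounds"] = [0, entry["v_duration"]]
    return patched


# The annotation from the question, written as a Python dict.
annotations = {
    "How to clean a carpet": {
        "Carpet.mp4": {
            "relevant": True,
            "clip": False,
            "v_duration": 216.0,
            "bounds": [0, 1],
            "steps": [],
        }
    }
}

patched = use_full_video(annotations)
print(json.dumps(patched, indent=2))
```

The script only rewrites the annotations; the --task_moment_retrieval flag still has to be dropped from the inference command as described above.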
Please let me know if you have any other issues.