soCzech/GenHowTo

How to improve the quality of the released dataset


I have noticed many low-quality pairs in the published dataset, such as pairs with significant differences between the initial and final frames, or pairs where the initial frame comes from the very beginning of a video (with no substantive content). These instances are likely due to training noise. Are there any further filtering methods available?

Additionally, each video in ChangeIt contains multiple shots. How can we prevent the model from combining the contents of different shots into one pair? Are there more granular annotations available in ChangeIt (such as shot boundaries), or do the models provided by ChangeIt have this capability? Alternatively, can the dataset be pre-clipped using other methods?

Looking forward to your response, thank you very much.

We did not use any additional filtering, but applying such filtering could improve the results.
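For illustration only, here is a minimal sketch of one possible post-hoc filter for the two failure cases raised in the question. It is not part of GenHowTo; the function names, the histogram-intersection criterion, and both thresholds are assumptions chosen for the example and would need tuning on real data.

```python
import numpy as np


def color_histogram(frame, bins=8):
    """Normalized joint RGB histogram of an HxWx3 uint8 frame."""
    pixels = frame.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / max(hist.sum(), 1)


def histogram_intersection(a, b):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return float(np.minimum(a, b).sum())


def keep_pair(initial, final, min_similarity=0.3, min_std=5.0):
    """Heuristic filter (hypothetical thresholds, not from the paper)."""
    # Near-uniform frames (e.g. a black intro frame) carry no content.
    if initial.std() < min_std or final.std() < min_std:
        return False
    # Very different color distributions suggest unrelated scenes.
    sim = histogram_intersection(color_histogram(initial),
                                 color_histogram(final))
    return sim >= min_similarity
```

A color histogram is a crude proxy for scene identity; a learned image embedding would likely be more robust, at the cost of an extra model.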

As for the shots: We did not handle the shots in any special way. Beware that in many videos, any single shot may not contain objects in all states. For example, this is the case if multiple cameras are used to film the action and the resulting video is composed of interleaved shots from different cameras.

The ChangeIt dataset is provided without any manual annotation (except for the small test set). Use any method you wish for clipping or extracting frames :)
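As one example of such a method, the sketch below marks shot boundaries by thresholding the intensity-histogram difference between consecutive frames. This is a generic baseline, not something shipped with ChangeIt or GenHowTo; the function names and the threshold are assumptions made for illustration. Keep in mind the caveat above: restricting pairs to a single shot may discard valid pairs when the states appear in different shots.

```python
import numpy as np


def intensity_histogram(frame, bins=32):
    """Normalized histogram over all pixel values of a uint8 frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)


def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i such that a new shot starts at frames[i].

    The threshold is a hypothetical value; tune it on real videos.
    """
    boundaries = []
    if len(frames) == 0:
        return boundaries
    prev = intensity_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = intensity_histogram(frames[i])
        # Half the L1 distance between normalized histograms lies in [0, 1].
        if 0.5 * np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


def shot_ids(num_frames, boundaries):
    """Assign each frame the index of the shot it belongs to."""
    ids = np.zeros(num_frames, dtype=int)
    for b in boundaries:
        ids[b:] += 1
    return ids
```

With per-frame shot ids available, a pair can be flagged (rather than silently dropped) when its initial and final frames fall in different shots.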

Thanks a lot!