yyang181/NTIRE23-VIDEO-COLORIZATION

New "features" are not properly colored.

dan64 opened this issue · 1 comment

I ran some tests with your model and found that new "features" are not colored properly.

[Image: The_Thing_1951_test]

As you can see, the colored frame 0000 is very close to the reference frame (0000_ref), and in the following frames the colors are propagated very well. The problem arises when elements that are not present in the reference frame appear later in the sequence: in this case, the woman's hands (see frame 20). The paper states that in this situation the color of the new "feature" (here, the hands) is recovered from the trained network. As you can see, this is not what happens: the hands are assigned the same color as the sweater in the background.

DeOldify does not have this kind of problem (see frame 20 colorized with DeOldify+DDColor):

[Image: DD_simple_havc_15162]

Is it possible to improve the colors of "features" that are not present in the reference frames?

Dan

Hi @dan64, thanks so much for your interest in our work.

I would like to clarify the reasoning behind these results. Reference-based methods focus on propagating elements that already exist in the exemplar across consecutive frames, rather than on colorizing new features, such as hands, that the exemplar does not contain. This is a key difference from automatic image colorization methods. In addition, DDColor benefits from pretraining on large datasets such as ImageNet (1.3M images), COCO-Stuff, and ADE20K, whereas our BiSTNet was trained on much smaller datasets such as DAVIS and Videvo. Consequently, BiSTNet may not perform well on features it has not seen.
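To illustrate why a new element can inherit a color from something else in the exemplar, here is a minimal NumPy sketch of correspondence-based color propagation. It assumes a simplified attention-style matching: each target location takes a softmax-weighted average of the reference colors, weighted by feature similarity. This is not BiSTNet's actual implementation; `propagate_colors`, its `temperature` parameter, and the toy feature and color values are all hypothetical.

```python
import numpy as np

def propagate_colors(ref_feats, ref_ab, tgt_feats, temperature=0.01):
    """Hypothetical attention-style color transfer from a reference to a target frame.

    ref_feats: (N, C) feature vectors at reference locations
    ref_ab:    (N, 2) chrominance (ab) values at those reference locations
    tgt_feats: (M, C) feature vectors at target locations
    Returns an (M, 2) array of ab values for the target locations.
    """
    # L2-normalize so the dot product becomes cosine similarity
    ref = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    sim = tgt @ ref.T  # (M, N) similarity between every target and reference location

    # Softmax over reference locations; a low temperature approaches
    # hard nearest-neighbor matching
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Each target location receives a convex mix of EXISTING reference colors,
    # so an element absent from the reference can only borrow a color from it.
    return weights @ ref_ab

# Toy example (hypothetical values): the reference contains a "sweater"
# and a "background"; the target contains a "hand" the reference lacks.
ref_feats = np.array([[1.0, 0.0], [0.0, 1.0]])   # sweater, background
ref_ab = np.array([[40.0, 20.0], [-10.0, -30.0]])
tgt_feats = np.array([[0.8, 0.6]])               # hand: closest to the sweater
hand_ab = propagate_colors(ref_feats, ref_ab, tgt_feats)
print(hand_ab)
```

In this toy case the "hand" features are most similar to the "sweater" features, so the hand receives approximately the sweater's ab values (40, 20), which mirrors the behavior reported for frame 20.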

One way to mitigate this limitation is to use DeOldify or DDColor to colorize a single frame that contains the most elements of the scene, and then use that frame as the exemplar for colorizing the entire video.

Moreover, BiSTNet is designed to use two exemplars, which handles real-world scenarios more effectively. For better results, we recommend colorizing the first and last frames of the video and using them as the two exemplars.
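As a rough illustration of why two exemplars help, the sketch below blends the colors propagated forward from the first exemplar and backward from the last exemplar, weighting each by temporal proximity. This simple linear weighting is an assumption made for illustration only and is not BiSTNet's actual fusion module; `blend_two_exemplars` and its parameters are hypothetical.

```python
import numpy as np

def blend_two_exemplars(ab_from_first, ab_from_last, t, num_frames):
    """Hypothetical linear blend of colors propagated from two exemplars.

    ab_from_first: (H, W, 2) ab map for frame t, propagated forward from frame 0
    ab_from_last:  (H, W, 2) ab map for frame t, propagated backward from
                   frame num_frames - 1
    t:             index of the frame being colorized (0 <= t < num_frames)
    """
    if num_frames < 2:
        # Only one frame: there is no second exemplar to blend with
        return ab_from_first
    # Weight moves from the first exemplar (t = 0) to the last (t = num_frames - 1)
    alpha = t / (num_frames - 1)
    return (1.0 - alpha) * ab_from_first + alpha * ab_from_last

# Usage with dummy 2x2 ab maps
first = np.zeros((2, 2, 2))
last = np.full((2, 2, 2), 10.0)
middle = blend_two_exemplars(first, last, t=5, num_frames=11)
```

The design intuition: an element such as the hands, which appears later in the clip, is likely to be visible in the last frame, so frames near the end draw most of their color from the last exemplar rather than borrowing it from an unrelated region of the first one.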

Hope this helps.