ChenyangLEI/deep-video-prior

About the reason why IRT work

07hyx06 opened this issue · 7 comments

Thanks for your great work.

I still do not understand why IRT works after reading the paper. Could you provide more insight?

You can compare IRT with K-means where K=2.

  1. There are two outputs, just like two clusters.
  2. In each iteration, each output is optimized on the pixels that are closer to it, which is just like the assignment step in clustering. Each output is then updated from the pixels assigned to it, just like computing a new cluster center.
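
To make the analogy concrete, here is a minimal sketch of K-means with K=2 on scalar pixel values. It is only a toy illustration, not the repository's code: the names `assign` and `two_means` are made up for this example, and the "update" step takes a plain mean, whereas IRT updates a network by gradient descent.

```python
def assign(values, main, minor):
    """Return a 0/1 mask: 1 where a value is at least as close to `main` as to `minor`."""
    return [1 if abs(v - main) <= abs(v - minor) else 0 for v in values]


def two_means(values, main, minor, iters=10):
    """Toy K-means with K=2; `main` and `minor` play the role of the two outputs."""
    mask = assign(values, main, minor)
    for _ in range(iters):
        # Assignment step: each value goes to the closer of the two centers.
        mask = assign(values, main, minor)
        main_vals = [v for v, m in zip(values, mask) if m]
        minor_vals = [v for v, m in zip(values, mask) if not m]
        # Update step: each center moves to the mean of its assigned values
        # (an assumption of this toy example; IRT trains a network instead).
        if main_vals:
            main = sum(main_vals) / len(main_vals)
        if minor_vals:
            minor = sum(minor_vals) / len(minor_vals)
    return main, minor, mask


# Values of one pixel location across frames: most frames near 0.2, two near 0.8.
values = [0.21, 0.19, 0.20, 0.80, 0.22, 0.79, 0.18]
main, minor, mask = two_means(values, main=0.3, minor=0.6)
print(main, minor, mask)
```

In this toy run the two centers settle on the two groups of values, and the mask marks which values were assigned to the main center.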

Thanks for your reply!
What about the final state after the whole IRT training process? Would the confidence map become an all-ones mask?

If the confidence map is not an all-ones mask in the final state, I wonder whether the main-model output would contain some pixels that belong to the minor model.

Yes, the confidence map finally becomes a quite stable mask, but it is not an all-one mask. The confidence map denotes the confidence between the main output and processed frames; since processed frames usually contain pixels in minor-modes, the confidence map will not be an all-one mask (if the network converges correctly).
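
As a rough illustration of why the mask does not become all ones, the sketch below builds a per-pixel confidence map for one frame and applies a masked L1 loss. The comparison rule (main output closer than minor output) and the loss form are simplifying assumptions for this example, and the function names are hypothetical; the paper and the repository may use a different rule or thresholds.

```python
def confidence_map(main_out, minor_out, processed):
    """1 where the main output is at least as close to the processed pixel as the minor output."""
    return [1 if abs(m - p) <= abs(n - p) else 0
            for m, n, p in zip(main_out, minor_out, processed)]


def masked_l1(output, processed, mask):
    """L1 distance summed over the pixels selected by `mask`."""
    return sum(w * abs(o - p) for o, p, w in zip(output, processed, mask))


# A processed frame whose last two pixels fell into the minor mode.
processed = [0.20, 0.21, 0.80, 0.79]
main_out = [0.20, 0.20, 0.20, 0.20]
minor_out = [0.80, 0.80, 0.80, 0.80]

conf = confidence_map(main_out, minor_out, processed)
inverse = [1 - c for c in conf]

# The main output is penalized only where conf is 1, the minor output only where it is 0.
loss = masked_l1(main_out, processed, conf) + masked_l1(minor_out, processed, inverse)
print(conf, loss)
```

Because the minor-mode pixels are masked out of the main loss, the main output is not pulled toward them, even though the mask itself stays partly zero.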

In practice, we can use a specific frame (e.g., the first frame) to train the network for the main mode at the beginning of training.

Does that mean training the network on a specific frame for the first few iterations, and then training it on all frames of the video?

Last question. If the main and minor model outputs are both very close to the processed frame, the IRT loss should be very low, but the network would fail to resolve the multimodal inconsistency. Does the IRT process also have a "when to stop training" problem?

You can enable or disable it by setting the parameter "IRT_initialization". If you want the main mode to be consistent with a specific frame, you can use this strategy.
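
One way to picture such a warm-up is a frame schedule like the sketch below. This is only a guess at the idea, not the repository's implementation: `pick_frame`, `init_steps`, and `ref_index` are hypothetical names, and only "IRT_initialization" comes from the project itself.

```python
import random


def pick_frame(step, num_frames, init_steps, ref_index=0, rng=random):
    """Hypothetical schedule: use only the reference frame during warm-up, then any frame."""
    if step < init_steps:
        # Warm-up: anchor the main mode to one specific frame (e.g., the first frame).
        return ref_index
    # After warm-up: sample a frame index uniformly from the whole video.
    return rng.randrange(num_frames)


schedule = [pick_frame(step, num_frames=30, init_steps=5) for step in range(10)]
print(schedule)
```

The first `init_steps` entries are always the reference frame; the remaining entries are random frame indices.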

If the main and minor model outputs are both very close to the processed frames, then the case should be similar to unimodal inconsistency rather than multimodal inconsistency.

There is still a "when to stop training" problem, because slight inconsistency remains within the main mode and within the minor mode.

got it. thx