ltkong218/FastFlowNet

Problems with optical flow results when fine-tuning on a real scene?

poincarelee opened this issue · 5 comments

Hi,
Have you tried training on real scenes such as markets or subways? I fine-tuned the model following IRR-PWC, starting from your './checkpoints/fastflownet_ft_mix.pth', on a real subway scene, but the results are much worse than FlowNet2's.
I also ran into another weird problem: during prediction, whether or not I multiply the optical flow result by div_flow (20), there is no difference in the flow PNG (the flow result converted to a PNG visualization).
[attached flow visualizations: flownet2_1857, fastflow_train_1857_347]

Do you use ground-truth flow labels from your real scene to train FastFlowNet in a supervised manner, or do you train in an unsupervised way? For optical flow visualization, the scale factor is normalized away in the current code, so it makes no difference whether you multiply the optical flow result by div_flow (20); you can modify the code to meet your needs.
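To be concrete, here is a minimal sketch (not this repo's exact visualization code) of a typical HSV-style flow renderer; because the magnitude is divided by its per-image maximum, a global scale like div_flow cancels out:

```python
import cv2
import numpy as np

def flow_to_color(flow, eps=1e-6):
    """Minimal HSV-style flow visualization (sketch, not the repo's exact code).
    Direction -> hue, magnitude -> value, normalized by the per-image maximum,
    so scaling the whole flow field by a constant yields an identical image."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.sqrt(u**2 + v**2)
    ang = np.arctan2(v, u)  # direction in [-pi, pi]
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = ((ang + np.pi) / (2 * np.pi) * 179).astype(np.uint8)  # hue
    hsv[..., 1] = 255                                                   # saturation
    hsv[..., 2] = np.clip(mag / (mag.max() + eps) * 255, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```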

Hi,
I use FlowNet2's predictions as ground truth, since ground-truth optical flow cannot be obtained for real scenes.
As for div_flow, yes, you are right.
Are there any other tricks I missed when training FastFlowNet? I followed IRR-PWC exactly.
Have you already applied FastFlowNet to real-scene images? I would expect FastFlowNet's performance to be much better than the results I got.

I think taking FlowNet2's predictions as ground truth will lead to error accumulation; I suggest you try adopting RAFT's predictions as the ground-truth labels instead. For training FastFlowNet, you should normalize the input images to [0, 1] and subtract the per-channel mean value; the ground-truth flow should also be divided by div_flow (20). Data augmentation such as geometric and color augmentation should be adopted for better generalization.
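A minimal sketch of that preprocessing (assuming NumPy HxWx3 uint8 images and an HxWx2 float flow; it mirrors the mean subtraction in demo.py but is not the official training code):

```python
import numpy as np
import torch

DIV_FLOW = 20.0

def preprocess(img1, img2, flow_gt):
    """Sketch of the preprocessing described above; not the official training code.
    img1/img2: HxWx3 uint8 arrays, flow_gt: HxWx2 float array in pixels."""
    img1 = torch.from_numpy(img1).permute(2, 0, 1).float() / 255.0  # to [0, 1]
    img2 = torch.from_numpy(img2).permute(2, 0, 1).float() / 255.0
    # subtract the per-channel mean computed over the image pair
    mean = torch.stack([img1, img2]).mean(dim=(0, 2, 3)).view(3, 1, 1)
    img1, img2 = img1 - mean, img2 - mean
    # divide the ground-truth flow by div_flow so it matches the network's output range
    flow_gt = torch.from_numpy(flow_gt).permute(2, 0, 1).float() / DIV_FLOW
    return img1, img2, flow_gt
```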

It is good to pretrain FastFlowNet in a self-supervised manner, which avoids the domain gap; I will release the training code once my paper that is under review is published.
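The unreleased training code aside, a common self-supervised objective (a sketch under the usual assumptions, with occlusion handling and smoothness terms omitted) warps the second image toward the first with the predicted flow and penalizes the photometric difference:

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp img (B,C,H,W) toward the reference frame using flow (B,2,H,W)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys]).float().to(img.device)   # (2,H,W), x then y
    coords = base.unsqueeze(0) + flow                     # sample positions
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0               # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)                  # (B,H,W,2)
    return F.grid_sample(img, grid, align_corners=True)

def photometric_loss(img1, img2, flow12, eps=1e-6):
    """Charbonnier penalty on the difference between img1 and warped img2."""
    diff = img1 - backward_warp(img2, flow12)
    return torch.sqrt(diff**2 + eps).mean()
```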

Ok, thanks for your timely reply.

  1. Good advice, I will try RAFT's predictions ASAP.
  2. Dividing the ground-truth flow by 20 is already in the code; I will try the input normalization. Does "reduce the one channel mean value" mean subtracting the per-channel RGB mean? I don't understand this part. Geometric and color augmentation have already been adopted.
  3. I haven't tried the self-supervised manner on a real-scene dataset; I will try this.

Could we please exchange details of training FastFlowNet? Here is our setup:

  1. The normalization operation from demo.py has been used.

  2. The ground-truth flow has also been divided by 20.

  3. The learning rate is halved at fixed stages during training (see the sketch after this list).
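A minimal sketch of such a stepwise halving schedule (the milestones and the stand-in model are illustrative assumptions, not values from the repo):

```python
import torch

model = torch.nn.Conv2d(6, 2, 3, padding=1)  # stand-in for FastFlowNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# MultiStepLR with gamma=0.5 halves the learning rate at each milestone epoch
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[120, 160, 200, 240], gamma=0.5)

for epoch in range(300):
    # ... run one training epoch here ...
    optimizer.step()   # placeholder so the scheduler follows an optimizer step
    scheduler.step()   # lr: 1e-4 -> 5e-5 at 120 -> 2.5e-5 at 160 -> ...
```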

The problem we are facing now is that we trained on FlyingChairs and used MPI-Sintel as the validation set, and found that the EPE stays at 13.5 px while the loss is very small, only 0.3.

I checked the gradients and the backward pass, and also tried adjusting the learning rate, but found no effect. @poincarelee
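One sanity check worth adding here (an assumption about the setup, not a confirmed diagnosis): the training loss is computed on flow divided by div_flow, while EPE is measured in raw pixels, so the two numbers live on different scales; make sure the evaluation rescales the prediction before comparing. A minimal sketch following the demo.py convention of upsampling the raw output and multiplying by div_flow:

```python
import torch
import torch.nn.functional as F

DIV_FLOW = 20.0

def evaluate_epe(pred, gt):
    """End-point error sketch. pred: raw network output (B,2,h,w) in
    div_flow units at reduced resolution; gt: (B,2,H,W) in pixels.
    The prediction is upsampled to the ground-truth size and multiplied
    by div_flow before the per-pixel Euclidean error is averaged."""
    pred_full = DIV_FLOW * F.interpolate(
        pred, size=gt.shape[-2:], mode="bilinear", align_corners=False)
    return torch.norm(pred_full - gt, dim=1).mean().item()
```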