/progressive_cnn_filter

RDO-free approach to coordinate CNN-based in-loop filters to work seamlessly with video encoders


A Frame-level CNN-based In-Loop Filter For Inter Frame Coding

Dandan Ding, Lingyi Kong, Wenyu Wang, Fengqing Zhu


Abstract

Various Convolutional Neural Network (CNN) structures have been designed for in-loop filtering in video coding and have shown performance improvements. These CNN models are usually trained by learning the correlations between the reconstructed and the original frames, and are then applied to every reconstructed frame to improve the overall video quality. Such a direct model training and deployment strategy is effective for intra coding but yields only a locally optimal model. It triggers an over-filtering problem in inter coding because the intertwined reference dependencies across inter frames are not taken into account. To address this issue, state-of-the-art methods usually resort to Rate-Distortion Optimization (RDO) so that the CNN model is applied only to selected coding blocks or frames. However, such schemes cannot fundamentally solve the problem because the direct CNN model itself is inaccurate. In this paper, we propose a new approach to train and coordinate CNN-based in-loop filters to work seamlessly with video encoders.

Examples showing the subjective quality of frames suffering from the over-filtering problem. The first row shows the uncompressed frames and the second shows the over-filtered frames.


Requirements

  • Python 3.7
  • TensorFlow >= 1.6.0 and <= 2.0.0
  • Visual Studio >= 2013
  • HM 16.9
  • AOM 1.0.0

Usage

Training

  • Training dataset
    • We use the DIV2K dataset for CNN training. Each frame of DIV2K is encoded with the H.265/HEVC reference software HM 16.9 to obtain the raw reconstructed frames. The encoder uses the default All Intra (AI) configuration “encoder_intra_main.cfg” of HM 16.9, except that the traditional in-loop filters, Deblocking and SAO, are turned off.
  • Training settings
    • Frames are segmented into 64×64 patches as samples and the batch size is set to 64. We adopt the Adaptive Moment Estimation (Adam) algorithm for stochastic optimization. To train the direct model CNN0, the initial learning rate is set to 10^-4. For the transfer learning phase of the models CNNi (0 < i ≤ N), the initial learning rate is set to 10^-5. During training, the learning rate is adjusted using the step strategy with γ = 0.5.
    • Loss function:
    • Once a model is obtained, the images enhanced with this model are added to the training set and training continues. This process is then repeated.
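The step learning-rate policy and the progressive loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the training script of this repository: `step_size`, `train_fn`, and `enhance_fn` are hypothetical names, the interval at which the rate is halved is not given in this document, and the sketch assumes that the original samples are re-enhanced by the latest model in every round.

```python
from typing import Callable, List, Sequence


def step_learning_rate(base_lr: float, step: int, step_size: int,
                       gamma: float = 0.5) -> float:
    """Step policy: multiply the rate by gamma once every step_size steps."""
    return base_lr * gamma ** (step // step_size)


def progressive_training(samples: Sequence, n_rounds: int,
                         train_fn: Callable, enhance_fn: Callable) -> List:
    """Train CNN0, then fine-tune CNN1..CNN_N on a growing training set.

    train_fn(prev_model, training_set, base_lr) -> model   (hypothetical)
    enhance_fn(model, sample) -> enhanced sample           (hypothetical)
    """
    training_set = list(samples)
    # Direct model CNN0: trained from scratch with initial rate 1e-4.
    models = [train_fn(None, training_set, 1e-4)]
    for _ in range(n_rounds):
        # Add the samples enhanced by the latest model to the training set.
        training_set += [enhance_fn(models[-1], s) for s in samples]
        # Transfer learning from the previous model with initial rate 1e-5.
        models.append(train_fn(models[-1], training_set, 1e-5))
    return models
```

With `n_rounds = N`, the returned list holds CNN0 through CNN_N, so any intermediate model can be evaluated in the encoder.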

Testing

  • Testing settings
    • We use 18 test sequences, most of which are selected by the Joint Collaborative Team on Video Coding (JCT-VC), to evaluate the video coding efficiency. The first 50 frames of each sequence are used for evaluation. In H.265/HEVC, we follow the default LDP configuration “encoder_lowdelay_main.cfg” and RA configuration “encoder_randomaccess_main.cfg” with Deblocking and SAO off.
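As a rough illustration of these settings, the helper below assembles an encoder command line for the stock HM encoder. It is a sketch under assumptions: the executable name depends on the build (for example `TAppEncoder.exe` or `TAppEncoderStatic`), the sequence file name is hypothetical, and the modified encoder in this repository may need additional options that are not described here.

```python
from typing import List


def hm_encoder_args(cfg: str, yuv: str, width: int, height: int,
                    fps: int, qp: int, frames: int = 50) -> List[str]:
    """Build an HM command line with Deblocking and SAO switched off."""
    return [
        "TAppEncoder",              # executable name varies with the build
        "-c", cfg,                  # LDP or RA configuration file
        "-i", yuv,                  # input sequence (raw YUV)
        "-wdt", str(width),
        "-hgt", str(height),
        "-fr", str(fps),
        "-f", str(frames),          # first 50 frames by default
        "-q", str(qp),
        "--LoopFilterDisable=1",    # Deblocking off
        "--SAO=0",                  # SAO off
    ]


# Hypothetical file name for the 832x480, 60 fps sequence BQMall.
args = hm_encoder_args("encoder_randomaccess_main.cfg",
                       "BQMall_832x480_60.yuv", 832, 480, 60, qp=37)
print(" ".join(args))
```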

Model

                         _____     _____           _____
           raw          | 3x3 |   | 3x3 |         | 3x3 |    ___
       reconstructed--->| 64  |-->| 64  |-->...-->| 64  |-->|add|-->filtered
           frame     |  |_____|   |_____|         |_____|    —^—
                     |______________shortcut__________________|
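The structure in the diagram, a stack of 3x3 convolutions whose output is added back to the input through the shortcut, can be sketched in plain Python as below. The sketch is written without TensorFlow for readability and makes assumptions that the diagram does not state: ReLU between layers, zero padding, and a final layer that maps back to a single channel so that the addition is possible. The number of layers is left to the caller.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]   # one channel, indexed [row][col]


def conv3x3(channels: Sequence[Image], weights, biases,
            relu: bool) -> List[Image]:
    """3x3 convolution, stride 1, zero padding.

    weights[o][i] is the 3x3 kernel from input channel i to output o.
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for o, bias in enumerate(biases):
        plane = [[bias] * w for _ in range(h)]
        for i, src in enumerate(channels):
            k = weights[o][i]
            for y in range(h):
                for x in range(w):
                    acc = 0.0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < h and 0 <= xx < w:
                                acc += k[dy + 1][dx + 1] * src[yy][xx]
                    plane[y][x] += acc
        if relu:
            plane = [[max(0.0, v) for v in row] for row in plane]
        out.append(plane)
    return out


def residual_filter(frame: Image, layers: Sequence[Tuple]) -> Image:
    """Run the conv stack and add its output to the input (the shortcut)."""
    feats = [frame]
    for idx, (weights, biases) in enumerate(layers):
        is_last = idx == len(layers) - 1
        feats = conv3x3(feats, weights, biases, relu=not is_last)
    residual = feats[0]     # assumed: the last layer has one output channel
    return [[a + b for a, b in zip(r0, r1)]
            for r0, r1 in zip(frame, residual)]
```

Because of the shortcut, the network only has to learn the difference between the reconstructed and the original frame. With all-zero weights the filter returns its input unchanged.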

Experiments and Results

1. Overall performance

Convergence of CNN_N. First, we need to find out when the progressive CNN model converges. Experiments are conducted to determine N, the number of times the direct CNN model is tuned, so as to decide when to terminate the progressive training. For clarity, the direct model is termed CNN0.

We have the following observations.

  • LDP requires a higher N than RA.
  • N increases as the QP value increases.
  • Certain frames are insensitive to the over-filtering effect.

Table 1: Average bitrate (kbps) and PSNR (dB) when using different CNN_N models for inter in-loop filtering

LDP
Model   QP=37              QP=32              QP=27              QP=22
        bitrate    PSNR    bitrate    PSNR    bitrate    PSNR    bitrate    PSNR
anchor  923.794    31.806  1860.973   34.495  4246.250   37.278  12540.883  40.295
CNN0    991.619    31.462  1933.679   34.264  4268.964   37.197  12388.495  40.346
CNN1    927.707    32.098  1840.493   34.765  4162.843   37.511  12442.042  40.496
CNN2    921.478    32.148  1843.724   34.763  4162.424   37.519  12459.105  40.494
CNN5    921.106    32.162  1833.963   34.798  4160.810   37.535  12480.928  40.490
CNN8    920.610    32.171  -          -       4161.774   37.518  -          -

RA
Model   QP=37              QP=32              QP=27              QP=22
        bitrate    PSNR    bitrate    PSNR    bitrate    PSNR    bitrate    PSNR
anchor  951.230    32.317  1823.547   34.900  3816.174   37.522  9841.134   40.190
CNN0    901.349    32.152  1708.895   34.784  3554.656   37.495  9176.313   40.246
CNN1    945.974    32.731  1792.072   35.249  3743.491   37.794  9706.375   40.385
CNN2    942.131    32.763  1792.041   35.256  3744.338   37.799  9701.845   40.385
CNN5    941.990    32.773  1791.046   35.262  3745.463   37.802  9711.090   40.379
CNN8    942.081    32.774  -          -       -          -       -          -

Table 2: CNN models used in the H.265/HEVC experiments. Here CNN0 represents the direct model.

QP    LDP          RA
37    CNN8         CNN5
32    CNN5         CNN5
27    CNN5         CNN2
22    CNN0/CNN2    CNN0/CNN2
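The choice in Table 2 amounts to a lookup keyed by coding configuration and QP, sketched below. The names `MODEL_TABLE` and `candidate_models` are hypothetical. For QP = 22 the table lists two candidates, and the sketch returns both rather than guessing how one of them is picked.

```python
from typing import Dict, Tuple

# (configuration, QP) -> model(s) listed in Table 2.
MODEL_TABLE: Dict[Tuple[str, int], Tuple[str, ...]] = {
    ("LDP", 37): ("CNN8",),
    ("LDP", 32): ("CNN5",),
    ("LDP", 27): ("CNN5",),
    ("LDP", 22): ("CNN0", "CNN2"),
    ("RA", 37): ("CNN5",),
    ("RA", 32): ("CNN5",),
    ("RA", 27): ("CNN2",),
    ("RA", 22): ("CNN0", "CNN2"),
}


def candidate_models(config: str, qp: int) -> Tuple[str, ...]:
    """Return the model name(s) listed in Table 2 for this setting."""
    return MODEL_TABLE[(config.upper(), qp)]


print(candidate_models("ldp", 37))
```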

2. Comparison with existing methods

We compare our approach to existing solutions including the RDO-based method and the skipping method.

Note: the two methods are applied only to inter frames. The intra frames are all filtered by the direct CNN model.
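For readers unfamiliar with the RDO-based baseline, the sketch below shows the general idea of a CTU-level decision: the CNN output is kept for a CTU only if it is closer to the original than the unfiltered reconstruction. It is a simplified illustration, not the implementation compared here: it considers distortion only and ignores the rate of the per-CTU flag, and the CTU size of 64 is an assumption.

```python
from typing import List, Tuple

Frame = List[List[int]]     # one plane, indexed [row][col]


def block_sse(a: Frame, b: Frame, y0: int, x0: int, size: int) -> int:
    """Sum of squared errors between a and b inside one block."""
    h, w = len(a), len(a[0])
    return sum((a[y][x] - b[y][x]) ** 2
               for y in range(y0, min(y0 + size, h))
               for x in range(x0, min(x0 + size, w)))


def ctu_select(original: Frame, recon: Frame, filtered: Frame,
               ctu_size: int = 64) -> Tuple[Frame, List[List[bool]]]:
    """Choose per CTU between the reconstructed and the CNN-filtered block."""
    h, w = len(recon), len(recon[0])
    out = [row[:] for row in recon]
    flags = []
    for y0 in range(0, h, ctu_size):
        flag_row = []
        for x0 in range(0, w, ctu_size):
            use_cnn = (block_sse(original, filtered, y0, x0, ctu_size)
                       < block_sse(original, recon, y0, x0, ctu_size))
            flag_row.append(use_cnn)
            if use_cnn:
                # Copy the filtered samples of this CTU into the output.
                for y in range(y0, min(y0 + ctu_size, h)):
                    out[y][x0:x0 + ctu_size] = filtered[y][x0:x0 + ctu_size]
        flags.append(flag_row)
    return out, flags
```

The flags have to be signalled to the decoder, which is the extra bitrate cost that an RDO-free scheme avoids.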

Our proposed approach achieves the best performance for all configurations as shown in Table 3.

In the LDP configuration, the frame skipping, CU skipping, and CTU-RDO methods achieve 7.42%, 1.95%, and 9.00% BD-rate savings, respectively, whereas our approach gains as much as 9.62%. In RA, the three methods obtain 6.09%, 7.02%, and 9.27% BD-rate reductions, all lower than ours at 10.12%.
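BD-rate figures such as these are commonly computed with the Bjøntegaard method: fit a cubic to log-bitrate as a function of PSNR for each codec, then average the gap between the two curves over the shared PSNR range. The sketch below follows that standard recipe for exactly four rate points per curve. It is not the script used to produce Table 3, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cubic_through(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Coefficients c0..c3 of the cubic passing through four (x, y) points."""
    n = 4
    rows = [[x ** p for p in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):                      # Gaussian elimination
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            f = rows[r][col] / rows[col][col]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    coeffs = [0.0] * n
    for r in reversed(range(n)):              # back substitution
        s = sum(rows[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (rows[r][n] - s) / rows[r][r]
    return coeffs


def integrate_poly(coeffs: Sequence[float], upper: float) -> float:
    """Integral of sum(c_p * x**p) from 0 to upper."""
    return sum(c * upper ** (p + 1) / (p + 1) for p, c in enumerate(coeffs))


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average bitrate difference in percent at equal PSNR (negative = saving)."""
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    span = hi - lo

    def mean_log_rate(rates, psnrs):
        # Shift PSNR by lo so the fit is well conditioned.
        coeffs = cubic_through([p - lo for p in psnrs],
                               [math.log10(r) for r in rates])
        return integrate_poly(coeffs, span) / span

    diff = (mean_log_rate(rate_test, psnr_test)
            - mean_log_rate(rate_anchor, psnr_anchor))
    return (10 ** diff - 1) * 100


# LDP rows of Table 1 (QP = 37, 32, 27, 22): anchor versus CNN5.
anchor_rate = [923.794, 1860.973, 4246.250, 12540.883]
anchor_psnr = [31.806, 34.495, 37.278, 40.295]
cnn5_rate = [921.106, 1833.963, 4160.810, 12480.928]
cnn5_psnr = [32.162, 34.798, 37.535, 40.490]
print(f"BD-rate: {bd_rate(anchor_rate, anchor_psnr, cnn5_rate, cnn5_psnr):.2f}%")
```

The example uses the sequence-averaged numbers of Table 1, so its result is not expected to match the per-sequence BD-rates of Table 3.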

Table 3: BD-rate of our proposed approach compared with previous solutions

Class  Sequence         AI       LDP                                               RA
                                 Direct    Frame     CU        CTU-RDO   Proposed  Direct    Frame     CU        CTU-RDO   Proposed
                                 use       skipping  skipping                      use       skipping  skipping
A      PeopleOnStreet   -9.55%   +0.98%    -3.88%    -2.74%    -5.24%    -6.50%    -4.51%    -3.24%    -6.53%    -7.21%    -8.14%
       Traffic          -10.80%  +21.65%   -7.52%    +5.99%    -7.47%    -8.90%    +6.34%    -7.38%    -6.37%    -9.54%    -10.99%
B      BasketballDrive  -8.58%   -4.04%    -5.83%    -5.80%    -9.78%    -9.82%    -3.83%    -3.36%    -6.35%    -8.13%    -8.70%
       BQTerrace        -5.72%   +7.44%    -9.86%    -5.79%    -12.27%   -10.77%   +0.43%    -8.65%    -9.30%    -12.27%   -11.67%
       Cactus           -7.76%   +13.51%   -7.17%    +2.49%    -8.72%    -8.46%    +1.22%    -6.07%    -6.45%    -10.14%   -9.80%
       Kimono           -8.40%   -2.19%    -4.23%    -3.84%    -5.82%    -6.43%    -4.64%    -2.16%    -5.41%    -5.74%    -6.08%
       ParkScene        -8.32%   +11.08%   -3.33%    +2.77%    -3.62%    -4.67%    +3.49%    -3.55%    -4.46%    -6.28%    -7.28%
C      BQMall           -10.41%  +2.27%    -7.63%    -4.62%    -9.37%    -10.48%   -1.09%    -6.27%    -7.51%    -9.65%    -10.69%
       PartyScene       -6.44%   +2.49%    -3.82%    -1.12%    -5.88%    -5.96%    +1.11%    -3.05%    -3.74%    -5.98%    -6.40%
       BasketballDrill  -15.78%  +10.47%   -9.02%    +2.22%    -11.45%   -11.75%   +2.44%    -7.62%    -4.75%    -11.17%   -11.96%
       RaceHorsesC      -6.14%   -1.61%    -2.94%    -2.71%    -4.98%    -4.90%    -4.28%    -2.18%    -4.86%    -5.80%    -5.84%
D      BasketballPass   -11.54%  +2.92%    -7.41%    -7.37%    -9.46%    -10.69%   -1.05%    -5.76%    -8.00%    -8.94%    -10.24%
       BlowingBubbles   -8.54%   +3.13%    -4.49%    -0.56%    -5.67%    -6.07%    +0.24%    -3.53%    -4.14%    -6.36%    -7.39%
       BQSquare         -8.43%   +0.39%    -8.38%    -5.85%    -9.66%    -11.09%   +1.02%    -6.65%    -6.57%    -8.43%    -9.59%
       RaceHorses       -10.70%  -3.56%    -4.46%    -5.18%    -7.02%    -7.98%    -6.43%    -3.13%    -7.21%    -7.87%    -8.45%
E      Johnny           -13.61%  +28.42%   -16.13%   +2.16%    -16.68%   -17.95%   +8.45%    -12.59%   -10.85%   -14.64%   -16.88%
       FourPeople       -13.95%  +28.52%   -14.02%   -2.56%    -14.79%   -15.47%   +6.38%    -12.85%   -12.23%   -14.96%   -16.38%
       KristenAndSara   -13.07%  +28.10%   -13.47%   -2.61%    -14.14%   -15.20%   +7.49%    -11.51%   -11.70%   -13.76%   -15.64%
Class A                 -10.18%  +11.32%   -5.70%    +1.63%    -6.36%    -7.70%    +0.92%    -5.31%    -6.45%    -8.37%    -9.56%
Class B                 -7.75%   +5.16%    -6.09%    -2.03%    -8.04%    -8.03%    -0.66%    -4.76%    -6.39%    -8.51%    -8.71%
Class C                 -9.69%   +3.40%    -5.85%    -1.56%    -7.92%    -8.27%    -0.46%    -4.78%    -5.21%    -8.15%    -8.72%
Class D                 -9.80%   +0.72%    -6.19%    -4.74%    -7.95%    -8.96%    -1.56%    -4.77%    -6.48%    -7.90%    -8.92%
Class E                 -13.54%  +28.35%   -14.54%   -1.00%    -15.21%   -16.21%   +7.44%    -12.32%   -11.59%   -14.45%   -16.30%
Average                 -9.87%   +8.33%    -7.42%    -1.95%    -9.00%    -9.62%    +0.71%    -6.09%    -7.02%    -9.27%    -10.12%

3. Comparison with the direct model

An accurate model is crucial for solving the over-filtering problem. However, the RDO-based and the skipping methods both adopt the inaccurate direct CNN model, which is trained without considering the complex reference correlations across inter frames. The coding efficiency of the two methods can therefore be further improved if a more accurate model is used.

  • Integrate our progressive model into the RDO-based and the skipping methods.

Table 4: BD-rate (%) of using the progressive model instead of the direct model in the RDO-based and skipping methods

Class  Sequence              LDP                         RA
                             Frame skipping  CTU-RDO     CU skipping   CTU-RDO
A      PeopleOnStreet        -4.21%          -7.10%      -7.63%        -8.16%
       Traffic               -8.81%          -9.27%      -10.51%       -10.71%
B      BasketballDrive       -5.73%          -10.77%     -7.56%        -8.40%
       BQTerrace             -9.38%          -12.73%     -10.46%       -11.91%
       Cactus                -8.42%          -10.13%     -9.23%        -10.30%
       Kimono                -4.72%          -7.34%      -5.75%        -5.93%
       ParkScene             -4.03%          -5.23%      -6.84%        -6.99%
C      BQMall                -8.18%          -11.00%     -9.98%        -10.42%
       PartyScene            -3.91%          -6.46%      -5.79%        -6.28%
       BasketballDrill       -9.76%          -12.60%     -11.00%       -11.93%
       RaceHorsesC           -2.84%          -5.43%      -5.22%        -5.85%
D      BasketballPass        -8.03%          -11.15%     -9.08%        -10.02%
       BlowingBubbles        -4.19%          -6.93%      -6.63%        -7.22%
       BQSquare              -7.98%          -11.23%     -8.46%        -9.38%
       RaceHorses            -4.62%          -8.77%      -7.64%        -8.45%
E      Johnny                -17.75%         -17.86%     -16.00%       -15.36%
       FourPeople            -16.08%         -15.49%     -15.68%       -15.37%
       KristenAndSara        -15.86%         -15.19%     -14.90%       -14.42%
Average (progressive model)  -8.03%          -10.26%     -9.35%        -9.84%
Average (direct model)       -7.42%          -9.00%      -7.02%        -9.27%

4. Generalizability of our approach

  • Deploy our proposed scheme on different networks. In addition to VDCNN [23], the proposed approach is also implemented using existing networks for verification. Two networks, DSCNN and SEFCNN, are trained using the progressive method and frame-level RDO is conducted for model selection.

    From the results in Fig. 2 we can see that the direct model leads to over-filtering and that its results are even worse than the H.265/HEVC anchor. The coding efficiency is improved with CTU-RDO. Furthermore, our proposed approach achieves PSNR comparable to that of CTU-RDO while significantly reducing the bitrate cost.

    Figure 2: The proposed approach is implemented using DSCNN and SEFCNN for comparison. In this example, the obtained progressive models are used to test the performance of inter frames in the RA configuration at QP = 37.

  • Deploy our proposed scheme on different configurations. In addition, we apply our proposed scheme to another LDP configuration “IPPPIPPP”.

    From Table 6 we can see that the skipping method achieves a +0.35 dB PSNR improvement and a 0.36% bitrate reduction over the anchor H.265/HEVC encoder. With CTU-RDO, the bitrate is reduced slightly further (-0.38%) and the PSNR is boosted by another +0.20 dB over the skipping method. With our proposed progressive scheme, the PSNR gain is the same as that of CTU-RDO, while the bitrate reduction reaches 0.67%.

    Table 6: Performance under IPPPIPPP coding in the LDP configuration

    Method     Anchor            Frame skipping     CTU-RDO            Proposed
               Bitrate  PSNR     ∆Bitrate  ∆PSNR    ∆Bitrate  ∆PSNR    ∆Bitrate  ∆PSNR
    Class A    6579.51  32.79    +0.06%    +0.31    -0.07%    +0.52    -0.31%    +0.52
    Class B    3672.09  33.16    -0.51%    +0.22    -0.44%    +0.38    -0.76%    +0.36
    Class C    1786.30  30.16    -0.57%    +0.36    -0.60%    +0.52    -0.77%    +0.52
    Class D    509.57   29.88    -0.39%    +0.44    -0.47%    +0.58    -0.76%    +0.59
    Class E    1448.55  36.22    -0.64%    +0.48    -0.65%    +0.87    -1.19%    +0.88
    Average    2502.70  32.23    -0.36%    +0.35    -0.38%    +0.55    -0.67%    +0.55

5. Visual quality

Examples of filtered frames, such as the 13th frame of sequence “BQMall” and the 22nd frame of sequence “FourPeople”, processed by the traditional in-loop filter, the direct CNN model, and our proposed progressive model.

Our progressive model successfully removes artifacts and retains some details. The results look visually more appealing.

Figure 3: Visual quality comparison of different in-loop filtering schemes for QP = 37.