Dandan Ding, Lingyi Kong, Wenyu Wang, Fengqing Zhu
Various Convolutional Neural Network (CNN) structures have been designed for in-loop filtering in video coding and have shown performance improvements. These CNN models are usually trained by learning the correlations between the reconstructed and the original frames, and are then applied to every reconstructed frame to improve the overall video quality. Such a direct training and deployment strategy is effective for intra coding but yields only a locally optimal model. In inter coding it triggers an over-filtering problem, because the intertwined reference dependencies across inter frames are not taken into account. To address this issue, state-of-the-art methods usually resort to Rate-Distortion Optimization (RDO) so that the CNN model is applied only to selected coding blocks or frames. However, such schemes cannot fundamentally solve the problem because the direct CNN model itself is inaccurate. In this paper, we propose a new approach to train and coordinate CNN-based in-loop filters so that they work seamlessly with video encoders.
Examples showing the subjective quality of frames suffering from the over-filtering problem. The first row shows the uncompressed frames and the second shows the over-filtered frames.
- python 3.7
- tensorflow >=1.6.0 && <=2.0.0
- Visual Studio >=2013
- HM 16.9
- AOM 1.0.0
- Training dataset
- We use the DIV2K dataset for CNN training. Each frame of DIV2K is encoded using the H.265/HEVC reference software HM 16.9 to obtain the raw reconstructed frames. At the encoder, we use the default AI configuration “encoder_intra_main.cfg” of HM 16.9, except that the traditional in-loop filters, Deblocking and SAO, are turned off.
- Training settings.
- Frames are segmented into 64×64 patches as samples and the batch size is set to 64. We adopt the Adaptive moment estimation (Adam) algorithm for stochastic optimization. To train the direct model CNN0, the initial learning rate is set to 10^-4. For the transfer learning phase of models CNNi (0 < i ≤ N), the initial learning rate is set to 10^-5. During training, the learning rate is adjusted using the step strategy with γ = 0.5.
- Loss function:
- Once a model is obtained, the images enhanced with this model are added to the training set and training continues. This process is repeated for each subsequent model.
- Testing settings
- We use 18 test sequences, mostly selected by the Joint Collaborative Team on Video Coding (JCT-VC), to evaluate the video coding efficiency. The first 50 frames of each sequence are used for evaluation. In H.265/HEVC, we follow the default LDP configuration “encoder_lowdelay_main.cfg” and RA configuration “encoder_randomaccess_main.cfg” with Deblocking and SAO off.
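
The training procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the training script used for the paper: `train_fn` and `enhance_fn` are hypothetical callbacks standing in for the TensorFlow training and inference code, `step_size` (the number of iterations between learning-rate decays) is an assumed parameter because only γ = 0.5 is given, and the sketch assumes that each round enhances the original reconstructed samples.

```python
def step_lr(initial_lr, iteration, step_size, gamma=0.5):
    """Step schedule: multiply the learning rate by gamma every step_size iterations."""
    return initial_lr * gamma ** (iteration // step_size)


def initial_lr_for(model_index):
    """1e-4 for the direct model CNN0, 1e-5 for the transfer-learned models CNNi (i > 0)."""
    return 1e-4 if model_index == 0 else 1e-5


def progressive_training(samples, num_rounds, train_fn, enhance_fn):
    """Train CNN0 .. CNN<num_rounds>, growing the training set after each round.

    samples    -- list of (reconstructed_patch, original_patch) pairs
    train_fn   -- hypothetical: train_fn(init_model, samples, lr) -> model
    enhance_fn -- hypothetical: enhance_fn(model, patch) -> enhanced patch
    """
    training_set = list(samples)
    models = []
    model = None  # CNN0 starts from scratch; later models start from the previous one
    for i in range(num_rounds + 1):
        model = train_fn(model, training_set, initial_lr_for(i))
        models.append(model)
        if i < num_rounds:
            # Assumption: patches enhanced by the newest model are paired with
            # the same originals and appended as extra training samples.
            training_set += [(enhance_fn(model, rec), org) for rec, org in samples]
    return models
```

With a step size of 1000 iterations (an arbitrary choice for illustration), `step_lr(1e-4, 2500, 1000)` has decayed twice and returns a quarter of the initial rate.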
_____ _____ _____
raw | 3x3 | | 3x3 | | 3x3 | ___
reconstructed--->| 64 |-->| 64 |-->...-->| 64 |-->|add|-->filtered
frame | |_____| |_____| |_____| —^—
|______________shortcut__________________|
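
The diagram shows a stack of 3×3 convolution layers whose output is added back to the input through a shortcut, so the network only has to learn the residual. The toy sketch below illustrates that data flow in plain Python. It is an assumption-laden simplification: it uses a single channel instead of 64 feature maps, zero padding, and ReLU activations between layers (the activation is not specified in the diagram), and the function names are hypothetical.

```python
def conv3x3(image, kernel):
    """3x3 cross-correlation with zero padding on a 2-D list of numbers."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out


def relu(image):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in image]


def residual_filter(frame, kernels):
    """Run the conv stack, then add the shortcut: filtered = frame + residual."""
    x = frame
    for k in kernels[:-1]:
        x = relu(conv3x3(x, k))
    # Assumption: the last layer has no activation so the residual can be negative.
    residual = conv3x3(x, kernels[-1])
    return [[f + r for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, residual)]
```

If every kernel is zero, the residual is zero and the filter returns the reconstructed frame unchanged, which is why the shortcut makes a "do nothing" filter easy to represent.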
Convergence of CNNN. First, we need to find out when the progressive CNN model converges. Experiments are conducted to determine the value of N, the number of times to tune the direct CNN model, so as to terminate the progressive training. Here the direct model is termed as CNN0 for clarification.
We have the following observations.
- LDP requires a higher N than RA.
- N increases as the QP value increases.
- Certain frames are insensitive to the over-filtering effect.
| Model | LDP | | | | | | | | RA | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | QP=37 | | QP=32 | | QP=27 | | QP=22 | | QP=37 | | QP=32 | | QP=27 | | QP=22 | |
| | bitrate | PSNR | bitrate | PSNR | bitrate | PSNR | bitrate | PSNR | bitrate | PSNR | bitrate | PSNR | bitrate | PSNR | bitrate | PSNR |
anchor | 923.794 | 31.806 | 1860.973 | 34.495 | 4246.250 | 37.278 | 12540.883 | 40.295 | 951.230 | 32.317 | 1823.547 | 34.900 | 3816.174 | 37.522 | 9841.134 | 40.190 |
CNN0 | 991.619 | 31.462 | 1933.679 | 34.264 | 4268.964 | 37.197 | 12388.495 | 40.346 | 901.349 | 32.152 | 1708.895 | 34.784 | 3554.656 | 37.495 | 9176.313 | 40.246 |
CNN1 | 927.707 | 32.098 | 1840.493 | 34.765 | 4162.843 | 37.511 | 12442.042 | 40.496 | 945.974 | 32.731 | 1792.072 | 35.249 | 3743.491 | 37.794 | 9706.375 | 40.385 |
CNN2 | 921.478 | 32.148 | 1843.724 | 34.763 | 4162.424 | 37.519 | 12459.105 | 40.494 | 942.131 | 32.763 | 1792.041 | 35.256 | 3744.338 | 37.799 | 9701.845 | 40.385 |
CNN5 | 921.106 | 32.162 | 1833.963 | 34.798 | 4160.810 | 37.535 | 12480.928 | 40.490 | 941.990 | 32.773 | 1791.046 | 35.262 | 3745.463 | 37.802 | 9711.090 | 40.379 |
CNN8 | 920.610 | 32.171 | – | – | 4161.774 | 37.518 | – | – | 942.081 | 32.774 | – | – | – | – | – | – |
QP values | LDP | RA |
---|---|---|
37 | CNN8 | CNN5 |
32 | CNN5 | CNN5 |
27 | CNN5 | CNN2 |
22 | CNN0/CNN2 | CNN0/CNN2 |
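
The model selection table above can be encoded as a simple lookup. The sketch below is only an illustration: `MODEL_FOR_QP` and `select_models` are hypothetical names, and for QP = 22, where the table lists two models, it returns both candidates and leaves the final choice to the caller.

```python
# Progressive model(s) listed in the table for each configuration and QP.
MODEL_FOR_QP = {
    "LDP": {37: ("CNN8",), 32: ("CNN5",), 27: ("CNN5",), 22: ("CNN0", "CNN2")},
    "RA": {37: ("CNN5",), 32: ("CNN5",), 27: ("CNN2",), 22: ("CNN0", "CNN2")},
}


def select_models(config, qp):
    """Return the tuple of candidate model names for a configuration and QP."""
    try:
        return MODEL_FOR_QP[config][qp]
    except KeyError:
        raise ValueError(f"no model listed for config={config!r}, QP={qp!r}") from None
```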
We compare our approach to existing solutions including the RDO-based method and the skipping method.
Note: the two methods are applied only to inter frames. The intra frames are all filtered by the direct CNN model.
Our proposed approach achieves the best average BD-rate in both the LDP and RA configurations, as shown in Table 3.
In the LDP configuration, the frame skipping, CU skipping, and CTU-RDO methods achieve 7.42%, 1.95%, and 9.00% BD-rate savings, respectively, whereas our approach gains as much as 9.62%. In RA, the three methods obtain 6.09%, 7.02%, and 9.27% BD-rate reductions, all lower than ours at 10.12%.
| Class | Sequence | AI | LDP | | | | | RA | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | Direct use | Frame skipping | CU skipping | CTU-RDO | Proposed | Direct use | Frame skipping | CU skipping | CTU-RDO | Proposed |
| A | PeopleOnStreet | -9.55% | +0.98% | -3.88% | -2.74% | -5.24% | -6.50% | -4.51% | -3.24% | -6.53% | -7.21% | -8.14% |
| | Traffic | -10.80% | +21.65% | -7.52% | +5.99% | -7.47% | -8.90% | +6.34% | -7.38% | -6.37% | -9.54% | -10.99% |
| B | BasketballDrive | -8.58% | -4.04% | -5.83% | -5.80% | -9.78% | -9.82% | -3.83% | -3.36% | -6.35% | -8.13% | -8.70% |
| | BQTerrace | -5.72% | +7.44% | -9.86% | -5.79% | -12.27% | -10.77% | +0.43% | -8.65% | -9.30% | -12.27% | -11.67% |
| | Cactus | -7.76% | +13.51% | -7.17% | +2.49% | -8.72% | -8.46% | +1.22% | -6.07% | -6.45% | -10.14% | -9.80% |
| | Kimono | -8.40% | -2.19% | -4.23% | -3.84% | -5.82% | -6.43% | -4.64% | -2.16% | -5.41% | -5.74% | -6.08% |
| | ParkScene | -8.32% | +11.08% | -3.33% | +2.77% | -3.62% | -4.67% | +3.49% | -3.55% | -4.46% | -6.28% | -7.28% |
| C | BQMall | -10.41% | +2.27% | -7.63% | -4.62% | -9.37% | -10.48% | -1.09% | -6.27% | -7.51% | -9.65% | -10.69% |
| | PartyScene | -6.44% | +2.49% | -3.82% | -1.12% | -5.88% | -5.96% | +1.11% | -3.05% | -3.74% | -5.98% | -6.40% |
| | BasketballDrill | -15.78% | +10.47% | -9.02% | +2.22% | -11.45% | -11.75% | +2.44% | -7.62% | -4.75% | -11.17% | -11.96% |
| | RaceHorsesC | -6.14% | -1.61% | -2.94% | -2.71% | -4.98% | -4.90% | -4.28% | -2.18% | -4.86% | -5.80% | -5.84% |
| D | BasketballPass | -11.54% | +2.92% | -7.41% | -7.37% | -9.46% | -10.69% | -1.05% | -5.76% | -8.00% | -8.94% | -10.24% |
| | BlowingBubbles | -8.54% | +3.13% | -4.49% | -0.56% | -5.67% | -6.07% | +0.24% | -3.53% | -4.14% | -6.36% | -7.39% |
| | BQSquare | -8.43% | +0.39% | -8.38% | -5.85% | -9.66% | -11.09% | +1.02% | -6.65% | -6.57% | -8.43% | -9.59% |
| | RaceHorses | -10.70% | -3.56% | -4.46% | -5.18% | -7.02% | -7.98% | -6.43% | -3.13% | -7.21% | -7.87% | -8.45% |
| E | Johnny | -13.61% | +28.42% | -16.13% | +2.16% | -16.68% | -17.95% | +8.45% | -12.59% | -10.85% | -14.64% | -16.88% |
| | FourPeople | -13.95% | +28.52% | -14.02% | -2.56% | -14.79% | -15.47% | +6.38% | -12.85% | -12.23% | -14.96% | -16.38% |
| | KristenAndSara | -13.07% | +28.10% | -13.47% | -2.61% | -14.14% | -15.20% | +7.49% | -11.51% | -11.70% | -13.76% | -15.64% |
| Class A | | -10.18% | +11.32% | -5.70% | +1.63% | -6.36% | -7.70% | +0.92% | -5.31% | -6.45% | -8.37% | -9.56% |
| Class B | | -7.75% | +5.16% | -6.09% | -2.03% | -8.04% | -8.03% | -0.66% | -4.76% | -6.39% | -8.51% | -8.71% |
| Class C | | -9.69% | +3.40% | -5.85% | -1.56% | -7.92% | -8.27% | -0.46% | -4.78% | -5.21% | -8.15% | -8.72% |
| Class D | | -9.80% | +0.72% | -6.19% | -4.74% | -7.95% | -8.96% | -1.56% | -4.77% | -6.48% | -7.90% | -8.92% |
| Class E | | -13.54% | +28.35% | -14.54% | -1.00% | -15.21% | -16.21% | +7.44% | -12.32% | -11.59% | -14.45% | -16.30% |
| Average | | -9.87% | +8.33% | -7.42% | -1.95% | -9.00% | -9.62% | +0.71% | -6.09% | -7.02% | -9.27% | -10.12% |
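
The BD-rate figures above summarise the average bitrate difference between two rate-distortion curves at equal PSNR. The sketch below follows the commonly used Bjøntegaard procedure: fit a cubic to log-bitrate as a function of PSNR through the four QP points of each curve, integrate both fits over the overlapping PSNR range, and convert the mean log difference into a percentage. It is an illustration, not the script used to produce the table; the function names are hypothetical. Lagrange interpolation is used because it gives the exact cubic through four points, and Simpson's rule integrates a cubic exactly.

```python
import math


def _cubic_through(points):
    """Return the Lagrange interpolating polynomial through the (x, y) points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def _simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials up to degree three."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """BD-rate in percent; anchor and test are lists of four (bitrate, psnr) points."""
    # Natural log is used here; any base works as long as it is inverted consistently.
    fit_a = _cubic_through([(p, math.log(r)) for r, p in anchor])
    fit_t = _cubic_through([(p, math.log(r)) for r, p in test])
    lo = max(min(p for _, p in anchor), min(p for _, p in test))
    hi = min(max(p for _, p in anchor), max(p for _, p in test))
    avg_diff = (_simpson(fit_t, lo, hi) - _simpson(fit_a, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1.0) * 100.0
```

A negative value means the tested scheme needs less bitrate than the anchor for the same quality. For instance, a test curve whose bitrate is 10% lower than the anchor at every PSNR point yields a BD-rate of about -10%.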
An accurate model is crucial for solving the over-filtering problem. However, the RDO-based and the skipping methods both adopt the inaccurate direct CNN model, which is trained without considering the complex reference correlations across inter frames. The coding efficiency of the two methods can therefore be further improved if a more accurate model is used.
- Integrate our progressive model to the RDO-based and the skipping methods.
Table 4: BD-rate (%) of using the progressive model instead of the direct model in the RDO-based and skipping methods
| Class | Sequence | LDP | | RA | |
|---|---|---|---|---|---|
| | | frame skipping | CTU-RDO | CU skipping | CTU-RDO |
| A | PeopleOnStreet | -4.21% | -7.10% | -7.63% | -8.16% |
| | Traffic | -8.81% | -9.27% | -10.51% | -10.71% |
| B | BasketballDrive | -5.73% | -10.77% | -7.56% | -8.40% |
| | BQTerrace | -9.38% | -12.73% | -10.46% | -11.91% |
| | Cactus | -8.42% | -10.13% | -9.23% | -10.30% |
| | Kimono | -4.72% | -7.34% | -5.75% | -5.93% |
| | ParkScene | -4.03% | -5.23% | -6.84% | -6.99% |
| C | BQMall | -8.18% | -11.00% | -9.98% | -10.42% |
| | PartyScene | -3.91% | -6.46% | -5.79% | -6.28% |
| | BasketballDrill | -9.76% | -12.60% | -11.00% | -11.93% |
| | RaceHorsesC | -2.84% | -5.43% | -5.22% | -5.85% |
| D | BasketballPass | -8.03% | -11.15% | -9.08% | -10.02% |
| | BlowingBubbles | -4.19% | -6.93% | -6.63% | -7.22% |
| | BQSquare | -7.98% | -11.23% | -8.46% | -9.38% |
| | RaceHorses | -4.62% | -8.77% | -7.64% | -8.45% |
| E | Johnny | -17.75% | -17.86% | -16.00% | -15.36% |
| | FourPeople | -16.08% | -15.49% | -15.68% | -15.37% |
| | KristenAndSara | -15.86% | -15.19% | -14.90% | -14.42% |
| Average (progressive model) | | -8.03% | -10.26% | -9.35% | -9.84% |
| Average (direct model) | | -7.42% | -9.00% | -7.02% | -9.27% |
- Deploy our proposed scheme on different networks. In addition to VDCNN [23], the proposed approach is also implemented using existing networks for verification. Two networks, DSCNN and SEFCNN, are trained using the progressive method, and frame-level RDO is conducted for model selection.
From the results in Fig. 2 we can see that the direct model leads to over-filtering and the results are even worse than the H.265/HEVC anchor. The coding efficiency is improved with CTU-RDO. Furthermore, our proposed approach achieves PSNR comparable to that of CTU-RDO while the bitrate cost is reduced significantly.
Figure 2: The proposed approach is implemented using DSCNN and SEFCNN for comparison. In this example, the obtained progressive models are used to test the performance of inter frames in RA configuration at QP = 37.
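
Frame-level model selection can be sketched as choosing, for each frame, the candidate output that is closest to the original frame. The example below is a simplified illustration and not the encoder integration used here: it assumes the rate term of the RDO cost is the same for every candidate, so the decision reduces to minimum distortion (MSE), and the names `mse`, `psnr`, and `select_model_for_frame` are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D lists of pixel values."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def psnr(a, b, peak=255.0):
    """PSNR in dB for 8-bit content by default; infinite for identical frames."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


def select_model_for_frame(original, candidates):
    """Pick the candidate name whose frame has the lowest MSE to the original.

    candidates -- dict mapping a model name (or "unfiltered") to its output frame
    """
    return min(candidates, key=lambda name: mse(original, candidates[name]))
```

Including the unfiltered reconstruction among the candidates lets the selection fall back to no CNN filtering when every model would make a frame worse.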
- Deploy our proposed scheme on different configurations. In addition, we apply our proposed scheme to another LDP configuration, “IPPPIPPP”.
From Table 6 we can see that the skipping method achieves a +0.35 dB PSNR improvement and a 0.36% bitrate reduction over the anchor H.265/HEVC encoder. Using CTU-RDO, the bitrate reduction is similar (0.38%) and the PSNR is further boosted by 0.20 dB over the skipping method. With our proposed progressive scheme, the PSNR gain is the same as that of CTU-RDO, whereas the bitrate is reduced by 0.67%.
| Method | Anchor | | frame skipping | | CTU-RDO | | Proposed | |
|---|---|---|---|---|---|---|---|---|
| | Bitrate | PSNR | ∆Bitrate | ∆PSNR | ∆Bitrate | ∆PSNR | ∆Bitrate | ∆PSNR |
| Class A | 6579.51 | 32.79 | +0.06% | +0.31 | -0.07% | +0.52 | -0.31% | +0.52 |
| Class B | 3672.09 | 33.16 | -0.51% | +0.22 | -0.44% | +0.38 | -0.76% | +0.36 |
| Class C | 1786.30 | 30.16 | -0.57% | +0.36 | -0.60% | +0.52 | -0.77% | +0.52 |
| Class D | 509.57 | 29.88 | -0.39% | +0.44 | -0.47% | +0.58 | -0.76% | +0.59 |
| Class E | 1448.55 | 36.22 | -0.64% | +0.48 | -0.65% | +0.87 | -1.19% | +0.88 |
| Average | 2502.70 | 32.23 | -0.36% | +0.35 | -0.38% | +0.55 | -0.67% | +0.55 |
Examples of filtered frames, such as the 13th frame of sequence “BQMall” and the 22nd frame of sequence “FourPeople”, processed by the traditional in-loop filter, the direct CNN model, and our proposed progressive model.
Our progressive model successfully removes artifacts and retains some details. The results look visually more appealing.
Figure 3: Visual quality comparison of different in-loop filtering schemes for QP = 37.