Figure 1. For each mask ratio, we repeat the experiment 50 times, yielding 50 points per ratio. The color of each point encodes the accuracy: the lighter the color, the higher the accuracy.
Figure 2. We conduct the same mask experiments as in Figure (1c) of the manuscript. In Figure 2 of this GitHub repository, subfigures (2b) and (2c) use the same coordinate settings as (2a).
Figure 3. Illustration of the basic assumption in multi-modal learning.