Attribute-guided cross-modal interaction and enhancement for audio-visual matching
Looking and Hearing into Details: Dual-enhanced Siamese Adversarial Network for Audio-Visual Matching