Multimodal Machine Learning for Music (MML4Music)

This repo contains a curated list of academic papers, datasets and other resources on multimodal machine learning (MML) research applied to music. By Ilaria Manco (i.manco@qmul.ac.uk), Centre for Digital Music, QMUL.

This list is not meant to be exhaustive: MML for music is a varied and growing field that tackles a wide range of tasks, from music information retrieval to generation, through many different methods. Because the research area is not yet well established, its conventions and definitions aren't set in stone, and this list aims to provide a point of reference for its ongoing development.

Academic Papers
- Survey Papers
- Journal and Conference Papers
Datasets
Workshops, Tutorials & Talks
Other Projects
Statistics & Visualisations
How to Contribute
Other Resources

Papers

Survey Papers

Multimodal music information processing and retrieval: Survey and future challenges (F. Simonetta et al., 2019)
Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies (M. Müller et al., 2019)

Journal and Conference Papers

Summary of papers on multimodal machine learning for music, including the review papers highlighted above.

Audio-Text

Year	Paper Title	Code
2022	Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model
2022	Conversational Music Retrieval with Synthetic Data
2022	Contrastive audio-language learning for music	GitHub
2022	Learning music audio representations via weak language supervision	GitHub
2022	MuLan: A joint embedding of music audio and natural language
2022	RECAP: Retrieval Augmented Music Captioner
2022	Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge	GitHub
2022	CLAP: Learning audio concepts from natural language supervision	GitHub
2022	Toward Universal Text-to-Music Retrieval	GitHub
2021	MusCaps: Generating Captions for Music Audio	GitHub
2021	Music Playlist Title Generation: A Machine-Translation Approach	GitHub
2020	MusicBERT - learning multi-modal representations for music and text
2020	Music autotagging as captioning
2019	Deep cross-modal correlation learning for audio and lyrics in music retrieval
2018	Music mood detection based on audio and lyrics with deep neural net
2016	Exploring customer reviews for music genre classification and evolutionary studies
2016	Towards Music Captioning: Generating Music Playlist Descriptions
2008	Multimodal Music Mood Classification using Audio and Lyrics

Audio-Image

Year	Paper Title	Code
2020	TräumerAI: Dreaming music with StyleGAN	GitHub
2019	Learning Affective Correspondence between Music and Image
2018	The Sound of Pixels	GitHub
2018	Image generation associated with music data

Audio-Video

Year	Paper Title	Code
2022	It's Time for Artistic Correspondence in Music and Video
2019	Audio-visual embedding for cross-modal music video retrieval through supervised deep CCA
2019	Query by Video: Cross-Modal Music Retrieval
2018	CBVMR: content-based video-music retrieval using soft intra-modal structure constraint	GitHub

Audio-User

Year	Paper Title	Code
2020	Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags	GitHub
2017	A deep multimodal approach for cold-start music recommendation	GitHub

Other

Year	Paper Title	Code
2021	Multimodal metric learning for tag-based music retrieval	GitHub
2021	Enriched music representations with multiple cross-modal contrastive learning	GitHub
2020	Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging
2020	Music gesture for visual sound separation
2020	Foley music: Learning to generate music from videos
2020	Musical word embedding: Bridging the gap between listening contexts and music
2019	Query-by-Blending: a Music Exploration System Blending Latent Vector Representations of Lyric Word, Song Audio, and Artist
2019	Multimodal music information processing and retrieval: Survey and future challenges
2019	Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
2019	Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications
2018	Multimodal Deep Learning for Music Genre Classification	GitHub
2018	JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features	GitHub
2017	Learning neural audio embeddings for grounding semantics in auditory perception
2017	Music emotion recognition via end-to-end multimodal neural networks
2013	Cross-modal Sound Mapping Using Deep Learning
2013	Music emotion recognition: From content- to context-based models
2011	MusiCLEF: A benchmark activity in multimodal music information retrieval
2011	The need for music information retrieval with user-centered and multimodal strategies
2009	Combining audio content and social context for semantic music discovery

Datasets

Dataset	Description	Modalities	Size
MARD	Multimodal album reviews dataset	Text, Metadata, Audio descriptors	65,566 albums and 263,525 reviews
URMP	Multi-instrument musical pieces of recorded performances	MIDI, Audio, Video	44 pieces (12.5GB)
IMAC	Affective correspondences between images and music	Images, Audio	85,000 images and 3,812 songs
EmoMV	Affective Music-Video Correspondence	Audio, Video	5,986 pairs

Workshops, Tutorials & Talks

First Workshop on NLP for Music and Audio

Other Projects

Song Describer: a Platform for Collecting Textual Descriptions of Music Recordings - [link] | [paper] | [code]

Statistics & Visualisations

47 papers referenced. See the details in multimodal_ml_music.bib. Number of articles per year:
If you are applying multimodal ML to music, there are 150 other researchers in your field.
13 tasks investigated. See the list of tasks. Tasks pie chart:
Only 16 articles (34%) provide their source code. Yann Bayle has a very useful list of resources on reproducibility for MIR and ML.
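
The counts in this section can be recomputed from the bibliography file. The following is a minimal sketch rather than the script used to produce them: it assumes each BibTeX entry carries a `year` field and that entries with released code carry a `code` field holding a URL. Both are assumptions about the layout of multimodal_ml_music.bib, and `bib_stats` and the three-entry `sample` are hypothetical names and data introduced only for illustration.

```python
import re
from collections import Counter


def bib_stats(bib_text):
    """Return (papers per year, total entries, entries with a code link)."""
    # Split on entry headers such as "@article{"; the first chunk is any preamble.
    # Note: non-paper entries such as @comment or @string would also be counted.
    entries = re.split(r"(?m)^@\w+\s*\{", bib_text)[1:]
    per_year = Counter()
    with_code = 0
    for entry in entries:
        # Accept `year = {2022}`, `year = "2022"` and `year = 2022`.
        year = re.search(r'(?im)^\s*year\s*=\s*[{"]?(\d{4})', entry)
        if year:
            per_year[int(year.group(1))] += 1
        # Assumption: released code is recorded as `code = {http...}`.
        if re.search(r'(?im)^\s*code\s*=\s*[{"]\s*https?://', entry):
            with_code += 1
    return per_year, len(entries), with_code


# Hypothetical three-entry sample standing in for the real file.
sample = """
@inproceedings{a2022,
  title = {Paper A},
  year = {2022},
  code = {https://example.org/a}
}
@article{b2021,
  title = {Paper B},
  year = {2021}
}
@article{c2022,
  title = {Paper C},
  year = 2022
}
"""

per_year, total, with_code = bib_stats(sample)
share = round(100 * with_code / total)
print(dict(per_year), total, with_code, f"{share}%")
```

To run it on the real bibliography, read the file into a string and pass it to `bib_stats` in place of `sample`. As a check on the figures quoted here, `round(100 * 16 / 47)` evaluates to 34, which agrees with the stated share of articles that provide source code.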

How To Contribute

Contributions are welcome! Please refer to the contributing.md file.

License

You are free to copy, modify, and distribute Multimodal Machine Learning for Music (MML4Music) with attribution under the terms of the MIT license. See the LICENSE file for details. This project is heavily based on Deep Learning for Music by Yann Bayle and uses other projects. You may refer to them for appropriate license information.

If you use the information contained in this repository, please let us know!

ilaria-manco/multimodal-ml-music