This is the official implementation of EmoMusicTV, a transformer-based variational autoencoder (VAE) with a hierarchical latent variable structure. It explores the impact of time-varying emotional conditions on multiple music generation tasks and captures the rich variability of musical sequences.
- Paper link
- Check our demo page and listen!
👇Interpretation of indices in melody.data
Index | Definition |
---|---|
0 | bar mark |
1-61 | pitch (1 for rest, 2-61 for pitch 42-101) |
62-98 | duration (mapping dict shown in chordVAE_eval.py) |
99-106 | time signature (mapping dict shown in chordVAE_eval.py) |
Consequently, each melody event can be represented as a 107-D one-hot vector.
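For illustration, here is a minimal sketch of this encoding in Python (the helper name `melody_event_to_onehot` and the use of NumPy are our assumptions, not part of the repo; the actual data pipeline may differ):

```python
import numpy as np

# 1 bar mark + 61 pitch + 37 duration + 8 time signature = 107 (assumed layout per the table above)
MELODY_DIM = 107

def melody_event_to_onehot(index: int) -> np.ndarray:
    """Encode a single melody event index (0-106) as a 107-D one-hot vector."""
    assert 0 <= index < MELODY_DIM, "melody index out of range"
    vec = np.zeros(MELODY_DIM, dtype=np.float32)
    vec[index] = 1.0
    return vec

# e.g., index 1 encodes a rest; index 2 encodes pitch 42
rest_event = melody_event_to_onehot(1)
```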
👇Interpretation of indices in chord.data
Index | Definition |
---|---|
0-6 | chord mode (0 for rest, mapping dict shown in chordVAE_eval.py) |
0-40 | root tone (40 for rest, 0-39 for pitch 30-69) |
Consequently, each chord event is represented by a 48-D vector (concatenation of 7-D and 41-D).
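A minimal sketch of this concatenated encoding, under the same assumptions as above (helper name `chord_event_to_vector` is hypothetical):

```python
import numpy as np

MODE_DIM, ROOT_DIM = 7, 41  # chord mode indices 0-6, root tone indices 0-40

def chord_event_to_vector(mode: int, root: int) -> np.ndarray:
    """Encode a chord event as the concatenation of a 7-D mode one-hot
    and a 41-D root-tone one-hot, giving a 48-D vector."""
    assert 0 <= mode < MODE_DIM and 0 <= root < ROOT_DIM
    mode_oh = np.zeros(MODE_DIM, dtype=np.float32)
    root_oh = np.zeros(ROOT_DIM, dtype=np.float32)
    mode_oh[mode] = 1.0
    root_oh[root] = 1.0
    return np.concatenate([mode_oh, root_oh])  # shape (48,)

# e.g., a rest: mode 0 (rest) with root 40 (rest)
rest_chord = chord_event_to_vector(0, 40)
```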
👇Interpretation of labels in valence.data
Value | Definition |
---|---|
-2 | very negative |
-1 | moderately negative |
0 | neutral |
1 | moderately positive |
2 | very positive |
Consequently, each emotional label is represented by a 5-D one-hot vector.
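One plausible way to realize this encoding is to shift the label into the index range 0-4, as sketched below (the offset mapping and helper name `valence_to_onehot` are our assumptions; check the repo's preprocessing for the actual convention):

```python
import numpy as np

VALENCE_DIM = 5  # labels -2 (very negative) through 2 (very positive)

def valence_to_onehot(value: int) -> np.ndarray:
    """Encode a valence label in {-2, ..., 2} as a 5-D one-hot vector."""
    index = value + 2  # assumed shift: -2 -> 0, ..., 2 -> 4
    assert 0 <= index < VALENCE_DIM, "valence value out of range"
    vec = np.zeros(VALENCE_DIM, dtype=np.float32)
    vec[index] = 1.0
    return vec

# e.g., "very positive" (value 2) -> [0, 0, 0, 0, 1]
very_positive = valence_to_onehot(2)
```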
If you find the code useful for your research, please consider citing:
```bibtex
@article{ji2023emomusictv,
  title={EmoMusicTV: Emotion-conditioned Symbolic Music Generation with Hierarchical Transformer VAE},
  author={Ji, Shulei and Yang, Xinyu},
  journal={IEEE Transactions on Multimedia},
  year={2023},
  publisher={IEEE}
}
```