MIDI-sources

A list of sites with MIDI files on the Web.


MIDI-LD stats

On the Reddit dump (full list)

  • 119,502 MIDI files
  • 113,565 N-Quads files (95.03% of the total; 4.97% failure rate)
  • 4,911,643,092 triples
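
The percentages above can be rechecked from the raw counts. This is only a small arithmetic sketch; the variable names are made up for illustration, and the counts are the ones listed above.

```python
# Counts copied from the stats list above.
midi_files = 119_502
nquad_files = 113_565
triples = 4_911_643_092

# Files that did not yield an N-Quads file.
failed = midi_files - nquad_files

success_rate = 100 * nquad_files / midi_files
failure_rate = 100 * failed / midi_files
triples_per_file = triples / nquad_files

print(f"converted: {success_rate:.2f}%")
print(f"failed:    {failure_rate:.2f}% ({failed} files)")
print(f"average triples per converted file: {triples_per_file:,.0f}")
```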

What to do

  • HAS TO BE MEANINGFUL FOR LINKED DATA PERSONS AND MUSIC PERSONS

  • SEMANTIC WEB TASKS

    • Linking each named graph to the best matching URI in the LOD cloud -- DBpedia, Wikidata, MusicBrainz/LinkedBrainz, etc.
      • Using text: LOTUS
      • Using music: Shazam/SoundHound API
      • Using both? Other?
    • Audiolization: musical interface for finding / accessing data
  • Musician / musicologist TASKS

    • The LD jukebox: Composing by mixing using SPARQL CONSTRUCT -- mashups!
    • Music annotation (i.e. mixing vocabularies)
    • ???
  • Both: search the MIDI cloud (musical LD) using musical similarity metrics (à la LOTUS) and list the results

  • Non-functional requirements: easy to implement, minimal UI

Conclusions

  1. The linking paper comes first (because the other papers need it!)
  2. The IR / composer papers come next

How do we do the linking paper?

  • Option 1. Chroma-based audio fingerprints (e.g. Chromaprint), after transforming the MIDI to MP3
  • Option 2. Match incipits from RISM records (e.g. https://opac.rism.info/metaopac/singleHit.do?methodToCall=showHit&curPos=1&identifier=251_SOLR_SERVER_1577181594) with Reinier-separated MIDI tracks

Outline

  • We have a random dump of MIDI files we want to link to Web resources
  • We represent those MIDI as Linked Data
  • We look for other Linked Data resources on the Web
  • PROBLEM: MIDI-LD (database A) only contains (symbolic) musical information
  • (Actually that's not true: most files contain some metadata in textual form, but we cannot assume it)
    • Actually: 109,725 out of 113,565 contain some label (96.62%)
  • PROBLEM: those other Linked Data resources on the Web (database B) DO NOT contain any musical information
  • PROBLEM (short): there's no overlap between musical info (in database A) and textual info (in database B)
  • WE ARE ARGUING THAT LD DATABASES SHOULD CONTAIN JUST A LITTLE BIT OF SYMBOLIC MUSICAL INFORMATION (identifying the song)
  • NO
  • WHAT WE ARE ACTUALLY ARGUING IS THAT THERE MUST BE A THRESHOLD ON HOW MUCH TEXTUAL INFORMATION MUSICAL DATABASES MUST CONTAIN, AND HOW MUCH MUSICAL INFORMATION TEXTUAL DATABASES MUST CONTAIN
  • (E.g. textual information alone gets 40% of the songs correctly linked; you need symbolic music info to accurately link the rest)
  • Exception: RISM, which has INCIPITS
  • Incipits are the first 10 notes of the melody of a song
  • We take the MuseData database (musedata.org)
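
The incipit idea above can be sketched as follows. This is only an illustration under stated assumptions: both the MIDI melody track and the incipit are assumed to be already available as lists of MIDI pitch numbers (decoding RISM's incipit encoding and picking the melody track are not shown), and the function names are hypothetical. Comparing semitone intervals rather than absolute pitches makes the match independent of the key the MIDI file was transcribed in.

```python
from typing import List, Sequence


def intervals(pitches: Sequence[int]) -> List[int]:
    """Semitone steps between consecutive notes (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def matches_incipit(
    track_pitches: Sequence[int],
    incipit_pitches: Sequence[int],
    length: int = 10,
) -> bool:
    """True if the track opens with the same interval pattern as the incipit.

    Only the first `length` notes are compared, following the
    "first 10 notes" description of an incipit in the notes above.
    """
    ref = list(incipit_pitches)[:length]
    head = list(track_pitches)[: len(ref)]
    if not ref or len(head) < len(ref):
        return False  # nothing to compare, or track too short
    return intervals(head) == intervals(ref)


# Hypothetical example data: a ten-note opening, and the same melody
# transposed up a whole tone with two extra notes appended.
incipit = [64, 64, 65, 67, 67, 65, 64, 62, 60, 60]
transposed_track = [p + 2 for p in incipit] + [64, 66]
other_track = [60, 62, 64, 65, 67, 69, 71, 72, 74, 76]

print(matches_incipit(transposed_track, incipit))
print(matches_incipit(other_track, incipit))
```

Exact interval equality is deliberately strict; real MIDI transcriptions often contain ornaments or repeated notes, so an edit-distance comparison over the interval sequences would probably be needed in practice.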

Linking overview

  • Try to identify the melody track, and then use Shazam/SoundHound (w/ Rob Macrae)

  • Text from filename + metadata, and then link using NLP/LOTUS

  • Machine Learning, crowdsourcing

  • Enrich DBpedia with the symbolic notes

  • Link lyrics to MIDIs

  • Link to (from) Spotify tracks

Linking to LOD using LOTUS

  • Using file names: find 130000_Pop_Rock_Classical_Videogame_EDM_MIDI_Archive\[6_19_15\] -type f -name "*.mid" -exec basename {} \;
  • Using MIDI textual metadata
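
File names from the dump are noisy, so they probably need some cleaning before being used as text queries. A minimal sketch, assuming typical naming habits (underscores, dashes, camelCase, duplicate counters such as "(1)"); the function name and the example paths are hypothetical, and the actual call to LOTUS is not shown.

```python
import re
from pathlib import PurePosixPath


def filename_to_query(path: str) -> str:
    """Turn a MIDI file path into a plain-text query string."""
    stem = PurePosixPath(path).stem                    # drop directories and ".mid"
    text = re.sub(r"[_\-.]+", " ", stem)               # separators become spaces
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)   # split camelCase words
    text = re.sub(r"\(\d+\)|\[\d+\]", " ", text)       # drop counters like "(1)"
    text = re.sub(r"\s+", " ", text).strip()           # collapse whitespace
    return text.lower()


# Hypothetical example paths.
for p in ["Pop/Beatles_-_Hey_Jude(1).mid", "Rock/BohemianRhapsody.mid"]:
    print(filename_to_query(p))
```

The same cleaning could be applied to the textual metadata found inside the MIDI files before querying.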

DATASET FILES

  • Raw conversion of MIDI to RDF
  • MIDI metadata (pitches, instruments, etc -> MIDI resources that don't belong to the vocab)
  • Links: to lyrics
  • Links: to LOTUS top matches
  • Links: to DBpedia
  • Links: to Spotify
  • Links: from DBpedia to this MIDI track (incipits) --> find encoding in Literals like in RISM

Related work

After VU meeting TODO

More datasets:

  • Convert Mysongbook to MIDI and add it to the MIDI dump
  • Find the names of the eHumanities presenters doing catchiness / motif identification

Text part:

  • LOTUS part / retrieval indexing - infrastructure to query dataset
  • Decide later on long tail / NED

Symbolic part:

  • Compile RISM and make it queryable
  • Prototype to match classical music MIDIs with incipits

ML challenge part:

  • Baseline with song names: (midi_song_names x dbpedia+MB_song_names) <- find best matches here (link prediction / entity resolution task)
  • "Identity problem": linking musical pieces to particular subsets of suites / the whole suite?
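
The song-name baseline above could start from something as simple as the following. It is a sketch, not a proposed method: it uses plain string similarity from Python's standard library (difflib), compares every MIDI name against every candidate name (quadratic, so only suitable for small samples), and the names, threshold and function names are all made up for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(
    midi_names: Iterable[str],
    kb_names: Iterable[str],
    top_k: int = 3,
    min_score: float = 0.6,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each MIDI song name, rank the candidate names by similarity.

    Candidates scoring below `min_score` are dropped, so a MIDI name
    may end up with an empty list (no link predicted).
    """
    candidates = list(kb_names)
    results: Dict[str, List[Tuple[str, float]]] = {}
    for name in midi_names:
        scored = sorted(
            ((similarity(name, cand), cand) for cand in candidates),
            reverse=True,
        )
        results[name] = [
            (cand, round(score, 3))
            for score, cand in scored[:top_k]
            if score >= min_score
        ]
    return results


# Hypothetical sample of names from both sides.
midi_song_names = ["hey jude", "qqqq"]
kb_song_names = ["Hey Jude", "Yesterday", "Let It Be"]

for name, links in best_matches(midi_song_names, kb_song_names).items():
    print(name, "->", links)
```

A baseline like this only addresses the name side; it says nothing about the "identity problem" of whether a match refers to one movement or the whole suite.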