This project was created by Kevin Chau, Kevin Tai, Rohan Kondetimmanahalli, and Dhruv Verma. A blog post with more detail about the project can be found here: https://medium.com/@dhruvverma/music-transcription-using-a-convolutional-neural-network-b115968829f4
An outline of our project is as follows (illustrative code sketches for each step appear after the list):
- Take a raw audio file as input
- Create a spectrogram from the raw audio file
- Slice the spectrogram image into fixed-length time intervals
- Feed each slice of the image into the CNN as input
- Convert the CNN's output for each slice into MIDI data
- Restitch the sliced outputs into one MIDI file
The restitched MIDI file is the transcribed version of the initial raw audio file.
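To illustrate the spectrogram step, here is a minimal sketch that reads a WAV file and saves a log-magnitude spectrogram image. scipy and numpy are assumed helpers (they are not in the listed dependencies), and the window size and file names are placeholders:

```python
import numpy as np
from PIL import Image
from scipy.io import wavfile
from scipy.signal import spectrogram

def audio_to_spectrogram_image(wav_path):
    rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:                        # collapse stereo to mono
        samples = samples.mean(axis=1)
    freqs, times, sxx = spectrogram(samples, fs=rate, nperseg=1024)
    log_sxx = np.log1p(sxx)                     # compress the dynamic range
    scaled = (255 * log_sxx / log_sxx.max()).astype(np.uint8)
    return Image.fromarray(scaled[::-1])        # flip so low frequencies sit at the bottom

img = audio_to_spectrogram_image("piano.wav")   # "piano.wav" is a placeholder path
img.save("spectrogram.png")
```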
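The slicing step can then crop fixed-width windows out of that image with Pillow; `SLICE_WIDTH` is an assumed hyperparameter, not a value from the repo:

```python
from PIL import Image

SLICE_WIDTH = 64  # pixels of spectrogram per slice -- an assumed value

def slice_spectrogram(image_path):
    img = Image.open(image_path)
    width, height = img.size
    return [img.crop((left, 0, left + SLICE_WIDTH, height))
            for left in range(0, width - SLICE_WIDTH + 1, SLICE_WIDTH)]

for i, s in enumerate(slice_spectrogram("spectrogram.png")):
    s.save(f"slice_{i}.png")
```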
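For the CNN itself, here is a toy Keras model of the general kind described. The layer sizes, input shape, and the 88-unit sigmoid output (one probability per piano key) are assumptions, not the repository's actual architecture:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

INPUT_SHAPE = (513, 64, 1)   # (freq bins, time frames, channels) -- assumed
NUM_KEYS = 88                # one output per piano key

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=INPUT_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_KEYS, activation="sigmoid"),  # multi-label: several keys may sound at once
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Inference on one slice (random data stands in for a real spectrogram slice):
slice_batch = np.random.rand(1, *INPUT_SHAPE).astype("float32")
key_probs = model.predict(slice_batch)[0]  # shape (88,)
active_keys = key_probs > 0.5              # the 0.5 threshold is an assumption
```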
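For the MIDI conversion and restitching steps, a sketch using pretty_midi: each slice's active keys become notes offset by that slice's position in the piece, so writing them all into one `PrettyMIDI` object is the "restitch". The slice duration, note velocity, and key-to-pitch mapping are assumptions:

```python
import numpy as np
import pretty_midi

SLICE_SECONDS = 0.5      # assumed duration of audio covered by one slice
LOWEST_MIDI_PITCH = 21   # A0, the lowest piano key

def predictions_to_midi(slice_activations, out_path):
    """slice_activations: one length-88 boolean array per slice, in order."""
    pm = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)  # Acoustic Grand Piano
    for i, active in enumerate(slice_activations):
        start = i * SLICE_SECONDS              # offsetting by slice index restitches the piece
        end = start + SLICE_SECONDS
        for key, is_on in enumerate(active):
            if is_on:
                piano.notes.append(pretty_midi.Note(
                    velocity=100, pitch=LOWEST_MIDI_PITCH + key,
                    start=start, end=end))
    pm.instruments.append(piano)
    pm.write(out_path)

# Tiny example: middle C (MIDI 60) sounding in the first of four slices.
example = [np.zeros(88, dtype=bool) for _ in range(4)]
example[0][60 - LOWEST_MIDI_PITCH] = True
predictions_to_midi(example, "transcription.mid")
```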
Tools required to run this project: TensorFlow, Keras, FluidSynth, Pillow, mido, pretty_midi
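Most of these are available from PyPI (for example, `pip install tensorflow keras Pillow mido pretty_midi`, assuming the standard package names); FluidSynth is a system-level synthesizer and is typically installed through an OS package manager.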