timwedde/py_midicsv

Removing track from midi file


Hi timwedde. I'm trying to build a dataset of MIDI files for use in a machine learning project. I've been using py_midicsv to try to remove the drum track from my files. After converting a file to CSV, I look for "Program_c, 9", delete that whole section, and convert back to MIDI, but something is breaking. Could you maybe suggest a better way to do this?
Thanks for the cool module!
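One plausible reason a hand-deleted block breaks the round trip (an assumption, since no error message is shown) is that the deletion removes a track's `Start_track` or `End_track` record, or leaves the `Header` record's track count out of sync with the tracks that remain. Here is a minimal sketch of a sanity check you could run on the CSV lines before converting back. `check_structure` is a hypothetical helper, not part of py_midicsv, and it only inspects the fields described in its comments.

```python
def check_structure(lines):
    """Report structural problems in midicsv-style records (hypothetical helper)."""
    problems = []
    declared_tracks = None
    open_tracks = set()  # tracks that have a Start_track but no End_track yet
    seen_tracks = set()  # every track number that had a Start_track
    for line in lines:
        if line.lstrip().startswith(("#", ";")):
            continue  # midicsv comment line
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # blank or malformed line
        track, record_type = fields[0], fields[2]
        if record_type == "Header":
            # Header layout: 0, 0, Header, format, nTracks, division
            declared_tracks = int(fields[4])
        elif record_type == "Start_track":
            open_tracks.add(track)
            seen_tracks.add(track)
        elif record_type == "End_track":
            if track not in open_tracks:
                problems.append(f"track {track}: End_track without Start_track")
            open_tracks.discard(track)
    for track in sorted(open_tracks):
        problems.append(f"track {track}: missing End_track")
    if declared_tracks is not None and declared_tracks != len(seen_tracks):
        problems.append(f"header declares {declared_tracks} tracks, found {len(seen_tracks)}")
    return problems
```

An empty result means only that these particular checks passed; it does not guarantee that the conversion back to MIDI will succeed.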

Hey, thanks for trying out this project!
Just as a quick heads-up: This is still a beta, so this might not work with all MIDI files in existence.

Nevertheless, here's an example of how to filter out a specific channel:
(The code example originally posted here is out of date; please see the next post for an up-to-date version.)

If you're doing this for machine-learning purposes, you might also be interested in banana-split, another project of mine that builds on this package. It can split MIDI files into individual channels (and into tracks, including multiple tracks per channel if they exist), so you can work directly with the extracted channels instead of writing a separate filtering script for every channel.
The banana-split repository also contains regex patterns for all the record types in a CSV-MIDI file.
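To illustrate the channel-splitting idea on the CSV lines that py_midicsv produces, here is a rough sketch. `channel_of` and `split_by_channel` are hypothetical helpers, not banana-split's implementation or API. The sketch assumes that channel events are the records whose type ends in `_c` (for example `Note_on_c` or `Program_c`), with the channel number in the fourth field. All other records (header, track markers, tempo and other meta-events) are copied into every group, so each group can still be converted back into a MIDI file.

```python
def channel_of(line):
    """Return the channel of a midicsv channel event (type ending in "_c"), else None."""
    if line.lstrip().startswith(("#", ";")):
        return None  # comment line
    fields = [field.strip() for field in line.split(",")]
    if len(fields) > 3 and fields[2].endswith("_c"):
        return int(fields[3])  # Track, Time, Type, Channel, ...
    return None


def split_by_channel(lines):
    """Map each channel number to its own events plus all non-channel records."""
    channels = sorted({ch for ch in map(channel_of, lines) if ch is not None})
    return {
        ch: [line for line in lines if channel_of(line) in (None, ch)]
        for ch in channels
    }
```

Each list in the result could then be passed to `py_midicsv.csv_to_midi`. Tracks whose events all belong to other channels remain as empty tracks, which keeps the `Header` record's track count consistent.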

Either way, let me know whether this works for you, and tell me if anything breaks so I can fix it, since I'm always looking to improve this library.

There will also likely be some changes to the output formats (at the moment I'm returning StringIO objects from the MIDI parser, which is kind of unwieldy), so watch out for that if you continue using this package.
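If you have code that must run against both the StringIO return value and the list-of-strings format that 1.8.0-beta introduced, one option is a small normalizing helper. `as_lines` is hypothetical and not part of py_midicsv. It is a sketch that assumes the parser result is either an `io.StringIO` or an iterable of lines that already end in newlines.

```python
import io


def as_lines(result):
    """Normalize parser output to a list of lines (hypothetical helper).

    Older releases returned an io.StringIO; newer ones return a list of strings.
    """
    if isinstance(result, io.StringIO):
        # keepends=True preserves the trailing newline on every line
        return result.getvalue().splitlines(keepends=True)
    return list(result)
```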

I have just updated py_midicsv to 1.8.0-beta, which returns a list of strings instead of a StringIO object when parsing into CSV format. This is easier to work with, but it changes the example slightly. Here is the updated version:

```python
import re
import py_midicsv

# Define the patterns which we want to find in the MIDI file
comment_pattern = re.compile(r'\s*[\#\;]')  # comment lines start with # or ;
# Channel events (Note_on_c, Program_c, ...); group 1 captures the channel number
channel_pattern = re.compile(r'\s*\d+\s*,\s*\d+\s*,\s*\w+_c\s*,\s*(\d+)')
# Text meta-events (Lyric_t, Title_t, Marker_t, ...)
lyric_pattern = re.compile(r'\s*\d+\s*,\s*\d+\s*,\s*(\w+_t)\s*,\s*"(.*)"')

# Load the MIDI file and parse it into CSV format (a list of lines)
csv_lines = py_midicsv.midi_to_csv("input.mid")

with open("no_drums.csv", "w") as f:
    for line in csv_lines:
        if comment_pattern.match(line):  # skip comments
            continue
        m = channel_pattern.match(line)
        if m:
            # Channels are 0-based, so channel 9 is MIDI channel 10 (General MIDI drums)
            if int(m.group(1)) != 9:
                f.write(line)
        elif not lyric_pattern.match(line):  # skip lyrics and other text events
            f.write(line)

# Parse the CSV output of the previous step back into a MIDI file
with open("no_drums.csv", "r") as f:
    midi_object = py_midicsv.csv_to_midi(f.readlines())

# Save the parsed MIDI file to disk
with open("output_no_drums.mid", "wb") as output_file:
    midi_writer = py_midicsv.FileWriter(output_file)
    midi_writer.write(midi_object)
```

As you can see, you can now iterate over the returned list directly, and each line already includes its trailing newline.
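Because `midi_to_csv` now returns a list and `csv_to_midi` accepts a list of lines (as `f.readlines()` does in the example), the intermediate `no_drums.csv` file is optional. Here is a minimal sketch of an in-memory filter. `drop_channel` is a hypothetical helper that reuses the example's comment and channel-event patterns. Unlike the example, it keeps text events such as lyrics, so add that check back if you need it.

```python
import re

# Same comment and channel-event patterns as in the example above
COMMENT_PATTERN = re.compile(r"\s*[#;]")
CHANNEL_PATTERN = re.compile(r"\s*\d+\s*,\s*\d+\s*,\s*\w+_c\s*,\s*(\d+)")


def drop_channel(lines, channel=9):
    """Return the records with comments and every event on `channel` removed."""
    kept = []
    for line in lines:
        if COMMENT_PATTERN.match(line):
            continue  # skip comments
        m = CHANNEL_PATTERN.match(line)
        if m and int(m.group(1)) == channel:
            continue  # skip events on the unwanted channel
        kept.append(line)
    return kept
```

Usage would then be `midi_object = py_midicsv.csv_to_midi(drop_channel(py_midicsv.midi_to_csv("input.mid")))`, followed by the same `FileWriter` step as above.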