This is a PyTorch Dataset implementation for 14,000 sound samples of the Philharmonia Orchestra, retrieved from their website.
Clone the repo and install it with pip. ffmpeg must also be installed:
apt-get install ffmpeg
git clone https://github.com/hugofloresgarcia/philharmonia-dataset
cd philharmonia-dataset && pip install -e .
from philharmonia_dataset import PhilharmoniaDataset
# create a dataset object
dataset = PhilharmoniaDataset(
    root='./data/philharmonia',
    download=True,
    sample_rate=48000,
)
During the first run, calling PhilharmoniaDataset downloads the audio files and converts them from mp3 to wav for faster loading. This takes approximately 5-10 minutes.
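The conversion step can be pictured with a small sketch. This is not the package's own code: the helper names `build_ffmpeg_command` and `convert_directory` are hypothetical, and the sketch assumes the `ffmpeg` binary is on the `PATH` and that each wav file is written next to its mp3 source.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, sample_rate=48000):
    """Return the ffmpeg argument list that converts one mp3 file to wav.

    Hypothetical helper for illustration, not part of philharmonia_dataset.
    """
    mp3_path = Path(mp3_path)
    wav_path = mp3_path.with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite an existing output file
        "-i", str(mp3_path),      # input file
        "-ar", str(sample_rate),  # resample to the target rate
        str(wav_path),            # output format is inferred from the extension
    ]


def convert_directory(root, sample_rate=48000):
    """Convert every .mp3 under root that has no .wav sibling yet."""
    for mp3_path in sorted(Path(root).rglob("*.mp3")):
        if mp3_path.with_suffix(".wav").exists():
            continue  # already converted on an earlier run
        subprocess.run(build_ffmpeg_command(mp3_path, sample_rate), check=True)
```

Skipping files that already have a wav sibling is one way to make only the first run slow; whether the package does the same is an assumption.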
Sample output:
dataset[0]
{
audio (np.ndarray): audio array with shape (channels, samples)
one_hot (np.ndarray): one hot encoding of label
instrument (str): instrument name
articulation (str): playing articulation (e.g. 'pizz-normal' for pizzicato)
dynamic (str): playing dynamic (e.g. 'forte')
pitch (str): pitch (e.g. 'B5'). If instrument is unpitched, will return 'nan'.
}
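As a rough illustration of working with these fields, the sketch below filters items by their metadata. The helpers `select` and `is_pitched` are hypothetical, and the items are stand-in dicts with made-up values (the `audio` and `one_hot` arrays are left out to keep the example small); real items come from indexing the dataset.

```python
def is_pitched(sample):
    """True unless the pitch field holds the documented 'nan' placeholder."""
    return sample["pitch"] != "nan"


def select(samples, **criteria):
    """Keep the samples whose metadata matches every key=value criterion."""
    return [s for s in samples if all(s[k] == v for k, v in criteria.items())]


# Stand-in items using the documented keys; the values are illustrative only.
samples = [
    {"instrument": "violin", "articulation": "pizz-normal", "dynamic": "forte", "pitch": "B5"},
    {"instrument": "violin", "articulation": "normal", "dynamic": "piano", "pitch": "A4"},
    {"instrument": "flute", "articulation": "normal", "dynamic": "forte", "pitch": "C5"},
]

forte_violin = select(samples, instrument="violin", dynamic="forte")
pitched_only = [s for s in samples if is_pitched(s)]
```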
Each example is assigned one of the following labels:
Index | Label |
---|---|
0 | banjo |
1 | bass-clarinet |
2 | bassoon |
3 | cello |
4 | clarinet |
5 | contrabassoon |
6 | double-bass |
7 | english-horn |
8 | flute |
9 | french-horn |
10 | guitar |
11 | mandolin |
12 | oboe |
13 | saxophone |
14 | trombone |
15 | trumpet |
16 | tuba |
17 | viola |
18 | violin |
For example, the one_hot encoding of clarinet would be [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] (index 4 is 1, while all other indices are 0).
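The mapping between labels and one-hot vectors can be sketched in a few lines. The helpers `to_one_hot` and `from_one_hot` are hypothetical and use plain lists to stay dependency-free; the dataset itself returns `one_hot` as an `np.ndarray`, for which `np.argmax` recovers the index.

```python
# Label order copied from the table above; the list index is the label index.
LABELS = [
    "banjo", "bass-clarinet", "bassoon", "cello", "clarinet",
    "contrabassoon", "double-bass", "english-horn", "flute", "french-horn",
    "guitar", "mandolin", "oboe", "saxophone", "trombone",
    "trumpet", "tuba", "viola", "violin",
]


def to_one_hot(label):
    """Return a list of zeros with a single 1 at the label's index."""
    vector = [0] * len(LABELS)
    vector[LABELS.index(label)] = 1
    return vector


def from_one_hot(vector):
    """Return the label name whose position holds the 1."""
    return LABELS[list(vector).index(1)]


clarinet = to_one_hot("clarinet")
```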