microsoft/fadtk

More lines than songs in fma_pop_tracks.csv

PabloPeso opened this issue · 1 comment

Hi,

I had some issues retrieving the FMA tracks used by FMA-Pop. The main issue is that some lines in fma_pop_tracks.csv are line breaks; that is, what should be a single line is split across two or more lines.

For example, in https://github.com/microsoft/fadtk/blob/main/datasets/fma_pop_tracks.csv, lines 13, 14, and 15 seem to belong to line 12. Is that correct?

I found that almost 600 lines are extra (continuation lines after a line break, or empty lines such as line 118).

Is this intended?

Thanks,

Yes, this is how the CSV standard escapes special characters such as line breaks.
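A quick way to see this is to compare the number of physical lines in a file with the number of records a CSV parser returns. The sketch below uses Python's standard csv module; the helper name count_lines_and_records and the two-column sample are hypothetical and not taken from fma_pop_tracks.csv.

```python
import csv
import io


def count_lines_and_records(text: str) -> tuple[int, int]:
    """Return (physical lines, parsed CSV records) for CSV text.

    Hypothetical helper, for illustration only.
    """
    physical_lines = len(text.splitlines())
    # csv.reader keeps newlines that appear inside double-quoted fields,
    # so a quoted multi-line field still counts as a single record.
    records = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
    return physical_lines, records


# Toy sample: the quoted title spans three physical lines, one of them empty.
sample = 'track_id,title\n1,"First paragraph\n\nSecond paragraph"\n2,Plain\n'
print(count_lines_and_records(sample))  # (5, 3)
```

One quoted field accounts for both kinds of "extra" line: a continuation line and an empty line.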

For example, if a field contains the text Hello\n\nWorld, a proper CSV does not escape the newlines; instead, it wraps the whole field in double quotes:

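[Screenshot: that text stored as one CSV field wrapped in double quotes, spanning several lines of the file]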
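Python's csv.writer applies this quoting rule automatically when writing. A minimal sketch using the standard library only (the row values are illustrative):

```python
import csv
import io

buf = io.StringIO(newline="")
# The default quoting, csv.QUOTE_MINIMAL, wraps a field in double quotes only
# when it contains the delimiter, the quote character, or any character of the
# line terminator ("\r\n" by default).
writer = csv.writer(buf)
writer.writerow(["1", "Hello\n\nWorld"])
written = buf.getvalue()
print(repr(written))  # '1,"Hello\n\nWorld"\r\n'

# Reading it back with csv.reader restores the original field unchanged.
row = next(csv.reader(io.StringIO(written, newline="")))
print(row)  # ['1', 'Hello\n\nWorld']
```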

Please don't read a CSV file line by line; that's not how the format is designed to be parsed. Use a proper CSV library or pandas instead.
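Reading the file through a CSV parser then yields one record per track. This is a sketch under assumptions: the helper load_rows, the UTF-8 encoding, and the toy file contents are hypothetical; in practice you would pass the path to your local copy of datasets/fma_pop_tracks.csv.

```python
import csv
import os
import tempfile


def load_rows(path: str) -> list[dict[str, str]]:
    """Read every CSV record from path as a dict keyed by the header row."""
    # newline="" hands raw line endings to the csv module, so newlines inside
    # quoted fields stay part of the field instead of ending the record.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Toy file standing in for fma_pop_tracks.csv (column names are hypothetical).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write('track_id,title\n1,"Hello\n\nWorld"\n2,Plain\n')

rows = load_rows(tmp.name)
os.remove(tmp.name)
print(len(rows))              # 2
print(repr(rows[0]["title"]))  # 'Hello\n\nWorld'
```

With pandas, pandas.read_csv(path) also parses quoted multi-line fields with its default settings, and len() of the resulting DataFrame gives the record count.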