UNDER DEVELOPMENT
Links the LFM-1b dataset of Last.fm listening histories to artist gender metadata taken from a locally configured MusicBrainz database.
- JDK 1.8
- PSQL (PostgreSQL) 10.10
- Docker / Docker compose
Before cloning the repo, the LFM-1b and MusicBrainz datasets must be appropriately configured.
- LFM-1b dataset: Download the file LFM-1b.zip and extract it to a temporary directory, e.g. ~/temp.
- Rename LFM-1b_tracks.txt to LFM-1b_tracks.tsv.
- Run the following command in psql to insert the data from LFM-1b_tracks.tsv into a database:
COPY songs FROM '/path_to_file/LFM-1b_tracks.tsv' DELIMITER E'\t';
- Download the MusicBrainz database dumps: choose an appropriate mirror and download mbdump.tar.bz2 and mbdump-derived.tar.bz2 to a path of your choice.
- Clone the mbdata repo into the desired path, then follow the instructions below to build the database from the dumps.
sudo su - postgres
createuser musicbrainz
createdb -l C -E UTF-8 -T template0 -O musicbrainz musicbrainz
psql musicbrainz -c 'CREATE EXTENSION cube;'
psql musicbrainz -c 'CREATE EXTENSION earthdistance;'
export MBDATA=/path/to/mbdata    # your cloned mbdata path
export MBDUMP=/path/to/mbdump    # your path to the mbdump files
mbslave psql -f $MBDATA/mbdata/sql/CreateTables.sql
mbslave psql -f $MBDATA/mbdata/sql/statistics/CreateTables.sql
mbslave psql -f $MBDATA/mbdata/sql/caa/CreateTables.sql
mbslave psql -f $MBDATA/mbdata/sql/wikidocs/CreateTables.sql
mbslave psql -f $MBDATA/mbdata/sql/documentation/CreateTables.sql
mbslave import $MBDUMP/mbdump.tar.bz2
mbslave import $MBDUMP/mbdump-derived.tar.bz2
mbslave psql -f $MBDATA/mbdata/sql/CreatePrimaryKeys.sql
mbslave psql -f $MBDATA/mbdata/sql/statistics/CreatePrimaryKeys.sql
mbslave psql -f $MBDATA/mbdata/sql/caa/CreatePrimaryKeys.sql
mbslave psql -f $MBDATA/mbdata/sql/wikidocs/CreatePrimaryKeys.sql
mbslave psql -f $MBDATA/mbdata/sql/documentation/CreatePrimaryKeys.sql
mbslave psql -f $MBDATA/mbdata/sql/CreateIndexes.sql
mbslave psql -f $MBDATA/mbdata/sql/CreateSlaveIndexes.sql
mbslave psql -f $MBDATA/mbdata/sql/statistics/CreateIndexes.sql
mbslave psql -f $MBDATA/mbdata/sql/caa/CreateIndexes.sql
mbslave psql -f $MBDATA/mbdata/sql/CreateFunctions.sql
mbslave psql -f $MBDATA/mbdata/sql/CreateViews.sql
Then check out and configure the following MusicBrainz db search tool and follow the instructions in its README.md
to start the Elasticsearch API.
- Change all relevant paths in DeriveGenderFromDb.java so that they point to the MusicBrainz db and the LFM-1b songs table.
- Compile and run LastFmMusicbrainzWrangler.java.
- Results are written to the file part-00000.txt as tab-separated values. The output is based on LFM-1b_artists.txt and has an additional column for artist gender.
The gender classification is formed by considering the genders of all current and previous members of an artist / band as defined in the MusicBrainz db. The final classification is output as a '/'-separated string in which the gender counts appear in the following order:
unknown gender count / male gender count / female gender count / other gender count / na gender count
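
For downstream processing, the '/'-separated classification can be turned back into labelled counts. The following is a minimal sketch rather than part of this repo: the class `GenderCounts`, its method names, and the sample line are hypothetical, and it assumes that the gender classification is the last tab-separated column of each line in part-00000.txt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: parses a '/'-separated gender classification such as "0/3/1/0/0". */
final class GenderCounts {
    // Order of the counts as described in this README.
    static final String[] LABELS = {"unknown", "male", "female", "other", "na"};

    /** Splits the classification string into labelled counts. */
    static Map<String, Integer> parse(String classification) {
        String[] parts = classification.trim().split("/");
        if (parts.length != LABELS.length) {
            throw new IllegalArgumentException(
                "Expected " + LABELS.length + " counts but got: " + classification);
        }
        // LinkedHashMap keeps the labels in the documented order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < LABELS.length; i++) {
            counts.put(LABELS[i], Integer.parseInt(parts[i].trim()));
        }
        return counts;
    }

    /** Assumption: the classification is the last tab-separated column of a line. */
    static Map<String, Integer> parseOutputLine(String line) {
        String[] columns = line.split("\t");
        return parse(columns[columns.length - 1]);
    }

    public static void main(String[] args) {
        // Hypothetical output line: artist id, artist name, gender counts.
        Map<String, Integer> counts = parseOutputLine("42\tExample Band\t0/3/1/0/0");
        System.out.println(counts); // {unknown=0, male=3, female=1, other=0, na=0}
    }
}
```

In this example, a band with three male members and one female member yields `0/3/1/0/0`. The sketch compiles with JDK 1.8 and uses only the standard library.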