elliotchance/mbzdb

Artists that don't exist in the artist/artist_credit/artist_credit_name tables but do exist in the artist_name table

Closed this issue · 2 comments

For example, if you query the "artist_name" table where "id" IN (178605, 178606, 178607, 178608), you get results like "David Bowie 'and something else'". But if you run the query against the "artist", "artist_credit" and "artist_credit_name" tables where "name" IN (the same IDs as above), you get no results.

Also try querying artist WHERE name IN (178605, 178606, 178607, 178608) OR sort_name IN (178605, 178606, 178607, 178608).
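The checks discussed so far can be folded into one anti-join that lists every artist_name row not referenced by artist.name, artist.sort_name, artist_credit.name or artist_credit_name.name. This is a minimal sketch that uses Python's built-in sqlite3 module and a simplified, hypothetical schema containing only the columns mentioned in this thread; the real NGS schema has more tables that point at artist_name (aliases, for instance), so a row flagged here is not necessarily unused. The sample rows are made up for illustration.

```python
import sqlite3


def find_unreferenced_artist_names(conn):
    """Return (id, name) for artist_name rows that no checked column references."""
    query = """
        SELECT an.id, an.name
        FROM artist_name AS an
        WHERE NOT EXISTS (SELECT 1 FROM artist AS a
                          WHERE a.name = an.id OR a.sort_name = an.id)
          AND NOT EXISTS (SELECT 1 FROM artist_credit AS ac
                          WHERE ac.name = an.id)
          AND NOT EXISTS (SELECT 1 FROM artist_credit_name AS acn
                          WHERE acn.name = an.id)
        ORDER BY an.id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a tiny in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE artist_name (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE artist (id INTEGER PRIMARY KEY, name INTEGER, sort_name INTEGER);
        CREATE TABLE artist_credit (id INTEGER PRIMARY KEY, name INTEGER);
        CREATE TABLE artist_credit_name (
            artist_credit INTEGER, position INTEGER, artist INTEGER, name INTEGER,
            PRIMARY KEY (artist_credit, position));
        -- Made-up rows: id 3 is never referenced by any other table.
        INSERT INTO artist_name VALUES
            (1, 'David Bowie'), (2, 'Bowie, David'), (3, 'David Bowie and someone else');
        INSERT INTO artist VALUES (10, 1, 2);
        INSERT INTO artist_credit VALUES (20, 1);
        INSERT INTO artist_credit_name VALUES (20, 0, 10, 1);
    """)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_unreferenced_artist_names(demo))  # [(3, 'David Bowie and someone else')]
```

Using NOT EXISTS rather than NOT IN avoids the surprise where a single NULL in a referencing column makes NOT IN return no rows at all.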

By the way, mbzdb ports the official MusicBrainz database, and I don't think it is its job to look for incorrect rows; it just parses the MB replications row by row. If the database really has "unused" rows in artist_name, that is MusicBrainz's fault (NGS is still in beta, and its conversion scripts may have errors).

ChurruKa is correct. All data integrity checking is performed by the real MusicBrainz server. If there are any data problems, they can be explained by one or more of the following:

  1. A freak error in the SQL that comes through the replication, such as a bad Unicode character that causes the statement to fail.
  2. Skipped replications, or otherwise incomplete replications produced by the MusicBrainz server (very common during the NGS betas).
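For the second cause, one rough check is to look for holes in the sequence of replication packets you have applied. This minimal sketch assumes you keep a list of the applied packets' sequence numbers and that packets are numbered consecutively; the function name and input are hypothetical, and mbzdb does not necessarily track this for you.

```python
def find_missing_replications(applied):
    """Return sequence numbers missing between the lowest and highest applied ones.

    `applied` is a hypothetical list of integer replication sequence numbers
    that were loaded; consecutive numbering is assumed.
    """
    if not applied:
        return []
    seen = set(applied)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


print(find_missing_replications([101, 102, 104, 105, 108]))  # [103, 106, 107]
```

A gap found this way would explain rows whose referencing rows never arrived; it says nothing about packets that were applied but partially failed (the first cause).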

mbzdb does do uniqueness checking, since it inherits the primary keys from the PostgreSQL schema: if a table has a primary key, loading the same replication twice will not insert any of its rows twice. Referential integrity (foreign keys) is neither needed nor applicable, since the replications rely on the data already being "clean" in a relational sense.
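The primary-key behaviour described above can be sketched with SQLite standing in for mbzdb's target database. This is not mbzdb's code: `apply_insert` is a hypothetical helper, and the rows stand in for one replication packet. Replaying the same rows fails on the primary key, so the row count does not change.

```python
import sqlite3


def apply_insert(conn, row):
    """Try to insert one replicated row; return True if it was inserted."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute("INSERT INTO artist_name (id, name) VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        # Primary key already present: this row was loaded by an earlier run.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist_name (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
packet = [(1, "David Bowie"), (2, "Bowie, David")]  # hypothetical replication rows

first = [apply_insert(conn, r) for r in packet]
second = [apply_insert(conn, r) for r in packet]  # replaying the same packet
count = conn.execute("SELECT COUNT(*) FROM artist_name").fetchone()[0]
print(first, second, count)  # [True, True] [False, False] 2
```

Note that no foreign key is declared, matching the point that referential integrity is left to the upstream server; an artist_name row whose referencing rows never arrive is simply kept.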

I am going to close this issue on the grounds that this is not a known bug but is most likely one of the many problems with the early generations of NGS beta replications.