jackalope/jackalope-doctrine-dbal

No foreign keys are used on phpcr_binarydata, resulting in dangling binary data?

Closed this issue · 8 comments

When looking at the schema generated for a MySQL database, I noticed that no foreign keys are used.

Is this on purpose? For example: https://github.com/jackalope/jackalope-doctrine-dbal/blob/master/src/Jackalope/Transport/DoctrineDBAL/RepositorySchema.php#L123-L134

The phpcr_binarydata.node_id column is a foreign key of phpcr_nodes.id and can (/ should?) be configured as such?

An advantage can be that upon removing the node, also the binary-data is removed. For some reason I currently have a table with a lot "dangling" entries in the phpcr_binarydata table, which can be determined with a query like:

SELECT id 
FROM phpcr_binarydata 
WHERE node_id NOT IN (SELECT id FROM phpcr_nodes);

I think this can be prevented by adding a foreign key, something like this:

$binary->addForeignKeyConstraint('phpcr_nodes', ['node_id'], ['id'], ['onDelete' => 'CASCADE']);

Mhhh. Is the id an extra column if type integer to to get a dbal record complete? Or is it our well known path? First would mean, that we do not have any access from our app on it.

uhm... the phpcr_nodes.id column is the existing the primary key

$nodes->addColumn('id', 'integer', ['autoincrement' => true]);

First would mean, that we do not have any access from our app on it.

Not sure I understand what you mean. This is pure at SQL level, right? So the "app" does not even need access?

Same as done here:

$weakreferences->addForeignKeyConstraint($nodes, ['source_id'], ['id'], ['onDelete' => 'CASCADE']);
$weakreferences->addForeignKeyConstraint($nodes, ['target_id'], ['id'], ['onDelete' => 'CASCADE']);

But even when the app (user of implementation,) does not use it, cause it depend on an Interface only, the implementation could need at least a key on important columns. And maybe some forreign keys. But that depends on the usecase. If noone uses a key, or a link betweem tanbles, that key is useless. So we should collect usecases for stuff. If we are good, then we have mor clear view

My suggestion does not change anything to the existing functionality. Keys, etc.

The only thing it does:

  • when a Node X is removed from phpcr_nodes
  • then the database also removes the binary data in phpcr_binarydata that refers to Node X

This way, the database itself ensures that phpcr_binarydata only contains data that is actually used...

dbu commented

its been quite a while when we built that... i can't remember if there was any specific reason not to do foreign keys for this. but if the node table and binarydata table use the same node id and do not somehow try to optimize by sharing binary data, i think it would make a lot of sense to add the foreign key functionality. the tests probably focus on creating and modifying data, and not on removing it :-/

do you want to look into this? i guess the complete fix would be:

  • updated schema
  • add some tests for removing a node with binary data that then checks on database level that things have been cleaned up
  • changelog entry that explains what sql instructions to run to add the foreign key in existing installations (there are no migrations for the schema) - and also an sql statement to delete dangling binary data, both for cleanup and because the foreign key can only be added if the data is in a sane state.

@dbu sounds like a nice structured approach, kind of busy coming time, but will try to contribute.

For the eager folks, the following can already help:

DELETE
FROM phpcr_binarydata 
WHERE node_id NOT IN (SELECT id FROM phpcr_nodes);

ALTER TABLE phpcr_binarydata ADD CONSTRAINT FK_37E65615460D9FD7 FOREIGN KEY (node_id) REFERENCES phpcr_nodes (id) ON DELETE CASCADE;

Btw: i did not want to reject the key. I just wanted to say that the keys can change application, especially for cascade options.

dbu commented

thanks for that update. yeah, that does not look too scary. glad if you can do a PR to adjust the db init script and wrap those 2 statements into a changelog entry. no hurry of course, we are all busy people ;-)