makinacorpus/DbToolsBundle

Filtering

bertdk opened this issue · 4 comments

Is it possible to apply the anonymization to a specific set of entities? I would like to anonymize only entities which have a deletedAt timestamp greater than or equal to now, or filter them by specifying a list of IDs.
Use case: running a daily anonymize command on my production database to mask deleted entities.

For now, it's not easy to do at all.

But what you're asking for is somewhat related to what's asked in #136.

We'll definitely think about adding this kind of feature in the future.

I do something similar, but to avoid anonymizing our user accounts.
I extended the email anonymizer and overrode its anonymize method, simply adding these instructions after the parent call:

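public function anonymize(Update $update): void
{
    parent::anonymize($update);

    // $expr was not defined in the original snippet; assuming it comes
    // from the query's expression factory.
    $expr = $update->expression();

    // Skip values whose email belongs to our own domains.
    $where = $update->getWhere();
    $where
        ->isNotLike($expr::column($this->columnName, $this->tableName), '%@mycompany.com')
        ->isNotLike($expr::column($this->columnName, $this->tableName), '%@myclient.com')
    ;
}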

But these conditions only apply to the column, not the entire row. In my case it's still OK.

I haven't done any more research, but could a PHP attribute on the entity be added to apply a where clause to the whole row?
Something like:

#[Where(
    operator: Where::AND,
    raw: [
        new Not(new Like('email', '%@mycompany.com', true)),
        new Not(new Like('email', '%@myclient.com', true)),
    ]
)]
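
To make the attribute idea above more concrete, here is a minimal, self-contained sketch of how such an attribute could be declared and read back through reflection. Everything in it is hypothetical: the `Where`, `Not` and `Like` classes, the `rowFilterFor()` helper and the `User` entity do not exist in DbToolsBundle, and the third argument to `Like` is left out because its meaning isn't stated. It needs PHP 8.1 or later, since it uses `new` inside attribute arguments.

```php
<?php
declare(strict_types=1);

// Hypothetical condition objects; none of these classes exist in DbToolsBundle.
final class Like
{
    public function __construct(
        public readonly string $column,
        public readonly string $pattern,
    ) {}
}

final class Not
{
    public function __construct(public readonly Like $inner) {}
}

// Hypothetical row-level filter attribute.
#[Attribute(Attribute::TARGET_CLASS)]
final class Where
{
    public const AND = 'AND';
    public const OR = 'OR';

    /** @param list<Not|Like> $raw */
    public function __construct(
        public readonly string $operator,
        public readonly array $raw,
    ) {}
}

#[Where(
    operator: Where::AND,
    raw: [
        new Not(new Like('email', '%@mycompany.com')),
        new Not(new Like('email', '%@myclient.com')),
    ],
)]
final class User {}

/**
 * Reads the hypothetical attribute and turns it into a parametrized
 * SQL fragment plus its parameters, or null when no filter is declared.
 *
 * @return array{string, list<string>}|null
 */
function rowFilterFor(string $entityClass): ?array
{
    $attributes = (new ReflectionClass($entityClass))->getAttributes(Where::class);
    if ($attributes === []) {
        return null;
    }

    $where = $attributes[0]->newInstance();
    $sql = [];
    $params = [];
    foreach ($where->raw as $condition) {
        $negated = $condition instanceof Not;
        $like = $negated ? $condition->inner : $condition;
        $sql[] = sprintf('%s %s ?', $like->column, $negated ? 'NOT LIKE' : 'LIKE');
        $params[] = $like->pattern;
    }

    return [implode(' ' . $where->operator . ' ', $sql), $params];
}
```

The anonymizator would then have to apply that fragment to every update query targeting the entity's table, which is the part that doesn't exist today.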

Well, this tool wasn't designed with this use case in mind.

There's a solution that would be easy for us but would require some code on your side: inject the Symfony event dispatcher into the anonymizator and create a few event classes (before update, after update, etc.), so you could alter the query at the right moment.

That said, you should read my comment on #136 (comment), which explains why this is not a feature right now.
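
As a rough illustration of the event-based approach described above, here is a self-contained sketch. It is not how the bundle works today: `BeforeAnonymizeUpdateEvent`, `FakeUpdate` and `TinyDispatcher` are hypothetical stand-ins (for an event class, the query builder's update query and the Symfony event dispatcher, respectively), and the `user` table and `deleted_at` column are assumed names.

```php
<?php
declare(strict_types=1);

// Stand-in for the query builder's update query; it only records WHERE conditions.
final class FakeUpdate
{
    /** @var list<array{string, list<mixed>}> */
    public array $conditions = [];

    public function __construct(public readonly string $table) {}

    public function where(string $sql, array $params = []): void
    {
        $this->conditions[] = [$sql, $params];
    }
}

// Hypothetical event, dispatched just before an anonymization update runs.
final class BeforeAnonymizeUpdateEvent
{
    public function __construct(public readonly FakeUpdate $update) {}
}

// Minimal dispatcher standing in for Symfony's; listeners are keyed by event class.
final class TinyDispatcher
{
    /** @var array<string, list<callable>> */
    private array $listeners = [];

    public function addListener(string $eventClass, callable $listener): void
    {
        $this->listeners[$eventClass][] = $listener;
    }

    public function dispatch(object $event): object
    {
        foreach ($this->listeners[$event::class] ?? [] as $listener) {
            $listener($event);
        }

        return $event;
    }
}

// User-side code: restrict anonymization of the "user" table to soft-deleted rows.
$dispatcher = new TinyDispatcher();
$dispatcher->addListener(
    BeforeAnonymizeUpdateEvent::class,
    function (BeforeAnonymizeUpdateEvent $event): void {
        if ($event->update->table === 'user') {
            $event->update->where('deleted_at IS NOT NULL');
        }
    }
);

// What the anonymizator would do right before executing each update query.
$update = new FakeUpdate('user');
$dispatcher->dispatch(new BeforeAnonymizeUpdateEvent($update));
```

With this shape, the filtering logic lives entirely in the listener, so the anonymizers themselves stay unaware of it.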

@maxhelias we discussed it internally and I opened #171 for your own use case; it seems legitimate after all.

@bertdk regarding the original topic of this issue: altering production data with this API directly in your production database is not a use case we intend to support. Some new features may help you do that in the future. Nevertheless, it's a non-goal for us.