sunitparekh/data-anonymization

Bulk table updates

abrom opened this issue · 1 comment

abrom commented

Not an issue per se. I've been adding a bulk table update method to a fork of your project and thought you might be interested. It's relatively simplistic at the moment, but the general gist is:

bulk_table 'my_table' do
  where "some_column != 'some value'"      # only rows matching this filter are updated
  anonymize('pii_column') { 'xxxxxxxx' }   # the block supplies the replacement value
end
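As a rough illustration of the idea (not the code in the fork), a block like the one above could be evaluated into a single UPDATE statement, so every matching row is rewritten in one round trip instead of being loaded and saved one by one. BulkTableSketch, its quote helper and the top-level bulk_table method below are hypothetical stand-ins; a real version would hand the filter to ActiveRecord rather than building SQL by hand.

```ruby
# Hypothetical sketch of what a bulk_table block could compile to. The class and
# method names mirror the DSL in this issue but are NOT the fork's actual code.
class BulkTableSketch
  def initialize(table)
    @table = table
    @filter = nil
    @columns = {}
  end

  # Stores the filter: a String is used verbatim, a Hash becomes equality tests.
  def where(filter)
    @filter = filter
  end

  # Stores a block that produces the replacement value for one column.
  def anonymize(column, &block)
    @columns[column] = block
  end

  # Builds one UPDATE statement covering every matching row.
  def to_sql
    sets = @columns.map { |col, block| "#{col} = #{quote(block.call)}" }
    sql = "UPDATE #{@table} SET #{sets.join(', ')}"
    sql += " WHERE #{where_sql}" if @filter
    sql
  end

  private

  def where_sql
    return @filter if @filter.is_a?(String)

    @filter.map { |col, val| "#{col} = #{quote(val)}" }.join(' AND ')
  end

  # Naive quoting for illustration only; real code should use the adapter's quoting.
  def quote(value)
    "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Evaluates the DSL block against a fresh sketch object via instance_eval.
def bulk_table(name, &block)
  table = BulkTableSketch.new(name)
  table.instance_eval(&block)
  table
end

update = bulk_table('my_table') do
  where "some_column != 'some value'"
  anonymize('pii_column') { 'xxxxxxxx' }
end
puts update.to_sql
# prints: UPDATE my_table SET pii_column = 'xxxxxxxx' WHERE some_column != 'some value'
```

The same sketch also accepts a hash filter such as where(name: 'x'), which it turns into simple equality tests joined with AND.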

Because the where filter is passed straight through to ActiveRecord (AR), it can be a hash or include a subquery. The anonymisation currently just passes a random string through to the strategy, but it could be made a bit smarter, e.g. by looking at the column type. For my purposes I'm just using the Anonymous strategy with a block, as above.
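The "smarter" option could look something like the following sketch, which picks a replacement value from the column type instead of always sending a random string. replacement_for and the type symbols it handles are assumptions for illustration; in an ActiveRecord model the type symbol could come from Model.columns_hash[column_name].type.

```ruby
require 'securerandom'
require 'date'

# Hypothetical helper: chooses a replacement value based on a column's type.
# The type symbols are modelled on common ActiveRecord column types; this is
# not something the fork currently does.
def replacement_for(type, length: 8)
  case type
  when :string, :text
    SecureRandom.alphanumeric(length)        # random letters and digits
  when :integer
    SecureRandom.random_number(10**length)   # integer in 0...10**length
  when :date
    Date.new(1970, 1, 1)                     # fixed, obviously fake date
  when :boolean
    false
  else
    raise ArgumentError, "no replacement rule for column type #{type.inspect}"
  end
end

p replacement_for(:string)
p replacement_for(:integer)
```

Raising on unknown types, rather than silently falling back to a string, keeps a bulk update from writing a value the column cannot hold.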

Another option would be to simplify things even further by passing the query itself to a block as a parameter. Something like:

bulk_table ... do
  with_query do |query|
    # chain AR relation methods; each line ends with '.' so the chain continues
    query.
      joins('join other_table... ').
      where(other_table: { value: 'bar' })
  end
end
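To make the with_query idea a bit more concrete, here is a hypothetical sketch: the block receives a base query, refines it, and whatever comes back is turned into one UPDATE that restricts rows by id through a subquery. QuerySketch stands in for an ActiveRecord relation, and with_query_update, the join condition on other_table.my_table_id and the id IN (subquery) approach are all assumptions for illustration.

```ruby
# Hypothetical immutable query builder standing in for an ActiveRecord relation.
class QuerySketch
  attr_reader :table, :join_clauses, :conditions

  def initialize(table, join_clauses = [], conditions = [])
    @table = table
    @join_clauses = join_clauses
    @conditions = conditions
  end

  # Each call returns a new query, so calls chain like AR relations.
  def joins(clause)
    QuerySketch.new(table, join_clauses + [clause], conditions)
  end

  # Accepts a nested hash such as { other_table: { value: 'bar' } }.
  # Values are interpolated without escaping: illustration only.
  def where(hash)
    added = hash.flat_map do |tbl, cols|
      cols.map { |col, val| "#{tbl}.#{col} = '#{val}'" }
    end
    QuerySketch.new(table, join_clauses, conditions + added)
  end

  def select_ids_sql
    sql = "SELECT #{table}.id FROM #{table}"
    sql += " #{join_clauses.join(' ')}" unless join_clauses.empty?
    sql += " WHERE #{conditions.join(' AND ')}" unless conditions.empty?
    sql
  end
end

# Yields the base query to the caller's block and wraps the returned query
# in a single UPDATE keyed on the ids it selects.
def with_query_update(table, assignments)
  query = yield QuerySketch.new(table)
  sets = assignments.map { |col, val| "#{col} = '#{val}'" }.join(', ')
  "UPDATE #{table} SET #{sets} WHERE id IN (#{query.select_ids_sql})"
end

sql = with_query_update('my_table', { 'pii_column' => 'xxxxxxxx' }) do |query|
  query.
    joins('JOIN other_table ON other_table.my_table_id = my_table.id').
    where(other_table: { value: 'bar' })
end
puts sql
```

The subquery form keeps the update itself free of joins, but not every database accepts it: MySQL, for example, rejects a subquery that reads the table being updated, so a real implementation would need an UPDATE ... JOIN there or would lean on ActiveRecord to pick the right form.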

I'm not sure there is a nice way to do cross-connection copies other than dumping and loading. Doing that in memory seemed a bit crazy (and didn't fit my use case), so for now it only supports anonymising the source DB in place:

Studiosity@cdfcfec

I am working on porting this tool to Java/Kotlin for better performance. If you want to try an early version, you can find it here:
https://github.com/dataanon/data-anon
Sample project: https://github.com/dataanon/dataanon-kotlin-sample