
batch_size acts like limit

brki opened this issue · 4 comments

brki commented

When using batch_size, not all records are being processed. In fact only the number of records defined in batch_size is being processed.

I'm using a whitelisting strategy on an sqlite database. The anonymization is defined like this:

require 'data-anonymization'
require 'sqlite3'

DataAnon::Utils::Logging.logger.level = Logger::INFO

database 'foobar' do

  strategy DataAnon::Strategy::Whitelist
  source_db :adapter => 'sqlite3', :database => 'foo.db'
  destination_db :adapter => 'sqlite3', :database => 'foo.anon.db'

  table "foo" do
    primary_key "id"
    batch_size 10

    whitelist "id"
    anonymize("title") { |field| field.value + "foo" }
  end
end

The table foo was created in both the source and destination databases like this:

sqlite> CREATE TABLE foo(id INTEGER, title TEXT);

The table foo has 4999 records in the source_db, and no records in the destination_db.

When I run the anonymization script, only 10 records are created in the destination_db. I expected all 4999 records to appear in the destination_db.
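
To make the difference between the expected and the observed behavior concrete, here is a minimal plain-Ruby sketch. It is only an illustration: it uses an in-memory array as a stand-in for the table and does not call data-anonymization or ActiveRecord, and the variable names are made up for this example.

```ruby
# Plain-Ruby illustration only; this is NOT data-anonymization or ActiveRecord code.
records = (1..4999).to_a   # hypothetical stand-in for the 4999 rows in the source table
batch_size = 10

# Expected: batching walks through every record, batch_size rows at a time.
batched = []
records.each_slice(batch_size) { |batch| batched.concat(batch) }

# Observed: batch_size behaves like a limit, so only the first batch is copied.
limited = records.first(batch_size)

puts "batched: #{batched.size} records"   # prints "batched: 4999 records"
puts "limited: #{limited.size} records"   # prints "limited: 10 records"
```

With batching, the destination should end up with the same row count as the source; with a limit, it stops after the first batch_size rows, which matches the 10 records reported here.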

No errors are reported, the script output looks like:

[vagrant@localhost]$ ruby ruby_scripts/test.rb
I, [2016-01-19T15:11:19.867351 #8410]  INFO -- : Processing table foo records in batch size of 10
foo                  [     1/4999  ] ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉  0% 00:00:00
I, [2016-01-19T15:11:21.740611 #8410]  INFO -- : Fields missing the anonymization strategy

I tried with debug-level logging enabled, but no errors were shown then, either.

These are the gem versions installed:

*** LOCAL GEMS ***
activemodel (4.2.5)
activerecord (4.2.5)
activesupport (4.2.5)
arel (6.0.3)
bigdecimal (1.2.6)
bson (3.2.6, 1.12.5)
bson_ext (1.12.4)
builder (3.2.2)
composite_primary_keys (8.1.2)
data-anonymization (0.7.2)
hashie (3.4.3)
i18n (0.7.0)
io-console (0.4.3)
json (1.8.1)
minitest (5.4.3)
mongo (2.1.2)
parallel (1.6.1)
pg (0.18.4)
power_assert (0.2.2)
powerbar (1.0.16)
protected_attributes (1.1.3)
psych (2.0.8)
rake (10.4.2)
rdoc (4.2.0)
rgeo (0.5.2)
rgeo-geojson (0.4.2)
sqlite3 (1.3.11)
test-unit (3.0.8)
thor (0.19.1)
thread_safe (0.3.5)
tzinfo (1.2.2)

The version of ruby is 2.2.2p95.

Thank you for reporting the issue. Give me a couple of days and I will look into it.



I looked into it quickly. It looks like changes in the activerecord library are
causing it to break. I will need more time to look into the activerecord code.

Give me some time; I will look into it over the weekend. Till then, avoid using
batch_size :-)
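
A minimal sketch of that workaround, assuming the only change needed is to leave out the batch_size line from the definition in the original report (everything else is copied from that report; whether the table is then processed in full is the maintainer's suggestion, not something verified here):

```ruby
require 'data-anonymization'
require 'sqlite3'

database 'foobar' do
  strategy DataAnon::Strategy::Whitelist
  source_db :adapter => 'sqlite3', :database => 'foo.db'
  destination_db :adapter => 'sqlite3', :database => 'foo.anon.db'

  table "foo" do
    primary_key "id"
    # batch_size 10   # left out as a workaround while batching is broken

    whitelist "id"
    anonymize("title") { |field| field.value + "foo" }
  end
end
```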



@janraasch Thanks for the pull request. I merged it and released 0.7.3.

Sure. Thank you for merging this :)