batch_size acts like limit
brki opened this issue · 4 comments
When using batch_size, not all records are processed. In fact, only as many records as defined in batch_size are processed.
I'm using a whitelisting strategy on an sqlite database. The anonymization is defined like this:
require 'data-anonymization'
require 'sqlite3'
DataAnon::Utils::Logging.logger.level = Logger::INFO
database 'foobar' do
strategy DataAnon::Strategy::Whitelist
source_db :adapter => 'sqlite3', :database => 'foo.db'
destination_db :adapter => 'sqlite3', :database => 'foo.anon.db'
table "foo" do
primary_key "id"
batch_size 10
whitelist "id"
anonymize("title") { |field| field.value + "foo" }
end
end
The table foo was created in the source and destination databases like this:
sqlite> CREATE TABLE foo(id INTEGER, title TEXT);
The table foo has 4999 records in the source_db and no records in the destination_db.
When I run the anonymization script, only 10 records are created in the destination_db. I expect all 4999 records to appear in the destination_db.
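For comparison, the expected behaviour can be modelled with a small, self-contained Ruby sketch. It is not taken from the data-anonymization gem; Record, each_batch, anonymize_all and anonymize_first_batch_only are hypothetical names used only for illustration. each_batch pages through rows ordered by the primary key, fetching up to batch_size rows whose id is greater than the last id seen, which is roughly how ActiveRecord's find_in_batches iterates. The last helper reproduces the reported symptom by stopping after the first batch, so batch_size behaves like a limit.

```ruby
# Hypothetical illustration only: none of these helpers belong to the
# data-anonymization gem. They model expected vs. observed batch_size behaviour
# on a table whose primary key is "id".

Record = Struct.new(:id, :title)

# Keyset pagination by primary key: each batch holds up to batch_size rows with
# id greater than the last id already yielded; stops when a batch is empty.
def each_batch(records, batch_size)
  sorted = records.sort_by(&:id)
  last_id = nil
  loop do
    batch = sorted.select { |r| last_id.nil? || r.id > last_id }.first(batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last.id
  end
end

# Expected: every source row is anonymized and copied to the destination.
def anonymize_all(records, batch_size)
  destination = []
  each_batch(records, batch_size) do |batch|
    batch.each { |r| destination << Record.new(r.id, r.title + "foo") }
  end
  destination
end

# Observed symptom: only the first batch is processed, like a LIMIT.
def anonymize_first_batch_only(records, batch_size)
  destination = []
  each_batch(records, batch_size) do |batch|
    batch.each { |r| destination << Record.new(r.id, r.title + "foo") }
    break # leaves each_batch after one batch
  end
  destination
end

source = (1..4999).map { |i| Record.new(i, "title #{i}") }
puts anonymize_all(source, 10).size              # => 4999
puts anonymize_first_batch_only(source, 10).size # => 10
```

With the 4999-row table from the report and a batch size of 10, the first helper copies every row while the second copies only 10, matching what was observed.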
No errors are reported, the script output looks like:
[vagrant@localhost]$ ruby ruby_scripts/test.rb
I, [2016-01-19T15:11:19.867351 #8410] INFO -- : Processing table foo records in batch size of 10
foo [ 1/4999 ] ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 0% 00:00:00
I, [2016-01-19T15:11:21.740611 #8410] INFO -- : Fields missing the anonymization strategy
I tried with debug-level logging enabled, but no errors were shown then, either.
These are the gem versions installed:
*** LOCAL GEMS ***
activemodel (4.2.5)
activerecord (4.2.5)
activesupport (4.2.5)
arel (6.0.3)
bigdecimal (1.2.6)
bson (3.2.6, 1.12.5)
bson_ext (1.12.4)
builder (3.2.2)
composite_primary_keys (8.1.2)
data-anonymization (0.7.2)
hashie (3.4.3)
i18n (0.7.0)
io-console (0.4.3)
json (1.8.1)
minitest (5.4.3)
mongo (2.1.2)
parallel (1.6.1)
pg (0.18.4)
power_assert (0.2.2)
powerbar (1.0.16)
protected_attributes (1.1.3)
psych (2.0.8)
rake (10.4.2)
rdoc (4.2.0)
rgeo (0.5.2)
rgeo-geojson (0.4.2)
sqlite3 (1.3.11)
test-unit (3.0.8)
thor (0.19.1)
thread_safe (0.3.5)
tzinfo (1.2.2)
The version of ruby is 2.2.2p95.
Thank you for reporting the issue. Give me a couple of days and I will look into it.
Regards,
Sunit
—
Reply to this email directly or view it on GitHub
#30.
Looked into it quickly. It looks like changes in the activerecord library are causing it to break. I will need more time to look into the activerecord code. Give me some time; I will look into it over the weekend. Until then, avoid using batch_size :-)
Thanks
Sunit
@janraasch Thanks for the pull request. I merged it and released 0.7.3.
Sure. Thank you for merging this :)