ValueError when copying a table with id of type uuid4
ybycode opened this issue · 4 comments
Hey,
I get this error when trying to anonymize any table whose primary key id is of type uuid (uuid4 values):
INFO: Found table definition "users"
1216it [00:00, 38715.87it/s]
Processing 1 batches for users: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 210, in f
    return formatter(v)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 109, in uuid_formatter
    return 'i2Q', (16, (guid.int >> 64) & MAX_INT64, guid.int & MAX_INT64)
AttributeError: 'str' object has no attribute 'int'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/pganonymize", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/__main__.py", line 12, in main
    main(args)
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/cli.py", line 79, in main
    anonymize_tables(connection, schema.get('tables', []), verbose=args.verbose, dry_run=args.dry_run)
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/utils.py", line 43, in anonymize_tables
    build_and_then_import_data(connection, table_name, primary_key, columns, excludes,
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/utils.py", line 94, in build_and_then_import_data
    import_data(connection, temp_table, [primary_key] + column_names, filter(None, data))
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/utils.py", line 173, in import_data
    mgr.copy([[escape_str_replace(val) for col, val in row.items()] for row in data])
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 294, in copy
    self.writestream(data, datastream)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 322, in writestream
    f, d = formatter(val)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 135, in <lambda>
    return lambda v: ('i', (-1,)) if v is None else formatter(v)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 213, in f
    errors.raise_from(ValueError, message, exc)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/errors/py3.py", line 9, in raise_from
    raise exccls(message) from exc
ValueError: error formatting value 16cfc1fb-16fc-4888-b6a7-3638698df7ae for column id
ERROR: 1
The value received by pgcopy is the string 16cfc1fb-16fc-4888-b6a7-3638698df7ae, whereas it expects an instance of class uuid.UUID.
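To make the mismatch concrete, here is a minimal sketch that reproduces the failing expression from the traceback outside of pgcopy. The `uuid_formatter` body is copied from the traceback line; the value of `MAX_INT64` is an assumption (a 64-bit mask), and `coerce_uuid` is a hypothetical helper, not part of pgcopy or pganonymize.

```python
import uuid

# Assumption: pgcopy's MAX_INT64 is a 64-bit mask; the real constant is not shown in the traceback.
MAX_INT64 = 0xFFFFFFFFFFFFFFFF


def uuid_formatter(guid):
    # Same expression as the line in the traceback: it needs the `.int` attribute of uuid.UUID.
    return 'i2Q', (16, (guid.int >> 64) & MAX_INT64, guid.int & MAX_INT64)


def coerce_uuid(value):
    # Hypothetical helper: turn a uuid string into a uuid.UUID, leave UUID objects alone.
    if isinstance(value, uuid.UUID):
        return value
    return uuid.UUID(value)


raw = "16cfc1fb-16fc-4888-b6a7-3638698df7ae"

# Passing the plain string fails exactly like in the report.
try:
    uuid_formatter(raw)
    string_error = None
except AttributeError as exc:
    string_error = str(exc)

# Converting first gives the formatter an object that has `.int`.
fmt, parts = uuid_formatter(coerce_uuid(raw))
```

One plausible cause, untested here, is that the rows are read with psycopg2's default behaviour, which returns uuid columns as strings; psycopg2 documents `psycopg2.extras.register_uuid()` for getting `uuid.UUID` objects instead.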
This happens with the following schema.yml:
tables:
  - users:
      fields:
        - password:
            provider:
              name: mask
              sign: '?'
For context:
$ pip freeze
Faker==9.8.0
parmap==1.5.3
pganonymize==0.6.1
pgcopy==1.5.0
psycopg2==2.9.1
python-dateutil==2.8.2
pytz==2021.3
PyYAML==6.0
six==1.16.0
text-unidecode==1.3
tqdm==4.62.3
$ postgres --version
postgres (PostgreSQL) 11.8
$ psql -d my_db -c "\d users"
Table "public.users"
Column | Type | Collation | Nullable | Default
--------------------------+-----------------------------+-----------+----------+-------------------------
id | uuid | | not null |
...
Please let me know if some more information is needed, or if I missed some info from the documentation 😅
Hey, thanks for reporting the issue. I haven't yet tested the anonymizer with a uuid as a primary key or with PostgreSQL 11. I will take a look as soon as possible - but it's definitely not a problem with your schema definition, and you didn't miss anything in the documentation.
Any news on this?
Hi, unfortunately not yet; I currently lack the time due to ongoing projects. If someone finds a fix for this, I would be very grateful.
Any news on this?
Hey, can you try using the uuid4 provider? It will generate a new random uuid with the correct data type.
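As an illustration of what "correct data type" means here, the following sketch uses only Python's standard `uuid` module. It is not the pganonymize provider itself, whose implementation is not shown in this thread; it only shows that a freshly generated uuid4 is a `uuid.UUID` object carrying the `.int` attribute that the pgcopy formatter in the traceback reads.

```python
import uuid

# Generate a random (version 4) UUID with the standard library.
new_id = uuid.uuid4()

# The object is a uuid.UUID, so it exposes `.int`, unlike the plain string in the report.
is_uuid_object = isinstance(new_id, uuid.UUID)
as_integer = new_id.int

# The string form is what a `str` value would look like; it has no `.int` attribute.
as_text = str(new_id)
text_has_int = hasattr(as_text, "int")
```

Note that this replaces the value with a new random one, so it is a workaround for columns that can be re-generated rather than a fix for copying existing uuid values unchanged.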