rheinwerk-verlag/pganonymize

ValueError when copying a table with id of type uuid4

ybycode opened this issue · 4 comments

Hey,

I get this error when trying to anonymize any table whose primary key id uses uuid4:

INFO: Found table definition "users"
1216it [00:00, 38715.87it/s]                                                                   
Processing 1 batches for users:   0%|                           | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 210, in f
    return formatter(v)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 109, in uuid_formatter
    return 'i2Q', (16, (guid.int >> 64) & MAX_INT64, guid.int & MAX_INT64)
AttributeError: 'str' object has no attribute 'int'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/pganonymize", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/__main__.py", line 12, in main
    main(args)
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/cli.py", line 79, in main
    anonymize_tables(connection, schema.get('tables', []), verbose=args.verbose, dry_run=args.dry_run)
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/utils.py", line 43, in anonymize_tables
    build_and_then_import_data(connection, table_name, primary_key, columns, excludes,
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/utils.py", line 94, in build_and_then_import_data
    import_data(connection, temp_table, [primary_key] + column_names, filter(None, data))
  File "/usr/local/lib/python3.10/site-packages/pganonymizer/utils.py", line 173, in import_data
    mgr.copy([[escape_str_replace(val) for col, val in row.items()] for row in data])
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 294, in copy
    self.writestream(data, datastream)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 322, in writestream
    f, d = formatter(val)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 135, in <lambda>
    return lambda v: ('i', (-1,)) if v is None else formatter(v)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/copy.py", line 213, in f
    errors.raise_from(ValueError, message, exc)
  File "/usr/local/lib/python3.10/site-packages/pgcopy/errors/py3.py", line 9, in raise_from
    raise exccls(message) from exc
ValueError: error formatting value 16cfc1fb-16fc-4888-b6a7-3638698df7ae for column id
ERROR: 1

The value received by pgcopy is the string 16cfc1fb-16fc-4888-b6a7-3638698df7ae, whereas it expects an instance of class uuid.UUID.
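To illustrate the mismatch outside of pganonymize, here is a minimal sketch. The `uuid_formatter` below only mirrors the single line shown in the traceback, the value of `MAX_INT64` is an assumption, and `coerce_uuid` is a hypothetical helper, not part of pgcopy or pganonymize. It shows one possible way to convert the string before it reaches the formatter, not necessarily how the library should fix it.

```python
import struct
import uuid

# Assumption: a 64-bit mask, matching how the traceback line uses MAX_INT64.
MAX_INT64 = 0xFFFFFFFFFFFFFFFF


def uuid_formatter(guid):
    # Mirrors the line from pgcopy/copy.py shown in the traceback.
    return 'i2Q', (16, (guid.int >> 64) & MAX_INT64, guid.int & MAX_INT64)


def coerce_uuid(value):
    """Hypothetical helper: return a uuid.UUID for either a UUID or its string form."""
    if isinstance(value, uuid.UUID):
        return value
    return uuid.UUID(value)


raw = "16cfc1fb-16fc-4888-b6a7-3638698df7ae"

# Passing the plain string reproduces the AttributeError from the traceback.
try:
    uuid_formatter(raw)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# After coercion the formatter succeeds and the value packs to
# a 4-byte length prefix followed by the 16 UUID bytes.
fmt, parts = uuid_formatter(coerce_uuid(raw))
packed = struct.pack('>' + fmt, *parts)
```

If this reading is right, the string would have to be converted somewhere between reading the row and handing it to pgcopy.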

This happens with the following schema.yml:

tables:
 - users:
    fields:
     - password:
        provider:
          name: mask
          sign: '?'

For context:

$ pip freeze
Faker==9.8.0
parmap==1.5.3
pganonymize==0.6.1
pgcopy==1.5.0
psycopg2==2.9.1
python-dateutil==2.8.2
pytz==2021.3
PyYAML==6.0
six==1.16.0
text-unidecode==1.3
tqdm==4.62.3

$ postgres --version
postgres (PostgreSQL) 11.8

$ psql -d my_db -c "\d users"

                                      Table "public.users"
          Column          |            Type             | Collation | Nullable |         Default         
--------------------------+-----------------------------+-----------+----------+-------------------------
 id                       | uuid                        |           | not null | 
 ...

Please let me know if some more information is needed, or if I missed some info from the documentation 😅

hkage commented

Hey, thanks for reporting the issue. I haven't yet tested the anonymizer with a uuid as a primary key or with PostgreSQL 11. I will take a look as soon as possible, but it's definitely not a problem with your schema definition, and you didn't miss anything in the documentation.

Any news on this?

hkage commented

Hi, unfortunately not yet, because I currently lack the time due to ongoing projects. If someone finds a fix for this, I would be very grateful.

Any news on this?

Hey, can you try using the uuid4 provider? It will generate a new random uuid with the correct data type.