realrolfje/anonimatron

Primary Key Exception

Closed this issue · 4 comments

Descrition
Anonimatron 1.13 throws Error about missing mySQL pimary key, although it's present.

To Reproduce
Steps to reproduce the behavior:

  1. Create a database with the following contents
CREATE TABLE regions (
	region_id INT (11) AUTO_INCREMENT PRIMARY KEY,
	region_name VARCHAR (25) DEFAULT NULL
);

CREATE TABLE countries (
	country_id CHAR (2) PRIMARY KEY,
	country_name VARCHAR (40) DEFAULT NULL,
	region_id INT (11) NOT NULL,
	FOREIGN KEY (region_id) REFERENCES regions (region_id) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE TABLE locations (
	location_id INT (11) AUTO_INCREMENT PRIMARY KEY,
	street_address VARCHAR (40) DEFAULT NULL,
	postal_code VARCHAR (12) DEFAULT NULL,
	city VARCHAR (30) NOT NULL,
	state_province VARCHAR (25) DEFAULT NULL,
	country_id CHAR (2) NOT NULL,
	FOREIGN KEY (country_id) REFERENCES countries (country_id) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE TABLE jobs (
	job_id INT (11) AUTO_INCREMENT PRIMARY KEY,
	job_title VARCHAR (35) NOT NULL,
	min_salary DECIMAL (8, 2) DEFAULT NULL,
	max_salary DECIMAL (8, 2) DEFAULT NULL
);

CREATE TABLE departments (
	department_id INT (11) AUTO_INCREMENT PRIMARY KEY,
	department_name VARCHAR (30) NOT NULL,
	location_id INT (11) DEFAULT NULL,
	FOREIGN KEY (location_id) REFERENCES locations (location_id) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE TABLE employees (
	employee_id INT (11) AUTO_INCREMENT PRIMARY KEY,
	first_name VARCHAR (20) DEFAULT NULL,
	last_name VARCHAR (25) NOT NULL,
	email VARCHAR (100) NOT NULL,
	phone_number VARCHAR (20) DEFAULT NULL,
	hire_date DATE NOT NULL,
	job_id INT (11) NOT NULL,
	salary DECIMAL (8, 2) NOT NULL,
	manager_id INT (11) DEFAULT NULL,
	department_id INT (11) DEFAULT NULL,
	FOREIGN KEY (job_id) REFERENCES jobs (job_id) ON DELETE CASCADE ON UPDATE CASCADE,
	FOREIGN KEY (department_id) REFERENCES departments (department_id) ON DELETE CASCADE ON UPDATE CASCADE,
	FOREIGN KEY (manager_id) REFERENCES employees (employee_id)
);

CREATE TABLE dependents (
	dependent_id INT (11) AUTO_INCREMENT PRIMARY KEY,
	first_name VARCHAR (50) NOT NULL,
	last_name VARCHAR (50) NOT NULL,
	relationship VARCHAR (25) NOT NULL,
	employee_id INT (11) NOT NULL,
	FOREIGN KEY (employee_id) REFERENCES employees (employee_id) ON DELETE CASCADE ON UPDATE CASCADE
);
  1. Configure Anonimatron as follows:
<?xml version="1.0" encoding="UTF-8"?>
<configuration jdbcurl="jdbc:mysql://192.168.122.254:3306/employees?characterEncoding=utf-8" userid="user" password="password">
    <anonymizerclass>org.sf.anoimatron.CommunityAnonymizer</anonymizerclass>
    <table name="employees">
        <column name="employee_id" type="RANDOMDIGITS"/>
        <column name="first_name" type="ELVEN_NAME" />
	    <column name="last_name" type="ROMAN_NAME" />
	    <column name="email" type="EMAIL_ADDRESS"/>				
	    <column name="phone_number" type="RANDOMDIGITS"/>	
        <column name="hire_date" type="DATE" />
	    <column name="job_id" type="RANDOMDIGITS"/>
	    <column name="salary" type="RANDOMDIGITS"/>
        <column name="manager_id" type="RANDOMDIGITS" />
        <column name="department_id" type="RANDOMDIGITS" />        
    </table>
</configuration>
  1. Start with the following command
[xyz@localhost anonimatron-1.13]$ ./anonimatron.sh -config config.xml -synonyms synonyms.xml
  1. See error in log:
2020-06-02 17:45:21,388 INFO  (Configuration.java:0) Configuration read from /home/kln/Anonimatron/anonimatron-1.13/config.xml []
2020-06-02 17:45:21,411 FATAL (AnonymizerService.java:0) Could not instantiate class org.sf.anoimatron.CommunityAnonymizer. Please make sure that the class is on the classpath, and it has a default public constructor. []
java.lang.ClassNotFoundException: org.sf.anoimatron.CommunityAnonymizer
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:340)
	at com.rolfje.anonimatron.anonymizer.AnonymizerService.registerAnonymizers(AnonymizerService.java:60)
	at com.rolfje.anonimatron.Anonimatron.anonymize(Anonimatron.java:90)
	at com.rolfje.anonimatron.Anonimatron.main(Anonimatron.java:48)
2020-06-02 17:45:21,412 INFO  (AnonymizerService.java:0) 1 anonymizers registered. []
2020-06-02 17:45:21,564 INFO  (JdbcAnonymizerService.java:0) Conected to 'jdbc:mysql://192.168.122.254:3306/employees?characterEncoding=utf-8' with user 'user'. []
2020-06-02 17:45:21,567 INFO  (JdbcAnonymizerService.java:0) Table employees has 0 rows to process. []
2020-06-02 17:45:21,572 ERROR (JdbcAnonymizerService.java:0) Table employees does not contain a primary key and can not be anonymyzed. [Table 'employees']
2020-06-02 17:45:21,572 FATAL (JdbcAnonymizerService.java:0) Anonymyzation stopped because of fatal error. [Table 'employees']
java.lang.RuntimeException: Table employees does not contain a primary key and can not be anonymyzed.
	at com.rolfje.anonimatron.jdbc.JdbcAnonymizerService.getPrimaryKeys(JdbcAnonymizerService.java:383)
	at com.rolfje.anonimatron.jdbc.JdbcAnonymizerService.getSelectStatement(JdbcAnonymizerService.java:347)
	at com.rolfje.anonimatron.jdbc.JdbcAnonymizerService.processTableColumns(JdbcAnonymizerService.java:170)
	at com.rolfje.anonimatron.jdbc.JdbcAnonymizerService.preScanTable(JdbcAnonymizerService.java:130)
	at com.rolfje.anonimatron.jdbc.JdbcAnonymizerService.anonymize(JdbcAnonymizerService.java:84)
	at com.rolfje.anonimatron.Anonimatron.anonymize(Anonimatron.java:94)
	at com.rolfje.anonimatron.Anonimatron.main(Anonimatron.java:48)

Expected behavior
Normal behavior

Desktop

  • OS: Fedora Linux V32
  • java version "14" 2020-03-17
    Java(TM) SE Runtime Environment (build 14+36-1461)
    Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)

Thank you

I assume that this is not about the class not found exception, which is a classpath problem where Anonimatron can not find your custom anonimyzer.

I'll focus on the primary key here. I think the problem is an incorrect error, as I see that you are trying to anonymize the primary key of a table. I'll double check, but I believe that is currently not possible. I do understand that you'd want to do that in this case as the employee id is probably a publicly known unique identifier that is on a badge, which you'd want to anonimyze.

Although I understand that, I also see that there is an inherent danger where a random ID can clash with an existing ID during the update of a row.

Right, it is about the primary key.

Of course the DB Skript above is a dummy from the internet, but I could recreate the problem with it as well as one the real one, where the employee ID is, in fact, more or less public.

Also it isn't a problem, that the table relations a broken after anonymization. I expected that.
If it's not possible to anonymize the primary kees without a lot of changes to the source, I'll find an other solution - don't worry. Thank you!

I understand. I think for most users it is important that the database stays consistent (that is after all one of the reasons to have Anonimatron: A consistent database but without sensitive data). So I'd like Anonimatron to not break the database.

Anonimatron processes tables row-by-row, so I don't have to keep everything in memory. This way Anonimatron can process pretty large tables. Refactoring it to anonimyze primary keys will take some effort and possibly make it harder to use.

One (crude) way around this is to have two scripts:

  1. Execute a script to change the table to add a new (temporary) primary key column
  2. Excute anonimatron to anonymize the employee id (in the example above)
  3. Execute another script to remove the temporary primary key and alter the table so that employee id is the primary key again.

I expect this to work because Anonimatron will change the same employee id into the same anonimyzed id every time (given you use the synonyms file).

Meanwhile I have improved the error message, thank you for your clear explanation of the problem.

Thank you a lot!
Also for your great project!