/sql-blog-uuid

This is a repo supporting a blog post describing use of UUIDs within a sql table/database

MIT LicenseMIT

sql-blog-uuid

The overall goal of this repo is to show some different solutions for using uuids as identifiers for table rows. There are a variety of reason why you may want UUIDs over autonumbers or user supplied strings. Although there is a focus on UUIDs and primary keys, there's also some supporting information regarding database normalization as well as some ideas for best practices.

Loading the sql statements

Unfortunately the sql file contains trigggrs and setting/unsetting of the delimeter, so it fails if you attempt to load it in the docker-entrypoint-initdb.d. In this case, you can copy+pasta the sql file AFTER you login to the docker container:

docker compose up -d
docker exec -it mysql mysql -u root -p

UUIDs as primary keys

This is a kind of hack, but in situations where the primary key for a table will be an autonumber integer, there are a couple of situations that can become a headache for these situations:

  • What if the autonumber key is too small e.g., it's an int8 and you have more than 255 rows, and you need to update the data type
  • What if your API requires that you expose a primary key and it being sequential is a possible security hole
  • What if your DBA refuses (or has a good reason) to have a UUID as a primary key
  • What if you acquire another company and their data requires you to "import" a lot of new data that has overlapping primary keys, how can you "protect" breaking the API in this case?

TLDR; you can probably avoid MOST of these problems by just having a UUID auto-generated by the database. There are a lot of reasons why this may not be ideal, most of it preference.

These links may be helpful in understanding these solution:

In the context of bludgeon (or any other application), these uuids would be used as application level keys to interact with those objects or perform operations on those objects e.g., starting/stopping a timer etc. UUIDs are better tools to anonymize keys while maintaining uniqueness and functionality.

Separate uuid table

This is the basic solution, I think everyone should "start" here, because it's easy to connect the dots and it makes total sense. In terms of ease of use and it's "set it and forget it attitude"; I think it's the better overall solution. The other solutions are neat, but overall provide the same solution with more cons than pros.

This solution involves two tables and a trigger, these are the use cases that this solution accomplishes:

  • If an employee is inserted, a uuid is created and associated with it's primary key (autonumber)
  • If an employee is deleted, it's uuid association is also deleted
  • If there is no manual intervention, every unique employee id will be associated with a unique employee uuid

This is the employee table:

CREATE TABLE IF NOT EXISTS employee (
    id BIGINT NOT NULL AUTO_INCREMENT,
    employee_first_name TEXT,
    employee_last_name TEXT,
    employee_email_address TEXT NOT NULL,
    PRIMARY KEY (id),
    UNIQUE(employee_email_address)
) ENGINE = InnoDB;

The employee table stores the employee entity and implements it's business rules: that the id an email describing a row is unique within the table. Yuo may also notice that there is no uuid defined within the table nor is there any link specifically from the employee table to the employee uuid table.

This is the employee_uuid table:

CREATE TABLE IF NOT EXISTS employee_uuid (
    employee_uuid TEXT(36) NOT NULL,
    employee_id BIGINT NOT NULL,
    PRIMARY KEY (employee_id, employee_uuid(36)),
    FOREIGN KEY (employee_id) REFERENCES employee(id) ON DELETE CASCADE,
    INDEX(employee_id, employee_uuid(36))
) ENGINE = InnoDB;

The employee_uuid table describes the relationship between the id in the employee table and the uuid in this table. You'll notice that there is a foreign key constraint to enforce the relationship between the employee id and the employe uuid. This ensures that the employee MUST exist in order to insert or update a row in the table. Because the uuid is the primary key to this table, to also ensures that the uuid is also unique for this table.

This is the employee_insert_uuid trigger:

DELIMITER $$
CREATE TRIGGER employee_insert_uuid
AFTER INSERT
    ON employee FOR EACH ROW BEGIN
INSERT INTO employee_uuid(employee_id, employee_uuid)
    VALUES(new.id, UUID());
END $$
DELIMITER ;

This trigger creates a functional link between the employee and employee_uuid table, it ensures that for every employee inserted, there is a uuid that's updated. It makes the following assumptions:

  • That the primary key for an employee will never change
  • That the uuid describing an employee will never change

Each time an employee is inserted, a uuid is ALSO inserted into the employee_uuid table, alternatively, if an employee is deleted, it's also deleted from the employee_uuid table (due to DELETE CASCADE).

In general, one of the motivations for having separate primary keys (integer) and UUID (string) is that integers [can] play nicer with database operations and indexing while UUIDs are better for application specific uses. Unfortunately, having a table with BOTH a primary key and a uuid doesn't quite violate rules of normalization, but rather is just ambiguous. 2NF (second normal form) indicates that all attributes that aren't candidate keys should depend on the whole candidate key not just part of it. The UUID is dependent on the primary key (id), but not the candidate key email_address. Functionally, I think making them both unique is probably good enough, but putting it in a table of its own, resolves the ambiguity and makes it clear that uuid depends ONLY on the foreign key employee id.

Interacting with the employee table is very straight forward:

MariaDB [employee_table]>   INSERT INTO employee(employee_first_name, employee_last_name, employee_email_address)
    ->    VALUES ('Antonio', 'Alexander', 'antonio.alexander@mistersoftwaredeveloper.com');
Query OK, 1 row affected (0.003 sec)

MariaDB [employee_table]> select * from employee;
+----+---------------------+--------------------+-----------------------------------------------+
| id | employee_first_name | employee_last_name | employee_email_address                        |
+----+---------------------+--------------------+-----------------------------------------------+
|  1 | Antonio             | Alexander          | antonio.alexander@mistersoftwaredeveloper.com |
+----+---------------------+--------------------+-----------------------------------------------+
1 row in set (0.001 sec)

Once an employee is sucessfully inserted, the trigger will take effect and automatically insert a row relating an employee to a uuid in the database

MariaDB [employee_table]> select * from employee_uuid;
+--------------------------------------+-------------+
| employee_uuid                        | employee_id |
+--------------------------------------+-------------+
| 4083f5db-a71e-11ec-baeb-0242ac130002 |           1 |
+--------------------------------------+-------------+
1 row in set (0.000 sec)

For an application, generally you may not have a strong reason to expose the primary key (id), as a result, you can perform a select query (or create a view if you REALLY need to) to query an employee entity with the uuid. This is a simple join; in the example below i've provided two queries, one that uses the full employee.id and another that uses the simplified id.

MariaDB [employee_table]>    SELECT employee_uuid, employee_first_name, employee_last_name, employee_email_address
    ->     FROM employee JOIN employee_uuid ON employee.id;
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| employee_uuid                        | employee_first_name | employee_last_name | employee_email_address                        |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| 4083f5db-a71e-11ec-baeb-0242ac130002 | Antonio             | Alexander          | antonio.alexander@mistersoftwaredeveloper.com |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
1 row in set (0.001 sec)

MariaDB [employee_table]>    SELECT employee_uuid, employee_first_name, employee_last_name, employee_email_address
    ->     FROM employee JOIN employee_uuid ON id;
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| employee_uuid                        | employee_first_name | employee_last_name | employee_email_address                        |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| 4083f5db-a71e-11ec-baeb-0242ac130002 | Antonio             | Alexander          | antonio.alexander@mistersoftwaredeveloper.com |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
1 row in set (0.001 sec)

Alternatively, you could create a view to allow people to depend on this table rather than the employee/employee_uuid table:

CREATE VIEW employee_v1 AS
    SELECT
        uuid, employee_first_name, employee_last_name, employee_email_address
    FROM employee
    JOIN employee_uuid ON id;

This can be used to both version/abstract employee from its underlying data type or provide additional functionality (e.g., this could be how you can handle forward compatible changes).

In short, this works really well and gives you a lot of flexibility to prevent people who can write to the employee table from writing to the emploee_uuid table (while the trigger can). It's pretty straight forward, de-couples the uuid from employee id and for the most part, separates the more application centric uuid from the database centric employee id.

UUID as primary key in employee table

This is an obvious optimization: why can't we just have one table? There's nothing preventing you from doing so, but in an attempt to not have ambiguous primary keys, you'd need to completely replace the primary auto number with a uuid. We can combine soem of the techniques we used earlier to accomplish the same "idea" using a single table and no trigger (if the version of mysql is recent enough):

CREATE TABLE IF NOT EXISTS employee (
  id TEXT(36) NOT NULL DEFAULT (UUID()),
  -- KIM: if you're not running at least MYSQL 8.0 you may not be able to have a default
  -- REFERENCE: https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html
  -- id TEXT(36) NOT NULL,
  employee_first_name TEXT,
  employee_last_name TEXT,
  employee_email_address TEXT NOT NULL,
  PRIMARY KEY (id(36)),
  UNIQUE(employee_email_address)
) ENGINE = InnoDB;

Optionally, if you're not with the times and you use an older version of mysql, you can also include this trigger:

DELIMITER $$
CREATE TRIGGER `employee_insert_uuid`
BEFORE INSERT
    ON `employee` FOR EACH ROW 
 BEGIN
    IF new.id IS NULL THEN
        SET new.id = UUID();
    END IF;
END $$
DELIMITER ;

While this is quite obviously optimized, the optimization introduces some ambiguity that's harder to resolve. For example, even though we have a "default" uuid that's provided, anyone that can write to the table could alterantively insert a unique 36 character string that's NOT a UUID, which although OK, could be a pain for and application centric code that used it (it would be malformed). To resolve it, you'd need to jump through some hoops with stored procedures and permissions to the table. That aside, the queries pretty much work the same:

MariaDB [employee_primary]> INSERT INTO employee(employee_first_name, employee_last_name, employee_email_address) VALUES ('Antonio', 'Alexander', 'antonio.alexander@mistersoftwaredeveloper.com');
Query OK, 1 row affected (0.003 sec)

MariaDB [employee_primary]> select * from employee;
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| id                                   | employee_first_name | employee_last_name | employee_email_address                        |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| 620931ca-a735-11ec-baeb-0242ac130002 | Antonio             | Alexander          | antonio.alexander@mistersoftwaredeveloper.com |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
1 row in set (0.001 sec)

Optimized UUID as primary key in table

This is the final optimization for the table which is really just for shits and giggles, maybe this matters with a giant table, like millions of rows. To accomplish this final optimization, we use binary for the uuid/id column rather than text. As you should know, UUIDs can be represented as text: 36 characters at a byte per character or as binary as sixteen bytes (a 50% difference in size).

CREATE TABLE IF NOT EXISTS employee (
    id BINARY(16) PRIMARY KEY NOT NULL DEFAULT (unhex(replace(uuid(),'-',''))),
    -- KIM: if you're not running at least MYSQL 8.0 you may not be able to have a default
    -- REFERENCE: https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html
    -- id BINARY(16) PRIMARY KEY NOT NULL,
    employee_first_name TEXT,
    employee_last_name TEXT,
    employee_email_address TEXT NOT NULL,
    id_text varchar(36) generated always as
     (insert(
        insert(
          insert(
            insert(hex(id),9,0,'-'),
            14,0,'-'),
          19,0,'-'),
        24,0,'-')
     ) virtual,
    UNIQUE(employee_email_address)
) ENGINE = InnoDB;

Similar to the above situation, if you were in the past, you could employ a trigger like the following:

DELIMITER $$
CREATE TRIGGER `employee_insert_uuid`
BEFORE INSERT ON `employee` FOR EACH ROW 
BEGIN
  IF new.id IS NULL THEN
    SET new.id = unhex(replace(uuid(),'-',''));
  END IF;
END $$
DELIMITER ;

Use of the table is just about identical to the other solutions, see below:

MariaDB [employee_optimized]> INSERT INTO employee(employee_first_name, employee_last_name, employee_email_address) VALUES ('Antonio', 'Alexander', 'antonio.alexander@mistersoftwaredeveloper.com');
Query OK, 1 row affected (0.004 sec)

MariaDB [employee_optimized]> select * from employee;
+------------------+---------------------+--------------------+-----------------------------------------------+--------------------------------------+
| id               | employee_first_name | employee_last_name | employee_email_address                        | id_text                              |
+------------------+---------------------+--------------------+-----------------------------------------------+--------------------------------------+
| ����8��B�        | Antonio             | Alexander          | antonio.alexander@mistersoftwaredeveloper.com | 8DD0E6E7-A738-11EC-BAEB-0242AC130002 |
+------------------+---------------------+--------------------+-----------------------------------------------+--------------------------------------+
1 row in set (0.003 sec)

MariaDB [employee_optimized]> select id_text, employee_first_name, employee_last_name, employee_email_address from employee;
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| id_text                              | employee_first_name | employee_last_name | employee_email_address                        |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
| 8DD0E6E7-A738-11EC-BAEB-0242AC130002 | Antonio             | Alexander          | antonio.alexander@mistersoftwaredeveloper.com |
+--------------------------------------+---------------------+--------------------+-----------------------------------------------+
1 row in set (0.001 sec)

Security concerns

Although security is outside the scope of this repo/blog, your 'final' architecture could be heavily influenced by the security model you decide to employ. Generally, the multi-table solution makes it easier to apply security on a per table basis rather than a per column basis. A single table solution may make it more difficult to maintain data consistency and cover gaps with non-sensical queries/validation.