/needle

One foreign key to rule them all and in the darkness, bind them.

Primary LanguageElixirApache License 2.0Apache-2.0

Needle

hex.pm hexdocs

Universal foreign keys, virtual schemas, and shared data mixins for Ecto

One foreign key to rule them all and in the darkness, bind them.

-- Gandalf, paraphrased.

A means of foreign keying many tables in one field. Designed for highly interlinked data in highly dynamic schemata where tracking all the foreign keys is neither desired nor practical.

Note: a universal foreign key is actually a hard problem. Many approaches are on offer with a variety of tradeoffs. You should carefully consider a variety of approaches rather than just blindly adopting the one that fitted our project's needs the best!

Background

A Table is a record of a table that may be linked to by a pointer. A Pointer is a pointer id and a table id.

With these two ingredients, we can construct a means of pointing to any table that has a Table entry.

Pointer and Table IDs are both Needle.ULID, a UUID-like type that combines a millisecond-precision timestamp and some randomness to reduce the likelihood of a clash. It naturally sorts both in binary and text form by time and as far as postgres is concerned, it's a UUID.

Installation

Aside from the hex dependency, you will also need to write a simple migration to set up the database before you can start writing your regular migrations:

defmodule MyApp.Repo.Migrations.InitPointers do
  @moduledoc false
  use Ecto.Migration
  import Needle.Migration

  def up(), do: inits(:up)
  def down(), do: inits(:down)

  defp inits(dir) do
    init_pointers_ulid_extra(dir) # this one is optional but recommended
    init_pointers(dir) # this one is not optional
  end
end

Defining a Pointable Type

Pointable tables require a unique sentinel ULID to identify them. These must be 26 characters long and in the alphabet of Crockford's Base32. They should be easy to identify in a printout and might be silly.

There is a helper function, synthesise!/1 in Needle.ULID to assist with this process - give it a 26-character long binary of ascii alphanumerics and it will give you the closest ULID that matches back.

Let's look at a simple schema:

defmodule MyApp.Greeting do
  use Needle.Pointable,
    otp_app: :my_app,
    source: "myapp_greeting",
    table_id: "GREET1NGSFR0MD0CEXAMP1E000"

  pointable_schema do
    field :greeting, :string
  end
end

To declare a pointable schema, we start by using Needle.Pointable, providing the name of our otp application, the source table's name in the database and our chosen sentinel ULID.

We then call pointable_schema and define any fields we wish to put directly in the table. For the most part, pointable_schema is like Ecto's schema macro, except you do not provide the table name and let it handle the primary key.

If for some reason you wished to turn autogeneration off, you could pass autogenerate: false to the options provided when using Needle.Pointable.

Now let's define the migration for our schema:

defmodule MyApp.Repo.Migrations.Greeting do
  @moduledoc false
  use Ecto.Migration
  import Needle.Migration

  def up() do
    create_pointable_table(:greeting, "GREET1NGSFR0MD0CEXAMP1E000") do
      add :greeting, :text, null: false
    end
  end

  def down() do
    drop_pointable_table(:greeting, "GREET1NGSFR0MD0CEXAMP1E000")
  end
end

As you can see, it's pretty similar to defining a regular migration, except you use create_pointable_table and drop_pointable_table. Notice that our sentinel ULID makes an appearance again here. It's very important that these match what we declared in the schema.

Referencing Needle

Ecto does not know anything about our scheme, so unless we specifically want something to reference one of the pointed tables, we typically belongs_to with Needle.Pointer. The table in which we do this does not itself need to be pointable.

defmodule MyApp.Foo do

  use Ecto.Schema
  alias Needle.Pointer

  # regular ecto table, not pointable!
  schema "hello" do
    belongs_to :pointer, Pointer # who knows what it points to?
  end
end

You may choose to reference a specific schema rather than Pointer if it will only point to a single table. If you do this, you must ensure that the referenced record exists in that table in the normal way. There may be some performance benefit, we didn't benchmark it.

The migration is slightly more complex, we have to decide what type of a pointer it is. Needle come in three categories:

  • A strong pointer is not nullable and is deleted when the object it points to is deleted.
  • A weak pointer is nullable and is nilified when the object it points to is deleted.
  • An unbreakable pointer will raise when you attempt to delete the object it points to.
Type Nullable? On Delete
Strong No Cascade
Weak Yes Set Null
Unbreakable No Raise

In this case we will use a strong pointer, because we want it to be deleted if the pointed object is deleted.

defmodule MyApp.Repo.Migrations.Hello do
  @moduledoc false
  use Ecto.Migration
  import Needle.Migration

  def change() do
    create_if_not_exists table(:hello) do
      add :pointer, strong_pointer(), null: false
      add :greeting, :text, null: false
    end
  end
end

If you are pointing to a specific table instead of pointer, strong_pointer/1 allows you to pass the name of that module (strong_pointer/0 calls this with Needle.Pointer).

Dereferencing Needle

It is common that even though you have a universal foreign key, you will want to issue different queries based upon the type that is being pointed to. For this reason, it is up to you to decide how to perform an onward query.

Needle.Pointers.schema/1 turns a Pointer into an Ecto schema module name you can switch against. Needle.Pointers.plan breaks down a list of Needle into a map of ids keyed by schema module. It is handy to define some functions in your (non-library) application that can load any type of pointer in given contexts.

Querying Needle

Since Pointer has a table, you can use its table_id field to filter by pointed type. Needle.Tables.id!/1 (or ids!/1 for a list) can be used to obtain the IDs for a table or tables.

Then you run into another problem, that even though you know all of the tables you're working with will have a certain field, you need to know which table they are to work with them! The solution to this is what we are calling 'mixin tables' for convenience.

A mixin table has a Pointer primary key along with any other fields you wish to store in this mixin. By moving fields out to mixin tables, you gain knowledge of the table name to which you need to join.

An example mixin schema:

defmodule My.Creator do
  use Needle.Mixin,
    otp_app: :my_app,
    source: "creator"

  mixin_schema do
    belongs_to :creator, My.User
  end
end

Mixin tables are not themselves pointable, so there is no need to specify a table id as when defining a pointable schema.

The migration for this is slightly more complicated:

defmodule My.Creator.Migration do

  import Ecto.Migration
  import Needle.Migration

  defp creator_table(), do: My.Creator.__schema__(:source)
  defp user_table(), do: My.User.__schema__(:source)

  def migrate_creator(index_opts \\ []),
    do: migrate_creator(index_opts, direction())

  defp migrate_creator(index_opts, :up) do
    create_mixin_table(creator_table()) do
      add :creator_id, strong_pointer(user_table()), null: false 
    end
    create_if_not_exists(unique_index(creator_table(), [:creator_id], index_opts))
  end

  defp migrate_creator(index_opts, :down) do
    drop_if_exists(unique_index(creator_table(), [:creator_id], index_opts))
    drop_mixin_table(creator_table())
  end
end

Virtual pointables ("virtuals")

Virtuals are a new addition in needle 0.6.0. They behave like pointables that you have not added any fields to.

We noticed it was very common in bonfire to create pointables with no extra fields just so we could use the needle system. Virtuals are alternative for this case that requires less typing and provides a reduced overhead vs pointables.

Virtuals are backed by a writable view onto the needle table. This means that when we can save the cost of maintaining a primary key in that table and the associated disk space.

In all other respects, they behave like pointables. You can have changesets over them and select and insert as usual.

Elixir-based logic

The practical result of needle is that it pushes a certain amount of validation and consistency logic back into elixir land. It is therefore your elixir code's responsibility to ensure that data is inserted into the appropriate mixin tables when inserting a pointable object and to manage deletions as appropriate.

When assembling queries with mixin tables, pay careful attention to the type of join you are performing. An inner join is explicitly asking not to be shown objects that do not have a record for that mixin. You quite possibly wanted to left join.

Configuration and overrides

Every pointable or mixin schema is overrideable with configuration during compilation (this is why using them requires an :otp_app to be specified). For example, we could override Needle.Table (which is a pointable table) thus:

config :needle, Needle.Table, source: "my_pointers_table"

The table_id is also configurable, but we don't recommend you change it.

In addition, all pointable and mixin schemas permit extension with Exto. See the Exto docs for more information about how to extend schemas via configuration. You will probably at the very least want to insert some has_one for mixins off your pointables.

Tradeoffs

All solutions to the universal primary key problem have tradeofs. Here are what we see as the deficiencies in our approach:

  1. It forces a ULID on you. This is great for us, but not everyone. ULID exposes a timestamp with millisecond precision. If the time of creation of a resource is sensitive information for your purposes, ULIDs are not going to be suitable for you.
  2. Ecto has no knowledge of the specialty of Pointer, e.g. Repo.preload does not work and you need to specify a join condition to join through a pointer. Use our functions or add extra associations with exto configuration.
  3. Dereferencing a list of needle requires a select query per table type that occurs in the input set.
  4. Reliance on user attention. You have to follow the instructions correctly to make the system work at all.
  5. There is likely some performance impact from postgres not understanding the relationships between the various tables properly. It's hard to gauge and we haven't even tried.

These are not likely to change. If you're going to pick this library, do so in the full knowledge of the tradeoffs it makes.

Alternatives include (I'm sure you can think of others):

  • Storing the table name in a second column alongside every foreign key.
  • A compound datatype of id and table name.
  • Byte/String manipulation tricks.
  • Evil SQL hacks based upon compile time configuration.

While we have our gripes with this approach, once you've gotten the hang of using it, it works out pretty well for most purposes and it's one of the simpler options to work with.

Copyright and License

Copyright (c) 2020 needle Contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.