beardedeagle/mnesiac

Inspect remote Mnesia tables before performing copy

beardedeagle opened this issue · 8 comments

In a clustered setting it may be desirable to inspect the remote Mnesia nodes before performing a copy. The reason is that the local node may have data and the remote node the copy is performed against may not. In this case, the local data would be lost.

Leaving this issue here as an idea and to see if others are interested.

I might have a look at this since it would be an interesting challenge to solve, and it might mean having a peek under the hood at how mnesia does its things.

So, I've done some digging around on this and I have some thoughts.

We can interrogate each table on the local node and on the remote node we're connecting to, and get each table's cookie using :mnesia.table_info(table, :cookie), e.g.:

  def get_table_cookies(node \\ Node.self()) do
    :rpc.call(node, :mnesia, :system_info, [:tables])
    |> Enum.reduce(%{}, fn t, acc ->
      Map.put(acc, t, :rpc.call(node, :mnesia, :table_info, [t, :cookie]))
    end)
  end

This function produces a map of tables and their cookies like this:

%{
  ExampleStore => {{1547980402798949900, -576460752303422334, 1}, :"a@127.0.0.1"},
  :schema => {{1547980402132387900, -576460752303422526, 1}, :"a@127.0.0.1"}
}

Now, my understanding is that for every table:

  • if the table exists on the remote node but not on the local node
    OK to connect; we will want to add_table_copy for this table.

  • if the table exists on the local node but not on the remote node
    if it's one of our mnesiac stores then we could add_table_copy in the other direction; otherwise we've found data we don't know about, and I'm not sure how best to deal with that.

  • if the table exists on both and the cookies match
    We can connect, and mnesia will try to bring the two data sets in sync. In this situation, mnesia can get into a jam where it detects that there's been a partition. Not sure how best to handle this other than aborting and informing the user.

  • if the table exists on both and the cookies don't match
    We can't connect without choosing which node's table is canonical, since both nodes claim to be the table's creator.
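The four cases above could be sketched as a pure comparison of two cookie maps, shaped like the output of get_table_cookies/1. The module name CookieCompare and the result atoms are hypothetical, made up for illustration; nothing here is part of Mnesiac or Mnesia, and it only classifies tables without deciding what to do about them.

```elixir
defmodule CookieCompare do
  @moduledoc false
  # Hypothetical helper for illustration only; not part of Mnesiac.

  @doc """
  Takes two maps of table => cookie (local and remote) and returns a map of
  table => :remote_only | :local_only | :same_cookie | :cookie_mismatch.
  """
  def classify(local, remote) when is_map(local) and is_map(remote) do
    tables = Enum.uniq(Map.keys(local) ++ Map.keys(remote))

    Map.new(tables, fn table ->
      {table, classify_table(Map.fetch(local, table), Map.fetch(remote, table))}
    end)
  end

  # Table exists on the remote node only.
  defp classify_table(:error, {:ok, _remote}), do: :remote_only
  # Table exists on the local node only.
  defp classify_table({:ok, _local}, :error), do: :local_only
  # Same variable in both patterns: matches only when the cookies are equal.
  defp classify_table({:ok, cookie}, {:ok, cookie}), do: :same_cookie
  # Table exists on both nodes with different cookies.
  defp classify_table({:ok, _local}, {:ok, _remote}), do: :cookie_mismatch
end
```

Calling CookieCompare.classify(get_table_cookies(), get_table_cookies(remote_node)) would then give one label per table, which the connect logic could branch on.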

We might be able to use :mnesia.system_info(:transaction_commits) to break the deadlock in the last two situations: compare how many transactions each node has committed and drop the table on the node with the lower count. I'm not sure that's a good idea, though.

(edited my comment above with some different info)

I've thought some more on this and given that :mnesia.system_info(:transaction_commits) is related to the whole of mnesia on a given node rather than a single table, we can't use that data to decide which table is more canonical than the other, since all of the transactions could be on an unrelated table.

Right, I'm thinking we may be straying into amalgamate/unsplit land here as well, because you are potentially talking about "what do we do when a network partition is healed and Mnesia reconnects". Something to consider while thinking about this.

I agree there - we still could handle the first two scenarios I think since the data doesn't exist on one side or the other, perhaps?

Yeah, we should still handle a default case, or make it configurable with a default case. They could choose whether or not to copy in their config, by table. That also makes it a bit easier to open Mnesiac up for something like feeding in an amalgamate/unsplit module.
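As a rough sketch of what "configurable by table, with a default" might look like: the CopyPolicy module, the policy atoms, and the {:resolve, module} shape below are all assumptions invented for illustration, not an existing Mnesiac option.

```elixir
defmodule CopyPolicy do
  @moduledoc false
  # Hypothetical: neither this module nor this policy format exists in Mnesiac.

  @default :copy

  @doc """
  Looks up the policy for a table in a map of table => policy.
  A policy is :copy, :skip, or {:resolve, module} for a custom resolver.
  Tables without an entry fall back to the default.
  """
  def for_table(policies, table, default \\ @default) when is_map(policies) do
    Map.get(policies, table, default)
  end

  @doc "True when the table should simply be copied from the remote node."
  def copy?(policies, table), do: for_table(policies, table) == :copy
end
```

A user could then pass something like %{ExampleStore => :skip, OtherStore => {:resolve, MyResolver}}, where MyResolver stands in for an amalgamate/unsplit style module.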

I'll have a think about this - at the moment we ask the user to implement their own init and copy functions, we could have them as 'overrides' instead to the default behaviour under-the-hood, e.g:

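  @doc """
  Copy tables
  """
  def copy_tables do
    Enum.each(stores(), fn data_mapper ->
      # function_exported?/3 returns false for modules that are not loaded yet.
      if Code.ensure_loaded?(data_mapper) and function_exported?(data_mapper, :copy_store, 0) do
        apply(data_mapper, :copy_store, [])
      else
        # Default here.
        :ok
      end
    end)

    :ok
  end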

That way if a user wants to get creative or use another library they can still implement their own functionality. (or indeed, override the default and have it do nothing)

edit: Or, you know, I could use defoverridable and a __using__ macro, which I have discovered exists :)
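For reference, a minimal sketch of that pattern. StoreSketch, DefaultStore and CustomStore are hypothetical names, and the return atoms are placeholders for the real copy logic; this shows the mechanism only, not how Mnesiac implements it.

```elixir
defmodule StoreSketch do
  @moduledoc false
  # Hypothetical behaviour module illustrating __using__ plus defoverridable.

  @callback copy_store() :: term()

  defmacro __using__(_opts) do
    quote do
      @behaviour StoreSketch

      # Default implementation injected into every module that calls `use`.
      def copy_store, do: :default_copy

      # Lets the using module redefine copy_store/0 without a clash.
      defoverridable copy_store: 0
    end
  end
end

defmodule DefaultStore do
  # Keeps the injected default.
  use StoreSketch
end

defmodule CustomStore do
  use StoreSketch

  # Replaces the injected default with custom behaviour.
  def copy_store, do: :custom_copy
end
```

With this shape, copy_tables no longer needs the function_exported? check, because every store is guaranteed to have a copy_store/0, either the default or the user's own.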

Ok I think I have enough info together to have a crack at writing up a solution to this, hopefully in such a way that it'll be a bit easier to add optional callbacks for injecting custom behaviour in some situations.

Will get back to you with a PR hopefully by the end of the week.