/snowflake

A distributed Snowflake generator in Elixir.

Primary LanguageElixirMIT LicenseMIT

Snowflake

A scalable, decentralized Snowflake generator in Elixir.

Usage

In your mix.exs file:

def deps do
  [{:snowflake, "~> 1.0.0"}]
end
def application do
  [applications: [:snowflake]]
end

Specify the nodes in your config. If you're running a cluster, specify all the nodes in the cluster that snowflake runs on.

  • nodes can be Erlang Node Names, Public IPs, Private IPs, Hostnames, or FQDNs
  • epoch should not be changed once you begin generating IDs and want to maintain sorting
  • There should be no more than 1 snowflake generator per node, or you risk potential duplicate snowflakes on the same node.
config :snowflake,
  nodes: ["127.0.0.1", :'nonode@nohost'],   # up to 1023 nodes
  epoch: 1142974214000  # don't change after you decide what your epoch is

Alternatively, you can specify a specific machine_id

config :snowflake,
  machine_id: 23,   # values are 0 thru 1023 nodes
  epoch: 1142974214000  # don't change after you decide what your epoch is

Generating an ID is simple.

Snowflake.next_id()
# => {:ok, 54974240033603584}

Util functions

After generating snowflake IDs, you may want to use them to do other things. For example, deriving a bucket number from a snowflake to use as part of a composite key in Cassandra in the attempt to limit partition size.

Lets say we want to know the current bucket for an ID that would be generated right now:

Snowflake.Util.bucket(30, :days)
# => 5

Or if we want to know which bucket a snowflake ID should belong to, given we are bucketing by every 30 days.

Snowflake.Util.bucket(30, :days, 54974240033603584)
# => 5

Or if we want to know how many ms elapsed from epoch

Snowflake.Util.timestamp_of_id(54974240033603584)
# => 197588482172

Or if we want to know how many ms elapsed from computer epoch (January 1, 1970 midnight). We can use this to derive an actual calendar date.

Snowflake.Util.real_timestamp_of_id(54974240033603584)
# => 1486669389497

NTP

Keep your nodes in sync with ntpd or use your VM equivalent as snowflake depends on OS time. ntpd's job is to slow down or speed up the clock so that it syncs os time with your network time.

Architecture

Snowflake allows the user to specify the nodes in the cluster, each representing a machine. Snowflake at startup inspects itself for Node, IP and Host information and derives its machine_id from the location of itself in the list of nodes defined in the config.

Machine ID is defaulted to 1023 if snowflake is not able to find itself within the specified config. It is important to specify the correct IPs / Hostnames / FQDNs for the nodes in a production environment to avoid any chance of snowflake collision.

Benchmarks

Consistently generates over 60,000 snowflakes per second on Macbook Pro 2.5 GHz Intel Core i7 w/ 16 GB RAM.

Benchmarking snowflake...
Benchmarking snowflakex...

Name                 ips        average  deviation         median
snowflake       316.51 K        3.16 μs   ±503.52%        3.00 μs
snowflakex      296.26 K        3.38 μs   ±514.60%        3.00 μs

Comparison:
snowflake       316.51 K
snowflakex      296.26 K - 1.07x slower