Related

Related is a Redis-backed high performance distributed graph database.

Raison d'être

Related is meant to be a simple graph database that is fun, free and easy to use. The intention is not to compete with "real" graph databases like Neo4j, but rather to be a replacement for a relational database when your data is better described as a graph, for example when building social software. Related is very similar in scope and functionality to Twitter's FlockDB, but is, among other things, designed to be easier to set up and use. Related also has better documentation and is easier to hack on. The intention is to be web scale, but we ultimately rely on the ability of Redis to scale (using Redis Cluster, for example). Read more about the philosophy behind Related in the Wiki.

Setup

Assuming you already have Redis installed:

$ gem install related

Or add the gem to your Gemfile.

require 'related'
Related.redis = 'redis://.../'

If you are using Rails, add the above to an initializer. If Redis is running on localhost on the default port, the second line is not needed.

Example usage

node = Related::Node.create(:name => 'Example', :popularity => 2.3)
node.new?
node.popularity = 100
node.attributes
node.has_attribute?(:popularity)
node.read_attribute(:popularity)
node.write_attribute(:popularity, 50)
node.increment!(:popularity, 10)
node.decrement!(:popularity, 10)
Related::Node.increment!(node, :popularity, 10)
Related::Node.decrement!(node, :popularity, 10)
node.save
node.persisted?
node = Related::Node.find(node.id)
node.destroy
node.destroyed?

node1 = Related::Node.create
node2 = Related::Node.create
rel = Related::Relationship.create(:friends, node1, node2, :have_met => true)

n = Related::Node.find(node1.id)
nn = Related::Node.find(node1.id, node2.id)

n = Related::Node.find(node1.id, :fields => [:name])
nn = Related::Node.find(node1.id, node2.id, :fields => [:name])

Nodes and relationships are both subclasses of the same base class. Both behave much like an ActiveRecord object and can store attributes.

To query the graph:

node.outgoing(:friends)
node.incoming(:friends)
node.outgoing(:friends).relationships
node.outgoing(:friends).nodes
node.outgoing(:friends).limit(5)
node.outgoing(:friends).options(:fields => ..., :model => ...)
node1.outgoing(:friends).relationships.find(node2)
node1.path_to(node2).outgoing(:friends).depth(3)
node1.shortest_path_to(node2).outgoing(:friends).depth(3)
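
To make the idea of a depth-limited shortest path concrete, here is a minimal plain-Ruby sketch of a breadth-first search over an in-memory adjacency hash. It does not use Related or Redis, and it is not a description of how Related computes paths internally. The FRIENDS hash and the shortest_path helper are hypothetical names used only for illustration.

```ruby
# Hypothetical in-memory graph: each key lists its outgoing :friends links.
FRIENDS = {
  'alice' => ['bob', 'carol'],
  'bob'   => ['dave'],
  'carol' => ['dave'],
  'dave'  => ['erin'],
  'erin'  => []
}

# Breadth-first search that gives up once a path would need more than
# max_depth hops. Returns the node ids along the path, or nil if none is found.
def shortest_path(graph, from, to, max_depth)
  queue = [[from]]
  visited = { from => true }
  until queue.empty?
    path = queue.shift
    return path if path.last == to
    next if path.length - 1 >= max_depth # all allowed hops are used up
    graph.fetch(path.last, []).each do |neighbor|
      next if visited[neighbor]
      visited[neighbor] = true
      queue << (path + [neighbor])
    end
  end
  nil
end

p shortest_path(FRIENDS, 'alice', 'erin', 3) # => ["alice", "bob", "dave", "erin"]
p shortest_path(FRIENDS, 'alice', 'erin', 2) # => nil
```

Because breadth-first search explores paths in order of length, the first path that reaches the target is also a shortest one, and the depth limit simply stops the search from expanding further.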

To get the results from a query:

node.outgoing(:friends).to_a
node.outgoing(:friends).count # or .size, which is memoized

You can also do set operations, like union, diff and intersect:

node1.outgoing(:friends).union(node2.outgoing(:friends))
node1.outgoing(:friends).diff(node2.outgoing(:friends))
node1.outgoing(:friends).intersect(node2.outgoing(:friends))
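
As a rough illustration of what the three operations return, the following plain-Ruby sketch applies the equivalent array operators to two lists of ids. It assumes union, diff and intersect follow the usual set semantics. The set_operations helper and the ids are hypothetical and do not touch Related or Redis.

```ruby
# Plain-Ruby stand-ins for the three operations, applied to arrays of ids.
def set_operations(first, second)
  {
    :union     => first | second, # ids present in either list
    :diff      => first - second, # ids in the first list but not the second
    :intersect => first & second  # ids present in both lists
  }
end

# Hypothetical friend ids for two nodes.
result = set_operations(%w[a b c], %w[b c d])
p result[:union]     # => ["a", "b", "c", "d"]
p result[:diff]      # => ["a"]
p result[:intersect] # => ["b", "c"]
```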

Relationships are sorted based on when they were created, which means you can paginate them:

node.outgoing(:friends).relationships.per_page(100).page(1)
node.outgoing(:friends).relationships.per_page(100).page(rel)

The second form paginates based on the id of the last relationship on the previous page. Useful for cases where explicit page numbers are not appropriate.
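
The difference between the two forms can be sketched in plain Ruby. The example below paginates an in-memory list by the id of the last item seen, which keeps pages stable even when new items are added at the top. The page_after helper and the ids are hypothetical, and this is only an illustration of the idea, not Related's implementation.

```ruby
# Return the page that starts right after the given cursor id.
# A nil cursor means "start from the beginning".
def page_after(ids, cursor, per_page)
  return ids.first(per_page) if cursor.nil?
  position = ids.index(cursor)
  return [] if position.nil? # unknown cursor: nothing to show
  ids[position + 1, per_page] || []
end

# Hypothetical relationship ids, newest first.
ids = %w[r9 r8 r7 r6 r5 r4 r3 r2 r1]

first_page = page_after(ids, nil, 4)
p first_page                            # => ["r9", "r8", "r7", "r6"]

# A new relationship arrives before the next page is requested.
ids.unshift('r10')

# Paging by the last id still continues exactly where the first page ended.
p page_after(ids, first_page.last, 4)   # => ["r5", "r4", "r3", "r2"]
```

With explicit page numbers, the newly added r10 would have shifted every item down by one, and r6 would have shown up again on page 2.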

Pagination only works for relationships. If you access nodes directly, skipping the extra step of iterating through the relationship objects, you will only get random nodes. You can therefore use .limit (or .per_page) like this to get a random selection of nodes:

node.outgoing(:friends).nodes.limit(5)

The root node

Related provides a special kind of node called the "root" node. It's always accessible using the Related.root helper and you can create a relationship between any node and the root node, which is useful if you want to easily access a set of nodes without knowing the IDs of those nodes.

Related::Relationship.create(:example, Related.root, node)
Related.root.outgoing(:example)

You can even add attributes to the root node if you want.

Related.root.name = 'The root'
Related.root.save

Properties

All Node and Relationship attributes are stored as strings in Redis, but you can easily create your own subclass and define your own custom serialization behavior. You can either just override the getter and setter methods for the attribute you need to convert or you can use the property method to define the semantics and let Related do the conversion for you.

class Event < Related::Node
  property :title, String
  property :attending_count, Integer
  property :popularity, Float
  property :start_date, DateTime
  property :location do |value|
    "http://maps.google.com/maps?q=#{value}"
  end
end

An additional benefit of defining properties like this is that they get included when you serialize the object to JSON or XML even when the attribute hasn't been set.

event = Event.create(:title => 'Party!', :location => 'Stockholm')
event.as_json # => {"title"=>"Party!","attending_count"=>nil,"popularity"=>nil,"start_date"=>nil,"location"=>"http://maps.google.com/maps?q=Stockholm"}
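
To show how a property declaration can turn stored strings back into typed values, here is a minimal plain-Ruby sketch of such a macro. It is not Related's implementation. SketchNode, SketchEvent and the CONVERTERS table are hypothetical names, and the sketch only covers reading values and building the JSON hash.

```ruby
# Hypothetical stand-in for a node class that keeps all values as strings.
class SketchNode
  # Converters from the stored string to a Ruby value, keyed by declared type.
  CONVERTERS = {
    String  => lambda { |value| value.to_s },
    Integer => lambda { |value| Integer(value) },
    Float   => lambda { |value| Float(value) }
  }

  def self.properties
    @properties ||= {}
  end

  # Declare a property with either a type or a custom conversion block.
  def self.property(name, type = String, &block)
    properties[name] = block || CONVERTERS.fetch(type)
    define_method(name) do
      raw = @attributes[name.to_s]
      raw.nil? ? nil : self.class.properties[name].call(raw)
    end
  end

  def initialize(attributes = {})
    # Everything is stored as a string, as it would be in Redis.
    @attributes = attributes.each_with_object({}) { |(k, v), h| h[k.to_s] = v.to_s }
  end

  # Include every declared property, even the ones that were never set.
  def as_json
    self.class.properties.keys.each_with_object({}) do |name, hash|
      hash[name.to_s] = public_send(name)
    end
  end
end

class SketchEvent < SketchNode
  property :title, String
  property :attending_count, Integer
  property :location do |value|
    "http://maps.google.com/maps?q=#{value}"
  end
end

event = SketchEvent.new(:title => 'Party!', :attending_count => 12)
p event.attending_count # => 12
p event.as_json         # => {"title"=>"Party!", "attending_count"=>12, "location"=>nil}
```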

When querying the graph you may want the query to return the results as your custom model class instead of as a Related::Node or Related::Relationship. Related allows you to specify what model a specific node or relationship should be instantiated as based on its attributes.

Related::Node.find(...,
  :model => lambda {|attributes|
    attributes['start_date'] ? Event : Related::Node
  }
)

node.outgoing(:attending).options(
  :model => lambda {|attributes|
    attributes['start_date'] ? Event : Related::Node
  }
)
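
The effect of such a lambda can be tried out without a database. The sketch below applies the same kind of chooser to a few attribute hashes and instantiates the class it returns. PlainNode, EventNode, CHOOSE_MODEL and instantiate are hypothetical names used only for this illustration.

```ruby
# Hypothetical stand-ins for Related::Node and a custom subclass.
class PlainNode
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class EventNode < PlainNode; end

# Same shape as the :model lambda above: attributes in, class out.
CHOOSE_MODEL = lambda { |attributes|
  attributes['start_date'] ? EventNode : PlainNode
}

# Build one object per attribute hash, using whatever class the chooser picks.
def instantiate(rows, chooser)
  rows.map { |attributes| chooser.call(attributes).new(attributes) }
end

rows = [
  { 'name' => 'Alice' },
  { 'title' => 'Party!', 'start_date' => '2011-11-05' }
]
p instantiate(rows, CHOOSE_MODEL).map(&:class) # => [PlainNode, EventNode]
```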

Weight

All relationships have an associated weight on their incoming and outgoing links. By default the weight is set to the time when the relationship was created. As a result, a query that fetches relationships always returns them sorted so that newer relationships appear first. If you create a custom Related::Relationship subclass you can define how the weight is generated for a relationship.

class Comment < Related::Relationship
  property :created_at, Time
  property :points, Integer
  weight do |direction|
    if direction == :in
      self.created_at
    elsif direction == :out
      self.points
    end
  end
end

The weight is always a double precision floating point number, and relationships are sorted by weight in descending order.

To change the weight of an existing relationship you can use the increment_weight! and decrement_weight! methods. They are atomic, which means that you can have any number of clients updating the weight simultaneously without conflict.

comment.increment_weight!(:out, 4.2)
comment.decrement_weight!(:in, 4.2)

You can access the current weight and rank (0 based position) of a relationship like this:

comment.weight(:out)
comment.rank(:in)
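
How weight, descending order and rank fit together can be sketched with a plain Ruby hash. The example is an in-memory illustration only: it is not atomic, it does not use Redis, and the ordered_ids and rank_of helpers and the relationship ids are hypothetical.

```ruby
# Ids ordered by descending weight, matching the sort order described above.
def ordered_ids(weights)
  weights.sort_by { |_id, weight| -weight }.map(&:first)
end

# 0-based position of a relationship in that ordering.
def rank_of(weights, id)
  ordered_ids(weights).index(id)
end

# Hypothetical weights for one node's outgoing links, keyed by relationship id.
weights = { 'c1' => 3.0, 'c2' => 10.5, 'c3' => 7.0 }

p ordered_ids(weights)   # => ["c2", "c3", "c1"]
p rank_of(weights, 'c1') # => 2

weights['c1'] += 4.2     # in-memory analogue of increment_weight!(:out, 4.2)
p rank_of(weights, 'c1') # => 1
```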

ActiveModel

Related supports ActiveModel and includes some basic functionality in both nodes and relationships, such as validations, callbacks, JSON and XML serialization, and translation support. You can easily extend your own subclasses with the custom ActiveModel functionality that you need.

class Like < Related::Relationship
  validates_presence_of :how_much
  validates_numericality_of :how_much

  after_save :invalidate_cache

  def invalidate_cache
    ...
  end
end

Follower

Related includes a helper module called Related::Follower that you can include in your node sub-class to get basic Twitter-like follow functionality:

require 'related/follower'

class User < Related::Node
  include Related::Follower
end

user1 = User.create
user2 = User.create

user1.follow!(user2)
user1.unfollow!(user2)
user2.followers
user1.following
user1.friends
user2.followed_by?(user1)
user1.following?(user2)
user2.followers_count
user1.following_count

The two nodes do not need to be of the same type. You can for example have a User following a Page, or whatever makes sense in your app.

Real-time Stream Processing

When working with graphs you often want to take the rich and interconnected web of data and actually do something with it. Stream processing is a powerful and flexible way to do that. It allows you to implement complex graph algorithms in a scalable way that is also easy to understand and work with.

Stream processing in Related works by defining a data flow that new or existing data will be streamed through. A data flow is triggered when a Relationship is created, updated or deleted. You set up data flows for different relationship types, so for example when a "friend" relationship between two nodes is created or updated, that relationship will automatically be sent through the data flows you have defined for the "friend" type.

A data flow can consist of one or more steps and can branch out in a tree. You define the steps for a data flow using a simple Hash syntax.

Related.data_flow :comment, Tokenize => { CountWords => { TotalSum => nil, MovingAverage => nil } }

In the example above a new comment will first be sent to the Tokenize step, which splits the comment text into words. The list of words will then automatically be sent to the CountWords step, which counts the number of unique words. That number will then be sent to both the TotalSum step, which adds the number to a global counter, and the MovingAverage step, which calculates and stores a moving average. The nil indicates the end of the data flow. You can define as many data flows for a relationship type as you want.

A data flow step is simply a Ruby class that responds to the process message and takes a single argument that holds the input data. Any data yielded from the process method will be automatically sent to the next step in the data flow. The only limitation is that the data sent between steps is a Hash and only contains JSON serializable data. The first step in the data flow will receive the Relationship object that triggered it as a Ruby hash with all of its attributes.

class Tokenize
  def self.process(data)
    data['text'].split(' ').each do |word|
      yield({ :word => word })
    end
  end
end
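
To illustrate how a Hash of steps can be walked as a tree, here is a minimal in-process runner written in plain Ruby. It is a sketch of the idea only, not how Related or its workers are implemented. The run_flow helper, the step classes and the RESULTS store are all hypothetical names.

```ruby
# Hypothetical place for the final steps to store what they receive.
RESULTS = Hash.new { |hash, key| hash[key] = [] }

# Each step responds to process(data) and yields Hashes to the next steps.
class SplitWords
  def self.process(data)
    data['text'].split(' ').each { |word| yield({ 'word' => word }) }
  end
end

class WordLength
  def self.process(data)
    yield({ 'word' => data['word'], 'length' => data['word'].length })
  end
end

class StoreLength
  def self.process(data)
    RESULTS[:lengths] << data['length']
  end
end

class StoreWord
  def self.process(data)
    RESULTS[:words] << data['word']
  end
end

# Walk the Hash tree: run each step and feed everything it yields
# to the steps nested under it. nil marks the end of a branch.
def run_flow(flow, data)
  return if flow.nil?
  flow.each do |step, next_steps|
    step.process(data) { |output| run_flow(next_steps, output) }
  end
end

FLOW = { SplitWords => { WordLength => { StoreLength => nil }, StoreWord => nil } }

run_flow(FLOW, { 'text' => 'graphs are fun' })
p RESULTS[:words]   # => ["graphs", "are", "fun"]
p RESULTS[:lengths] # => [6, 3, 3]
```

Each yielded Hash travels down every branch below the step that produced it, which is what lets a single flow fan out into a tree.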

To actually run the data flows you have defined you need to start one or more data flow workers. Related uses Resque, which supplies persistent queues and reliable workers. If you don't have Resque required in your application, Related will simply run the data flow directly in process instead. This can be useful when testing, but is not recommended for production.

To start a stream processor:

$ QUEUE=related rake resque:work

You can start as many stream processors as you may need to scale up.

Distributed cluster setup

It is easy to use Related in a distributed cluster setup. As of writing this (November 2011) Redis Cluster is not yet ready for production use, but is expected for Redis 3.0 sometime in 2012. Redis Cluster will then be the preferred solution, as it will allow you to set up a dynamic cluster that can re-configure on the fly. If you don't need to add machines to or remove machines from the cluster, you can still use Related in a distributed setup right now using the consistent hashing client Redis::Distributed, which is included in the "redis" gem.

Related.redis = Redis::Distributed.new %w[
  redis://redis-1.example.com
  redis://redis-2.example.com
  redis://redis-3.example.com
  redis://redis-4.example.com],
  :tag => /^related:([^:]+)/

The regular expression supplied in the :tag option tells Redis::Distributed how to distribute keys between the different machines. The regexp in the example is the recommended way of setting it up as it will partition the key space based on the Related ID part of the key, in effect localizing all data directly related to a specific node on a single machine. This is generally good both for reliability (if a machine goes down, it only takes down a part of the graph) and speed (set operations on relationships originating from the same node can be done on the server side, which is a lot faster, for example).

You could also specify a regexp like /:(n|r):/ that will locate all relationships on the same machine, making set operations on relationships a lot faster overall. The obvious drawback is that the total size of your graph will be limited by that single machine.
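
The effect of a tag regexp on key placement can be sketched in plain Ruby. The key names below are hypothetical, since the real key layout used by Related may differ. The placement function is a deliberately simplified CRC32 modulo, whereas a consistent hashing client places keys on a hash ring. The hash_tag and server_for helpers are names made up for this example.

```ruby
require 'zlib'

TAG_BY_ID = /^related:([^:]+)/ # the regexp from the example above
SERVERS   = %w[redis-1 redis-2 redis-3 redis-4]

# The part of the key that gets hashed: the first capture group if the
# tag regexp matches, otherwise the whole key.
def hash_tag(key, tag)
  key[tag, 1] || key
end

# Simplified placement: CRC32 of the tag modulo the number of servers.
def server_for(key, tag, servers)
  servers[Zlib.crc32(hash_tag(key, tag)) % servers.size]
end

# Hypothetical keys that belong to the same Related ID.
node_key  = 'related:abc123'
links_key = 'related:abc123:friends:out'

p hash_tag(node_key, TAG_BY_ID)  # => "abc123"
p hash_tag(links_key, TAG_BY_ID) # => "abc123"

# Same tag, same hash, so both keys end up on the same server.
p server_for(node_key, TAG_BY_ID, SERVERS) == server_for(links_key, TAG_BY_ID, SERVERS) # => true
```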

Using Related with another database

Related can easily be used together with databases other than Redis to store Node data. Relationships are always stored in Redis, but node data can often have characteristics that make Redis unsuitable (like large size).

You can for example use Related together with the Ripple gem to store nodes in Riak:

class CustomNode
  include Ripple::Document
  include Related::Node::QueryMethods

  def query
    Related::Node::Query.new(self)
  end
end

You can then use the CustomNode class as an ordinary Related graph Node and query the graph as usual:

node1 = CustomNode.create
node2 = CustomNode.create
Related::Relationship.create(:friend, node1, node2)
node1.shortest_path_to(node2).outgoing(:friend)

Development

If you want to make your own changes to Related, first clone the repo and run the tests:

git clone git://github.com/sutajio/related.git
cd related
rake test

Remember to install the Redis server on your local machine.

Contributing

Once you've made your great commits:

  1. Fork Related
  2. Create a topic branch - git checkout -b my_branch
  3. Push to your branch - git push origin my_branch
  4. Create a Pull Request from your branch
  5. That's it!

Author

Related was created by Niklas Holmgren (niklas@sutajio.se) and released under the MIT license.