neo4jrb/neo4j-ruby-driver

Counter-proposal discussion: more Ruby-like

leehericks opened this issue · 26 comments

An official, core driver near to the Neo4j driver specification sounds great, but the proposal doesn't feel Ruby-like. I also think Rails developers will prefer the abstraction built over the driver (neo4j.rb)

Here is an in-discussion revision to the proposal which keeps all the structure of the driver specification while utilizing what makes Ruby great.

# Java loves factory and builder patterns, ok. Creating the driver:
include Neo4j::Driver
driver = GraphDatabase.driver('bolt://localhost:7687', 
                              AuthTokens.basic('neo4j', 'password'),
                              ...config)

# Ruby is flexible so we can give some different ways to use this.
# First, Neo4j docs say sessions are lightweight, disposable, and usually scoped in a 
# context block when the language supports it, so let's do that. 
# Ruby blocks and instance_eval (with proper care for delegating method calls
# back to the original context via `method_missing`) give us clean DSL with automatically
# closing the session. 

# Default session-level access mode is :write
driver.session do
  run('MATCH (a:Person) RETURN a.name AS name').to_a
end

# Session-level access mode with auto-commit transaction
driver.session :read do
  run('MATCH (a:Person) RETURN a.name AS name').to_a
end

# Transaction-level access mode (multiple transactions per session)
driver.session do
  transaction :read do
    run('MATCH (a:Person) RETURN a.name AS name').to_a 
  end
end

# Note that while the blocks are executed in the scope of the session or transaction,
# the objects can be explicitly used through parameters:
driver.session do |session|
  session.transaction { |tx| create_person_node(tx, "Lee") }
end

def create_person_node(tx, name)
  tx.run 'CREATE (a:Person {name: $name})', name: name
end

# Further, using a method symbol and variable arguments can create a clean, readable solution.
# Default write access mode
driver.session do
  transaction :create_person_node, "Lee"
end

# Bookmarks are sent from the server after each transaction is commit.
# transaction is called on the session object, meaning we can make the last available.
bookmarks = []

driver.session do
  transaction :write { run ... }
  bookmarks << last_bookmark
end

# OR

bookmarks << driver.session do
  transaction :write { run ... }
  transaction :write { run ... }
  transaction :write { run ... }
  last_bookmark
end

# Then let's use them for a next session
driver.session :read, bookmarks do
  run('MATCH (a:Person) RETURN a.name AS name').to_a
end

# So, if I may rewrite the last method in the 3.4 example, line 223:
def add_employ_and_make_friends
  bookmarks = []

  driver.session do
    transaction :add_company, "Wayne Enterprises"
    transaction :add_person "Alice"
    transaction :employ, "Alice", "Wayne Enterprises"
    bookmarks << last_bookmark
  end

  driver.session do
    transaction :add_company, "LexCorp"
    transaction :add_person, "Bob"
    transaction :employ, "Bob", "LexCorp"
    bookmarks << last_bookmark
  end

  driver.session bookmarks do
    transaction :make_friends, "Alice", "Bob"
    transaction :read, :print_friends
  end
end

driver.close

The main logic here is that if a block is given, we will execute it in the context of that session/transaction/result. We will also yield the object to the block, so the developer can add the parameter if they want to pass on the object to a method, which explicitly shows intent, although they could also use self.

We have a fixed domain of driver > session > transaction > result, so state can be accumulated behind the scenes. This leaves us with a clean, flexible api I think.

Just piping in briefly: I think that the idea is to keep the neo4j gem but in a major version update replace the innards to make it work with this new driver.

Just piping in briefly: I think that the idea is to keep the neo4j gem but in a major version update replace the innards to make it work with this new driver.

Thanks Brian, good to know the intention going forward. I still think it would be nice to consider idiomatic Ruby for those not using Rails, as work is pushing forward to do machine learning and AI with Ruby and Neo4j wants to position itself to be used for such things.

👍 Yeah, definitely agreed about have a Ruby style. Though Neo4j, Inc. is standardizing on a common API across all languages, so I think it's important to work within that. I imagine there's room to be ruby-ish in there, though. I haven't seen the API, however. Is there some documentation?

👍 Yeah, definitely agreed about have a Ruby style. Though Neo4j, Inc. is standardizing on a common API across all languages, so I think it's important to work within that. I imagine there's room to be ruby-ish in there, though. I haven't seen the API, however. Is there some documentation?

Top of the readme for this repo there is a link to the chapter in the Neo4j docs. You'll notice that they offer API example for C#, Java, Javascript and Python throughout the chapter and while following a general flow they deviate for language specifics!

And this ruby driver use-case docs here

I'll also add that the proposed api above can be run procedurally like the original proposal, for example:

driver.session do
  transaction :write, &method(:match_person_nodes)
end

# can be explicitly written by not using the block:

def get_people
  session = driver.session
  session.transaction :write, &method(:match_person_nodes)
ensure
  session&.close
end

def match_person_nodes(tx)
  tx.run("MATCH (a:Person) RETURN a.name ORDER BY a.name").map(&:first)
end

It's just that if you are writing a bunch of code with multiple sessions and bookmarks like the example, the concise syntax should be appreciated.

I think the issue with your suggestion (and I really should get back to work, I'll let Heinrich respond) is that you can't just call driver.session every time you want to do something. You might have multiple sessions, so you need to call driver.session to get a session and then use that variable to make queries against that specific session. We had the idea of a "default session" before and it caused problems and was a limitation.

One thing that you probably didn't see is that the neo4j-core gem was developed before Neo4j, Inc standardized on their APIs. The concepts that were used in neo4j-core weren't very compatible with the way that Neo4j was developing their official drivers and it was really hard to shoehorn it in. This is partially because those concepts are tied in with the underlying Bolt protocol. It's super-important that the concepts be preserved for the future maintainability of the driver.

Thanks for sharing the script in the doc directory. I don't know that it seems very un-Rubyish to me, but I've seen Ruby used it a lot of different ways. 🤷‍♂️

Yeah, I probably shouldn't have spent half a work day on this hehe.

But, I went through their driver docs and the driver is instantiated once. They say that sessions are lightweight and disposable, and that sessions and transactions grab connections from the connection pool.

So in fact, they do want you to call driver.session multiple times, because the causal bookmarks feature are basically commit hashes for each transaction change you made to the graph, and that gets shared around the cluster so that when you hand those bookmarks to the next session you instantiate, whatever server in the cluster you end up hitting will be sure that it has synced to the point that it has all the data writes you may have made to a write database in the cluster beforehand.

Thank you @leehericks for your thoughts. I was desperately missing some critical look. I showed the proposal to several people but did not such constructive feedback. I really appreciate especially your ruby implementation of java try-with-resource. I was bothered by the blocks with ensure and your approach solves it beautifully. I think we can support your proposal without giving up any current functionality or compliance with the common API for all the drivers.

Just a clarification. According to your proposal this should not close the session, right?

driver.session.run('MATCH (a:Person) RETURN a.name AS name')

But this should
driver.session { run('MATCH (a:Person) RETURN a.name AS name') }

@leehericks were you thinking about using this pattern https://www.dan-manges.com/blog/ruby-dsls-instance-eval-with-delegation to implement your proposal. I am wonderring if we can still call that idiomatic ruby despite all the surprises like not accessibility of instance variables. First after your proposal and some read up I myself understood some issues in the neo4j.rb gem with instance eval without delegation in the branch method and the necesssary work aorund in this case.

  def accessible_phases(client, limit = nil)
    outer = self
    client
      .phases(:p).not_archived.branch { all_structures.not_archived.visible_by_permissions(outer) }.distinct
      .ordered.limit(limit)
  end

BTW in our case even if branch were implemented using instance_eval and delegation it would not have solved the need for the work around either.

So if someone writes a code like this:

  begin
    session = driver.session
    session.run(query(self))
  ensure
    session&.close
  end

They won't be able to automatically convert it to:
driver.session { run(query(self)) }

However even worse, I think, is the accidental hiding of methods on the original self by the implementation methods on the session object.

Don't get me wrong I still would like to implement your proposal very much, but do you think we are doing something good to the ruby community with all the not easy to understand magic and modifying intuitive behavior?

Maybe a good compromise is to always require the block parameter:
driver.session { |session| session.run(query(self)) }
This could still close the session without the need for modifying the block binding and all the risk associated with it like e.g. accidentally hiding user's method when new Session method is added to the implementation.

Just a clarification. According to your proposal this should not close the session, right?

driver.session.run('MATCH (a:Person) RETURN a.name AS name')

But this should
driver.session { run('MATCH (a:Person) RETURN a.name AS name') }

Yes, this is my mistake as I was dreaming up the API. driver.session.run would not close the session. Here is a working implementation:

@klobuczek this captures the global space as in the article you referred to. Does this still suffer from the workarounds you are referring to? Also...well, maybe too many optional ways to do things are not always in the best interest of the programmer.

class Driver
  def initialize()
  end

  def session(&block)
    session = Session.new

    if block_given?
      if block.arity == 1
        yield session
      else
        session.evaluate &block
      end

      session.close
      session.bookmarks
    else
      session
    end
  end
end

class Session
  attr_reader :bookmarks

  def initialize(access_mode = nil)
    @access_mode = access_mode
    @bookmarks = []
    puts "Session Opened"
  end

  def evaluate(&block)
    @self_before_instance_eval = eval "self", block.binding
    instance_eval &block
  end

  def method_missing(method, *args, &block)
    @self_before_instance_eval.send method, *args, &block
  end

  def transaction(access_mode, &block)
    transaction = Transaction.new(access_mode)

    if block_given?
      if block.arity == 1
        yield transaction
      else
        transaction.evaluate(@self_before_instance_eval, &block)
      end
    else
      # manual transaction
    end
  end

  def run
    puts "Session Run"
  end

  def close()
    puts "Session Closed"
  end
end

class Transaction
  attr_reader :access_mode

  def initialize(access_mode)
    @access_mode = access_mode
  end

  def evaluate(global_self, &block)
    @self_before_session_instance_eval = global_self
    instance_eval &block
  end

  def method_missing(method, *args, &block)
    @self_before_session_instance_eval.send method, *args, &block
  end

  def run(cypher)
    puts "Transaction Run: #{cypher}"
  end
end


driver = Driver.new

# Manual session management
session = driver.session
session.run
session.close

# Session passed as argument
driver.session { |s| s.run }

# Session with transaction
name = "Lee"
driver.session do
  transaction :read do
    run "CREATE (p:Person {name: #{name}})"
  end
end

# Session with transaction using transaction function, returning bookmarks
def create_person(tx, name)
  puts "Create Person"
  tx.run "CREATE (p:Person {name: #{name}})"
end

bookmarks = driver.session do
  transaction :read do |tx|
    create_person(tx, "Lee")
  end
end
puts bookmarks.count

# Checking when session was passed in by argument but transaction was instance_eval'd
def helper
  puts "Helper"
end

driver.session do |session|
  session.transaction :read do
    helper
    run "CREATE (p:Person {name: #{name}})"
  end
  session.transaction(:write) { |tx| create_person(tx, name) }
end

Great, thank you. However I would not do this:

def method_missing(method, *args, &block) 
  @self_before_instance_eval.send method, *args, &block
end

I would just execute the block with the regular binding if the block arrity is 1 or with the session as binding if the arity is 0. That gives all the options and control to the programmer.
The reason is that the delegation via method_missing is very brittle. If we later add a method to the Session class with a name already used by a programmer it will breaks things in client's code. I think this approach is good if you have control over the entire application, but not ideal for libraries.

What do you think?

Thank you for the concrete example. I didn’t think of it. Of course the block could be instance_eval in the context of a SessionProxy object that restricts the visible API. The problem can be engineered around making a clean DSL. Actually, this should probably happen so that the programmer doesn't call methods like close in the block.

No matter which way you choose to program, this implementation would allow every style giving developers a complete level of control at the cost of complexity of options. It’s true sometimes the undocumented and magical nature of Ruby APIs frustrates me.

That said, the driver doesn’t magically appear completed haha. Removing the instance_eval in the future would be trivial. It may be worth developing including the feature and seeing the outcome as testing with real-world examples?

I still haven’t implemented passing a method symbol and variable arguments to the transaction Method. This would optionally cleanup boilerplatecode when using what Neo4j called “transaction functions’. Again, the safe programmer can use the block requesting the transaction by argument and code almost identically to the unified api

@leehericks one more thing I would not fix the return of the session method to bookmarks. This feels a bit hardcoded. I would return the value of the block. That way the programmer can decide what is returned. If you do want bookmarks you will have to do:

driver.session do
  run('MATCH (a:Person) RETURN a.name AS name')
  last_boomark
end

Otherwise, the block would return the statement result which is probably most of the time the desired return and would be consistent with the return of:

session.run('MATCH (a:Person) RETURN a.name AS name')

@klobuczek fair enough about the bookmarks. I've revisited the docs and Java does have lastBookmark, although it's also hardcoding the expectation that you never need earlier bookmarks. Do you know if it is the case that you would never want all the bookmarks?

driver.session do
  run('MATCH (a:Person) RETURN a.name AS name')
  bookmarks.last
end

If not, then I think last_bookmark as you suggest is best.

Within a session, bookmark propagation is carried out automatically and does not require any explicit signal or setting from the application. To opt out of this mechanism for unrelated units of work, applications can use multiple sessions. This avoids the small latency overhead of the causal chain. Propagation between sessions can be achieved by extracting the last bookmarks from one or more sessions and passing these into the construction of another. This is generally the only case in which an application will need to work with bookmarks directly.

Reference: https://neo4j.com/docs/driver-manual/1.7/sessions-transactions/#driver-transactions-causal-chaining

@klobuczek As we discuss I will edit my proposal at the top. Regarding transactions, how do you feel about the following? Again, it's Ruby magic, but it does add optional clarity and the original proposal does have similar syntax.

# A
driver.session :write do
  transaction { |tx| add_company(tx, "Wayne Enterprises") }
  transaction { |tx| add_person(tx, "Alice") }
  transaction { |tx| employ(tx, "Alice", "Wayne Enterprises") }
  saved_bookmarks << last_bookmark
end

# B: example using method symbol to reduce boilerplate code. Too terse?
driver.session :write do
  transaction :add_company, "LexCorp"
  transaction :add_person, "Bob"
  transaction :employ, "Bob", "LexCorp"
  saved_bookmarks << last_bookmark
end

# C: A little more Java driver-like transaction syntax, but inconsistent use with declaring access mode in session.
driver.session :write, saved_bookmarks do
  write_transaction :make_friends, "Alice", "Bob"
  read_transaction :print_friends
end

I'm not clear on all the steps of transactions. I see the Java driver has begin, success, failure, and close for explicit transactions started with session.beginTransaction.

Should a transaction block allow marking failure inside?

@leehericks I do fully support option A with the Java method naming from C. I would like this to be very analogous to the other drivers. All 5 official drivers use the same naming e.g. write_transaction.
I tend to implement the optional binding of the block to the resource however I would not put that version in the official manual examples as it is not straightforward or easy to understand when learning about the driver.

The plain ruby notation for the Option B would be

session.write_transaction(&method(:add_company).curry.call('LexCorp'))

provided the formal parameters of add_company are reversed.
I would prefer to use idiomatic ruby. It is more important to keep it generic and flexible than providing syntactic sugar for special cases. We can use all the fancy DSL in the neo4j.rb gem where we anyway define our own DSL.

I'm not clear on all the steps of transactions. I see the Java driver has begin, success, failure, and close for explicit transactions started with session.beginTransaction.

Yes those would be available in the auto-close block. But please note that only begin_transaction returns a Transaction and thus only this one could have the optional auto-close block. read_transaction and write_transaction methods have a mandatory block. Again this might be some questionable decission in the first driver released (the Java one), but I believe at the driver layer we are much better off sticking to that pattern. It is so reassuring to have the java implementation at hand and at any time be able see to how the java driver works and even adopt their specs.

Please see a new version of the doc https://github.com/neo4jrb/neo4j-ruby-driver/blob/proposal/doc/dev_manual_examples.rb. I thing it will become apparent that your suggestion of using symbols instead of class constants is better when we get to the seabolt based driver. For now it has been convenient to reuse the java classes.
Please note that the jruby version of the driver is nearly complete, only the newer temporal and spacial types are still missing. We are using the driver with neo4j.rb in production on a large application. That's why you may sometimes notice some hesitance to changes but I still do appreciate your suggestions.

@leehericks I've meanwhile implemented the main block pattern. Please have a look and let me know if you think it could be improved: https://github.com/neo4jrb/neo4j-ruby-driver/blob/proposal/jruby/neo4j/driver/ext/auto_closable.rb

@klobuczek This is interesting, I'm learning a lot about modules and the ancestors chain here. prepend is interesting. I'm still curious, regarding instance_eval for blocks with no arguments, don't you think a proxy object of sorts could be instance_eval'd so that only a set of methods, the public api, are available in the block?

@leehericks The problem is not about restricting what clients can call on our block binding but what they can call on the previous block binding via method_missing. We have no control over that namespace and that namespace would be comingled with our namespace (e.g. the proxy object that you suggest). This would mean that we never again will be able to add any more methods to the proxy object as we don't know if such a name has already been used in the client namespace. If we did nevertheless we could possibly hide a method that the client defined in their "previous" binding and expects to be called viamethod_missing. In such a case the client call would unexpectedly be dispatched to our method definition and the client code would fail. So we would not be able to confidently release a new version without running the risk of breaking some client's code. I think we should keep our namespace independent from the client one and not mix the 2.

Well, I think it's generally understood that if you say a session has run, read_transaction, write_transaction, begin_transaction and close methods that using the instance_eval version means the developer can't use those names. They are simply reserved method names within the block.

I thought the bigger issue was that if you call session.instance_eval &block you've given the developer access to private internal methods. This is why I was suggesting session.proxy.instance_eval &block where the proxy method constructs an anonymous class with the public api and global self (for method_missing) and returns an instantiation of it.

The reserved method is something suitable for a language which is fully designed up front. Libraries are something much more volatile. You cannot reserve all names you may want to use in the future. In this particular case we don't even control the namespace, it's up to neo4j and bolt protocol V1 and future V2 and V3, which are partially specified or implemented at the moment.

I appreciate the idea of working on making the API for the driver pleasant to use, but I think it's also important to keep it simple. This is the lowest level at which ruby users will interact with Neo4J.
I think the extra sugar proposed above is cool, but I don't see it really adding to the mission of this package: to provide a strong driver with good performance.

For instance, I don't like the idea of using instance_eval as I see it as greatly increasing the maintenance/compatibility concerns for the driver. The same goes for passing symbols and having the driver call them as methods. Good for some users, but it will be surprising for others and I can imagine some major headaches down the line.

def create_person_node(tx, name)
  tx.run 'CREATE (a:Person {name: $name})', name: name
end

# Further, using a method symbol and variable arguments can create a clean, readable solution.
# Default write access mode
driver.session do
  transaction :create_person_node, "Lee"
end

Is a very small improvement over the following, in my opinion.

driver.session do |s|
   s.write_transaction { |tx| create_person_node tx, "Lee" }
end

I think the ergonomic extras suggested by @leehericks can be implemented in a gem that sits on top of the driver, much like the neo4jrb/neo4j gem does now. The two gems would both be alternative ways to deal with neo4j, built on the foundation of neo4j-ruby-driver.

For the driver itself, I strongly support keeping it as simple as possible, and staying as close to the neo4j reference drivers as possible. I think that is the path that will lead to the best maintainability, and the lowest chance of bugs for the library going forwards.

P.S. Thanks everyone for the work you're doing to make neo4j in ruby awesome!

One more plus for keeping the neo4j ruby driver very similar to other drivers is that it will be easier to port over security/bug fixes from those drivers as well.