oracle/pgql-lang

open cypher?

tbrantb opened this issue · 2 comments

The pgql spec has obvious similarities to the cypher query language. Instead of introducing another query language onto the graph db sector. Would it be possible to add some more weight behind the open cypher initiative? http://www.opencypher.org

Are there technical, legal, political reasons not to do so?

Hi tbrantb,

I can only comment regarding the technical aspect.

There are several open issues with Cypher that PGQL tries to solve. Many of these issues have come forth from the work done over the past year by LDBC's GraphQL work force, which is a group of people from academia and industry (incl. Neo4j and Oracle) that are researching graph query langauges. Note that the work force does not endorse PGQL nor Cypher but gives independent recommendations for graph query language standardization. These recommendations were taken into account while designing PGQL.

Currently, the open issues with Cypher are as follows:

  • The pattern matching semantic is not subgraph homomorphism but a form of subgraph isomorphism that allows two query vertices to bind to the same data vertex (like homomorphism) but does not allow two query edges to bind to the same data edge. This means that certain SQL and SPARQL queries cannot be (naturally) expressed in Cypher. It also makes it more expensive to process queries and it especially blows up processing of recursive-style queries (e.g. -[:knows*]->) as Cypher would even enforce the semantic over the edges along paths or between edges from different conjunctive paths.
  • The (recursive-style) path finding semantic of Cypher is to find all trails (paths with non-repeating edges) between pairs of vertices. In addition, it supports single min-hop shortest path finding and all min-hop shortest path finding. The problem with finding all trails between a pair of vertices is that it has a runtime complexity that is exponential in the size of the graph and can thus easily blow up query processing even for small graphs. This problem does not exists for single shortest path finding. However, the kind of constraints that Cypher allows you to specify as part of a path pattern is very limited. For example, you cannot specify (inside the path pattern) that all the vertices along the path have the label Person or that all the edges along the path have a weight that is smaller than 100. As a result, you end specifying that you want to find all trails and filter out a valid shortest path afterwards, which is very expensive. Sometimes the query optimizer can take care of it and still process it efficiently, but this is generally very hard.
    Given that use cases typically don't require support for all trail finding, it is better that the query language supports reachability between pairs of vertices (like SPARQL) in combination with shortest path finding and at the same time allows for general filter expressions in path patterns that can be evaluated (efficiently) during traversal.
    Recursive-style path finding is the functionality that distinguishes graph query languages from SQL, so it is important to get it right.
  • Cypher only supports a subset of the class of Regular Path Queries (RPQs). This limits the user but also makes it hard to reason about the language from a theoretical perspective.
  • Cypher does not support query composition since subqueries are specified by chaining multiple queries such that the first queries have the WITH clause at the end while the last query has the RETURN clause at the end.

Pattern matching syntax in PGQL is indeed similar to Cypher ("ASCII-Art"). We found this a very natural way to specify patterns in graphs. However, we have tried to keep the rest of the syntax as close as possible to SQL because we believe this makes a lot of sense and is more appealing for existing SQL and SPARQL users.

  • PGQL has SELECT ... (FROM ...) WHERE ... while Cypher has MATCH ... WHERE ... WITH/RETURN
  • PGQL has separate SELECT and GROUP BY clauses while Cypher merges these two clauses into a single WITH/RETURN clause
  • PGQL has a HAVING clause while Cypher requires a subquery to filter out groups of solutions

Also, in the short term we plan to add additional functionality to PGQL that is currently not present in Cypher:

  • Graph construction and composition of queries that return graphs
  • Weighted shortest path finding
  • Multiple graphs as input to a query

Thank you for your very informative response. This leads me to several
things I'd like to research/learn.

On Jul 28, 2016 7:26 PM, "Oskar van Rest" notifications@github.com wrote:

Hi tbrantb,

I can only comment regarding the technical aspect.

There are several open issues with Cypher that PGQL tries to solve. Many
of these issues have come forth from the work done over the past year by
LDBC http://ldbcouncil.org/'s GraphQL work force, which is a group of
people from academia and industry (incl. Neo4j and Oracle) that are
researching graph query langauges. Note that the work force does not
endorse PGQL nor Cypher but gives independent recommendations for graph
query language standardization. These recommendations were taken into
account while designing PGQL.

Currently, the open issues with Cypher are as follows:

The pattern matching semantic is not subgraph homomorphism
https://en.wikipedia.org/wiki/Graph_homomorphism but a form of
subgraph isomorphism
https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem that
allows two query vertices to bind to the same data vertex (like
homomorphism) but does not allow two query edges to bind to the same data
edge. This means that certain SQL and SPARQL queries cannot be (naturally)
expressed in Cypher. It also makes it more expensive to process queries and
it especially blows up processing of recursive-style queries (e.g.
-[:knows*]->) as Cypher would even enforce the semantic over the edges
along paths or between edges from different conjunctive paths.

The (recursive-style) path finding semantic of Cypher is to find all
trails
(paths with non-repeating edges) between pairs of vertices. In
addition, it supports single min-hop shortest path finding and all
min-hop shortest path finding
. The problem with finding all trails
between a pair of vertices is that it has a runtime complexity that is
exponential in the size of the graph and can thus easily blow up query
processing even for small graphs. This problem does not exists for single
shortest path finding
. However, the kind of constraints that Cypher
allows you to specify as part of a path pattern is very limited. For
example, you cannot specify (inside the path pattern) that all the vertices
along the path have the label Person or that all the edges along the
path have a weight that is smaller than 100. As a result, you end
specifying that you want to find all trails and filter out a valid
shortest path afterwards, which is very expensive. Sometimes the query
optimizer can take care of it and still process it efficiently, but this is
generally very hard.
Given that use cases typically don't require support for all trail
finding
, it is better that the query language supports reachability
between pairs of vertices (like SPARQL) in combination with shortest path
finding and at the same time allows for general filter expressions in path
patterns that can be evaluated (efficiently) during traversal.
Recursive-style path finding is the functionality that distinguishes
graph query languages from SQL, so it is important to get it right.

Cypher only supports a subset of the class of Regular Path Queries
(RPQs). This limits the user but also makes it hard to reason about the
language from a theoretical perspective.

Cypher does not support query composition since subqueries are
specified by chaining multiple queries such that the first queries have the
WITH clause at the end while the last query has the RETURN clause at
the end.

Pattern matching syntax in PGQL is indeed similar to Cypher ("ASCII-Art").
We found this a very natural way to specify patterns in graphs. However, we
have tried to keep the rest of the syntax as close as possible to SQL
because we believe this makes a lot of sense and is more appealing for
existing SQL and SPARQL users.

  • PGQL has SELECT ... (FROM ...) WHERE ... while Cypher has MATCH ...
    WHERE ... WITH/RETURN
  • PGQL has separate SELECT and GROUP BY clauses while Cypher merges
    these two clauses into a single WITH/RETURN clause
  • PGQL has a HAVING clause while Cypher requires a subquery to filter
    out groups of solutions

Also, in the short term we plan to add additional functionality to PGQL
that is currently not present in Cypher:

  • Graph construction and composition of queries that return graphs
  • Weighted shortest path finding
  • Multiple graphs as input to a query


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#1 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AATQfCMwWm92XT6RLbp6UQhwqQfmTkWYks5qaTqSgaJpZM4JWt8t
.