/neo4j-apoc-procedures

Awesome procedures for Neo4j 3.0 (dubbed `apoc`)

Primary LanguageJavaApache License 2.0Apache-2.0

Collection of useful Procedures for Neo4j 3.x

Build & install apoc procedures

git clone http://github.com/jexp/neo4j-apoc-procedures
cd neo4j-apoc-procedures
mvn clean install
cp target/apoc-1.0.0-SNAPSHOT.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart

If you want to run embedded or use shell on a disk store, configure your plugins directory in conf/neo4j.conf with dbms.plugin.directory=path/to/plugins.

Calling Procedures within Cypher

This repository uses a recent build (>= 3.0.0) of Neo4j, so that it can leverage procedures being called within Cypher statements.

CALL apoc.load.json('http://example.com/map.json') YIELD value as person
MERGE (p:Person {name:person.name})
ON CREATE SET p.age = person.age, p.children = size(person.children)

Included Procedures

Built in Help

  • call apoc.help('search') lists name, description-text and if the procedure performs writes (descriptions are WIP), search string is checked against beginning (package) or end (name) of procedure

helpful
CALL apoc.help("apoc") YIELD name, text
WITH * WHERE text IS null
RETURN name AS undocumented

Manual Indexes

Index Queries

Procedures to add to and query manual indexes

  • apoc.index.addNode(node,['prop1',…​]) add node to an index for each label it has

  • apoc.index.addNodeByLabel(node,'Label',['prop1',…​]) add node to an index for the given label

  • apoc.index.addRelationship(rel,['prop1',…​]) add relationship to an index for its type

  • apoc.index.nodes('Label','prop:value*') YIELD node, weight lucene query on node index with the given label name

  • apoc.index.relationships('TYPE','prop:value*') YIELD rel, weight lucene query on relationship index with the given type name

  • apoc.index.between(node1,'TYPE',node2,'prop:value*') YIELD rel, weight lucene query on relationship index with the given type name bound by either or both sides (each node parameter can be null)

  • apoc.index.out(node,'TYPE','prop:value*') YIELD node, weight lucene query on relationship index with the given type name for outgoing relationship of the given node, returns end-nodes

  • apoc.index.in(node,'TYPE','prop:value*') YIELD node, weight lucene query on relationship index with the given type name for incoming relationship of the given node, returns start-nodes

Index Management

  • CALL apoc.index.list() - YIELD type,name,config - lists all manual indexes

  • CALL apoc.index.remove('name') YIELD type,name,config - removes manual indexes

  • CALL apoc.index.forNodes('name',{config}) YIELD type,name,config - gets or creates manual node index

  • CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config - gets or creates manual relationship index

match (p:Person) call apoc.index.addNode(p,["name","age"]) RETURN count(*);
// 129s for 1M People
call apoc.index.nodes('Person','name:name100*') YIELD node, weight return * limit 2

Meta Graph

Returns a virtual graph that represents the labels and relationship-types available in your database and how they are connected.

  • CALL apoc.meta.graph - examines the full graph to create the meta-graph

  • CALL apoc.meta.graphSample(sampleSize) - examines a sample graph to create the meta-graph, default sampleSize is 100

  • CALL apoc.meta.data - examines a subset of the graph to provide a tabular meta information

  • CALL apoc.meta.type(value) - type name of a value (INTEGER,FLOAT,STRING,BOOLEAN,RELATIONSHIP,NODE,PATH,NULL,UNKNOWN,MAP,LIST)

  • CALL apoc.meta.isType(value,type) - returns a row if type name matches none if not

MATCH (n:Person)
CALL apoc.meta.isType(n.age,"INTEGER")
RETURN n LIMIT 5

Locking

  • call apoc.lock.nodes([nodes]) acquires a write lock on the given nodes

  • call apoc.lock.rels([relationships]) acquires a write lock on the given relationship

  • call apoc.lock.all([nodes],[relationships]) acquires a write lock on the given nodes and relationships

from/toJson

  • CALL apoc.convert.toJson([1,2,3])

  • CALL apoc.convert.toJson({a:42,b:\"foo\",c:[1,2,3]})

  • CALL apoc.convert.fromJsonList('[1,2,3]')

  • CALL apoc.convert.fromJsonMap('{\"a\":42,\"b\":\"foo\",\"c\":[1,2,3]}')

Loading Data from RDBMS

  • CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name}) load from relational database, either a full table or a sql statement

  • CALL apoc.load.jdbc('jdbc:derby:derbyDB','SELECT * FROM PERSON WHERE AGE > 18') load from relational database, either a full table or a sql statement

  • CALL apoc.load.driver('org.apache.derby.jdbc.EmbeddedDriver') register JDBC driver of source database

Loading Data from Web-APIs (JSON, XML, CSV)

  • CALL apoc.load.json('http://example.com/map.json') YIELD value as person CREATE (p:Person) SET p = person load from JSON URL (e.g. web-api) to import JSON as stream of values if the JSON was an array or a single value if it was a map

  • CALL apoc.load.xml('http://example.com/test.xml') YIELD value as doc CREATE (p:Person) SET p.name = doc.name load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and `_children`x fields.

  • CALL apoc.load.csv('url',{sep:";"}) YIELD lineNo, list, map - load CSV fom URL as stream of values

    • config contains any of: {skip:1,limit:5,header:false,sep:'TAB',ignore:['tmp'],arraySep:';',mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false}}

Creating Data

  • CALL apoc.create.node(['Label'], {key:value,…​}) create node with dynamic labels

  • CALL apoc.create.nodes(['Label'], [{key:value,…​}]) create multiple nodes with dynamic labels

  • CALL apoc.create.addLabels( [node,id,ids,nodes], ['Label',…​]) - adds the given labels to the node or nodes

  • CALL apoc.create.removeLabels( [node,id,ids,nodes], ['Label',…​]) - removes the given labels from the node or nodes

  • CALL apoc.create.relationship(person1,'KNOWS',{key:value,…​}, person2) create relationship with dynamic rel-type

  • CALL apoc.create.uuid YIELD uuid - creates an UUID

  • CALL apoc.create.uuids(count) YIELD uuid - creates count UUIDs

Virtual Nodes/Rels

Virtual Nodes and Relationships don’t exist in the graph, they are only returned to the UI/user for representing a graph projection. They can be visualized or processed otherwise. Please note that they have negative id’s.

  • CALL apoc.create.vNode(['Label'], {key:value,…​}) returns a virtual node

  • CALL apoc.create.vNodes(['Label'], [{key:value,…​}]) returns virtual nodes

  • CALL apoc.create.vRelationship(nodeFrom,'KNOWS',{key:value,…​}, nodeTo) returns a virtual relationship

  • CALL apoc.create.vPattern({_labels:['LabelA'],key:value},'KNOWS',{key:value,…​}, {_labels:['LabelB'],key:value}) returns a virtual pattern

  • CALL apoc.create.vPatternFull(['LabelA'],{key:value},'KNOWS',{key:value,…​},['LabelB'],{key:value}) returns a virtual pattern

  • TODO `CALL apoc.create.vGraph([nodes, {_labels:[],…​ prop:value,…​}], [rels,{_from:keyValueFrom,_to:{_label:,_key:,_value:value}, _type:'KNOWS', prop:value,…​}],['pk1','Label2:pk2'])

Example

MATCH (a)-[r]->(b)
WITH head(labels(a)) AS l, head(labels(b)) AS l2, type(r) AS rel_type, count(*) as count
CALL apoc.create.vNode(['Meta_Node'],{name:l}) yield node as a
CALL apoc.create.vNode(['Meta_Node'],{name:l2}) yield node as b
CALL apoc.create.vRelationship(a,'META_RELATIONSHIP',{name:rel_type, count:count},b) yield rel
RETURN *;

Monitoring (thanks @ikwattro)

  • apoc.monitor.ids - node and relationships-ids in total and in use

  • apoc.monitor.kernel - store information such as kernel version, start time, read-only, database-name, store-log-version etc.

  • apoc.monitor.store - store size information for the different types of stores

  • apoc.monitor.tx - number of transactions total,opened,committed,concurrent,rolled-back,last-tx-id

Job Management

  • CALL apoc.periodic.commit(statement, params) - repeats an batch update statement until it returns 0, this procedure is blocking

  • CALL apoc.periodic.list() - list all jobs

  • CALL apoc.periodic.submit('name',statement) - submit a one-off background statement

  • CALL apoc.periodic.schedule('name',statement,repeat-time-in-seconds) - submit a repeatedly-called background statement

  • CALL apoc.periodic.countdown('name',statement,delay-in-seconds) - submit a repeatedly-called background statement until it returns 0

  • there are also static methods Jobs.submit, and Jobs.schedule to be used from other procedures

  • jobs list is checked / cleared every 10s for finished jobs

  • CALL apoc.periodic.rock_n_roll(statementIteration, statementAction, batchSize) YIELD batches, total - iterate over first statement and apply action statement with given transaction batch size. Returns to numeric values holding the number of batches and the number of total processed rows. E.g.

CALL apoc.periodic.rock_n_roll('match (p:Person) return p', 'MATCH (p) where p={p} SET p.lastname =p.name', 20000)

copies over the name property of each person to lastname.

Graph Refactoring

  • call apoc.refactor.cloneNodes([node1,node2,…​]) clone nodes with their labels and properties

  • call apoc.refactor.cloneNodesWithRelationships([node1,node2,…​]) clone nodes with their labels, properties and relationships

  • call apoc.refactor.mergeNodes([node1,node2]) merge nodes onto first in list

  • call apoc.refactor.to(rel, endNode) redirect relationship to use new end-node

  • call apoc.refactor.from(rel, startNode) redirect relationship to use new start-node

  • call apoc.refactor.setType(rel, 'NEW-TYPE') change relationship-type

  • merge nodes by label + property

  • merge relationships

  • call apoc.refactor.extractNode([rel1,rel2,…​], [labels], 'OUT','IN') extract node from relationships

  • √ `call apoc.refactor.collapseNode([node1,node2],'TYPE') ` - collapse node to relationship, node with one rel becomes self-relationship

Helpers

  • apoc.map.fromPairs([[key,value],[key2,value2],…​])

  • apoc.map.fromLists([keys],[values])

  • apoc.map.fromValues([key,value,key1,value1])

  • apoc.map.setKey(map,key,value)

  • apoc.coll.sum([0.5,1,2.3])

  • apoc.coll.min([0.5,1,2.3])

  • apoc.coll.max([0.5,1,2.3])

  • apoc.coll.sumLongs([1,3,3])

  • apoc.coll.partition(list,batchSize)

  • apoc.coll.zip([list1],[list2])

  • apoc.coll.pairs([list]) returns `[first,second],[second,third], …​

  • apoc.coll.toSet([list]) returns a unique list backed by a set

  • apoc.coll.sort(coll) sort on Collections

  • apoc.coll.sortNodes([nodes], 'name') sort nodes by property

  • apoc.coll.contains(coll, value) optimized contains operation (using a HashSet) (returns single row or not)

  • apoc.coll.containsAll(coll, values) optimized contains-all operation (using a HashSet) (returns single row or not)

  • apoc.coll.containsSorted(coll, value) optimized contains on a sorted list operation (Collections.binarySearch) (returns single row or not)

  • apoc.coll.containsAllSorted(coll, value) optimized contains-all on a sorted list operation (Collections.binarySearch) (returns single row or not)

  • apoc.get.nodes(node|id|[ids]) yield node quickly returns all nodes with these id’s

  • apoc.get.rels(rels|id|[ids]) yield rel quickly returns all relationships with these id’s

Date/time Support (thanks @tkroman)

Conversion between formatted dates and timestamps

  • apoc.date.toSeconds('2015-03-25 03:15:59') get Unix time equivalent of given date (in seconds)

  • apoc.date.toSecondsFormatted('2015/03/25 03-15-59', 'yyyy/MM/dd HH/mm/ss') same as previous, but accepts custom datetime format

  • apoc.date.fromSeconds(12345) get string representation of date corresponding to given Unix time (in seconds)

  • apoc.date.fromSecondsFormatted(12345, 'yyyy/MM/dd HH/mm/ss') the same as previous, but accepts custom datetime format

  • apoc.date.toMillis('2015-03-25 03:15:59') get Unix time equivalent of given date (in milliseconds)

  • apoc.date.toMillisFormatted('2015/03/25 03-15-59', 'yyyy/MM/dd HH/mm/ss') same as previous, but accepts custom datetime format

  • apoc.date.fromMillis(12345) get string representation of date corresponding to given time in milliseconds

  • apoc.date.fromMillisFormatted(12345, 'yyyy/MM/dd HH/mm/ss') the same as previous, but accepts custom datetime format

Reading separate datetime fields:

Splits date (optionally, using given custom format) into fields returning a map from field name to its value.

  • apoc.date.fields('2015-03-25 03:15:59')

  • apoc.date.fieldsFormatted('2015-01-02 03:04:05 EET', 'yyyy-MM-dd HH:mm:ss zzz')

Following fields are supported:

Result field Represents

'years'

year

'months'

month of year

'days'

day of month

'hours'

hour of day

'minutes'

minute of hour

'seconds'

second of minute

'zone'

time zone

Examples

  apoc.date.fields('2015-03-25 03:15:59') =>
    {
      'Months': 1,
      'Days': 2,
      'Hours': 3,
      'Minutes': 4,
      'Seconds': 5,
      'Years': 2015
    }
apoc.date.fieldsFormatted('2015-01-02 03:04:05 EET', 'yyyy-MM-dd HH:mm:ss zzz') =>
  {
    'ZoneId': 'Europe/Bucharest',
    'Months': 1,
    'Days': 2,
    'Hours': 3,
    'Minutes': 4,
    'Seconds': 5,
    'Years': 2015
  }
apoc.date.fieldsFormatted('2015/01/02_EET', 'yyyy/MM/dd_z') =>
  {
    'Years': 2015,
    'ZoneId': 'Europe/Bucharest',
    'Months': 1,
    'Days': 2
  }

Notes on formats:

  • the default format is yyyy-MM-dd HH:mm:ss

  • if the format pattern doesn’t specify timezone, formatter considers dates to belong to the UTC timezone

  • if the timezone pattern is specified, the timezone is extracted from the date string, otherwise an error will be reported

  • the to/fromSeconds timestamp values are in POSIX (Unix time) system, i.e. timestamps represent the number of seconds elapsed since 00:00:00 UTC, Thursday, 1 January 1970

  • the full list of supported formats is described in SimpleDateFormat JavaDoc

Path Expander (thanks @keesvegter)

The apoc.path.expand procedure makes it possible to do variable length path traversals where you can specify the direction of the relationship per relationship type and a list of Label names which act as a "whitelist" or a "blacklist". The procedure will return a list of Paths in a variable name called "path".

  • call apoc.path.expand(startNode <id>|Node, relationshipFilter, labelFilter, minLevel, maxLevel ) yield path as <identifier>

    • startnode <id> |Node

    • relationshipFilter: RELATIONSHIP_TYPE1{<,>,}|RELATIONSHIP_TYPE2{<,>,}|…​

      • RELATIONSHIP_TYPE> only direction Outgoing

      • RELATIONSHIP_TYPE< only direction Incoming

      • RELATIONSHIP_TYPE both directions

    • labelFilter: {+.-} LABEL1|LABEL2|…​

      • + include label list (white list)

      • - exclude label list (black list)

    • minLevel minimum path level

    • maxLevel maximum path level

Examples

call apoc.path.expand(1,"ACTED_IN>|PRODUCED<|FOLLOWS<","+Movie|Person",0,3)
call apoc.path.expand(1,"ACTED_IN>|PRODUCED<|FOLLOWS<","-BigBrother",0,3)
call apoc.path.expand(1,"ACTED_IN>|PRODUCED<|FOLLOWS<","",0,3)

combined with cypher:

match (tom:Person {name :"Tom Hanks"})
call apoc.path.expand(tom,"ACTED_IN>|PRODUCED<|FOLLOWS<","+Movie|Person",0,3) yield path as pp
return pp;

or

match (p:Person) with p limit 3
call apoc.path.expand(p,"ACTED_IN>|PRODUCED<|FOLLOWS<","+Movie|Person",1,2) yield path as pp
return p, pp

Graph Alorithms (work in progress)

Provides a wrapper around GraphAlgoFactory.

  • CALL apoc.algo.dijkstra(startNode, endNode, relAndDirections, costProperty) - run dijkstra with a relationship property as cost function, relAndDirections is a path expander specification from above.

  • CALL apoc.algo.dijkstraWithDefaultWeight(startNode, endNode, relAndDirections, costProperty, defaultCost) - run dijkstra with a relationship property as cost function. If the relationship property does not exist, use the specified default value instead.

Example: find the weighted shortest path based on relationship property d from A to B following just :ROAD relationships

MATCH (from:Loc{name:'A'}), (to:Loc{name:'D'})
CALL apoc.algo.dijkstra(from, to, 'ROAD', 'd') yield path as path, weight as weight
RETURN path, weight
MATCH (n:Person)

Plans

  • move apoc.get to apoc.nodes and apoc.rels

  • add apoc.nodes.delete(id|ids|node|nodes)

  • (√) add weight/score to manual index operations, expose it, TODO add Sort.RELEVANCE sorter conditionally or unconditionally

  • pass in last count to rundown so you can also do batch-creates

  • warmup procedures that load nodes / rels by skipping one page at a time (8kb/15bytes) (8kb/35bytes)

  • conversions for type-system of "objects" to map, list, node etc. to avoid the semantic errors we sometimes get

  • in browser guide as apoc-help-page

  • (√) optimized collection functions (WIP)

  • Time Conversion Functions (ISO<→ts, padded long representation)

  • ordered, limited retrieval from index (both manual and schema index)

  • json to graph (mapping)

  • virtual graph from collection of nodes and rels, handle node-uniqueness with pk

  • RDF / Ontology loader

  • Encryption / decryption of single properties or a subset or all properties (provide decryption key as param or config)

  • (in progress) Graph Algorithms (Stefan, Max?)

  • custom expanders, e.g. with dynamic rel-type suffixes and prefixes

  • √ Path Finding / Expansion (Kees)

  • Use Cypher as scripting language {cypher:"RETURN a*10+b",params:{a:3,b:5}} for algorithms, parallelization and custom expansion

  • parallel(fragment, params-list, result list)

  • (√) Graph Refactorings (WIP)

  • (√) Job Queue (WIP) See BatchedWriter from Jake/Max

  • run/load shell scripts apoc.load.shell(path)

  • apox.save.dump() whole database, dump("statement"), dump("", "data/import/file") dump("", "URL TO PUT"), formats - binary(packstream), human readable(graphml, graphjson), compression

  • store arbitrary objects in properties with kryo/packstream or similar serialization

  • Procedures in other languages (e.g. JS, JSR-223 scripting → apoc-unsafe project)

  • eval javascript

  • apoc.meta.validate(metagraph) validate a metagraph against the current graph and report violations

  • √ apoc.monitor.{ids,tx,store} simplar calls for the JMX info with tabular output

  • apoc.run.register(name, query[,params]), apoc.run.named(name,[params])

  • apoc.create.graph(nodes,rels,data-map) → {nodes:[], rels:[], data:{}} a graph data structure, e.g. for rendering, export, validation, …​

License

Apache License 2.0

"APOC" Name history

apoc

Apoc was the technician and driver on board of the Nebuchadnezzar in the Matrix movie. He was killed by Cypher.

APOC was also the first bundled A Package Of Components for Neo4j in 2009.

APOC also stands for "Awesome Procedures On Cypher"