neo4j-apoc-procedures: A Java repository from smartcaveman

Collection of useful Procedures for Neo4j 3.x

Build & install apoc procedures

git clone http://github.com/jexp/neo4j-apoc-procedures
cd neo4j-apoc-procedures
mvn clean install
cp target/apoc-1.0.0-SNAPSHOT.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart

If you want to run embedded or use shell on a disk store, configure your plugins directory in conf/neo4j.conf with dbms.plugin.directory=path/to/plugins.

Calling Procedures within Cypher

This repository uses a recent build (>= 3.0.0) of Neo4j, so that it can leverage procedures being called within Cypher statements.

CALL apoc.load.json('http://example.com/map.json') YIELD value as person
MERGE (p:Person {name:person.name})
ON CREATE SET p.age = person.age, p.children = size(person.children)

Included Procedures

Built in Help

call apoc.help('search') lists name, description-text and if the procedure performs writes (descriptions are WIP), search string is checked against beginning (package) or end (name) of procedure

helpful

CALL apoc.help("apoc") YIELD name, text
WITH * WHERE text IS null
RETURN name AS undocumented

Manual Indexes

Index Queries

Procedures to add to and query manual indexes

apoc.index.addNode(node,['prop1',…]) add node to an index for each label it has
apoc.index.addNodeByLabel(node,'Label',['prop1',…]) add node to an index for the given label
apoc.index.addRelationship(rel,['prop1',…]) add relationship to an index for its type
apoc.index.nodes('Label','prop:value*') YIELD node, weight lucene query on node index with the given label name
apoc.index.relationships('TYPE','prop:value*') YIELD rel, weight lucene query on relationship index with the given type name
apoc.index.between(node1,'TYPE',node2,'prop:value*') YIELD rel, weight lucene query on relationship index with the given type name bound by either or both sides (each node parameter can be null)
apoc.index.out(node,'TYPE','prop:value*') YIELD node, weight lucene query on relationship index with the given type name for outgoing relationship of the given node, returns end-nodes
apoc.index.in(node,'TYPE','prop:value*') YIELD node, weight lucene query on relationship index with the given type name for incoming relationship of the given node, returns start-nodes

Index Management

CALL apoc.index.list() - YIELD type,name,config - lists all manual indexes
CALL apoc.index.remove('name') YIELD type,name,config - removes manual indexes
CALL apoc.index.forNodes('name',{config}) YIELD type,name,config - gets or creates manual node index
CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config - gets or creates manual relationship index

match (p:Person) call apoc.index.addNode(p,["name","age"]) RETURN count(*);
// 129s for 1M People
call apoc.index.nodes('Person','name:name100*') YIELD node, weight return * limit 2

Meta Graph

Returns a virtual graph that represents the labels and relationship-types available in your database and how they are connected.

CALL apoc.meta.graph - examines the full graph to create the meta-graph
CALL apoc.meta.graphSample(sampleSize) - examines a sample graph to create the meta-graph, default sampleSize is 100
CALL apoc.meta.data - examines a subset of the graph to provide a tabular meta information
CALL apoc.meta.type(value) - type name of a value (INTEGER,FLOAT,STRING,BOOLEAN,RELATIONSHIP,NODE,PATH,NULL,UNKNOWN,MAP,LIST)
CALL apoc.meta.isType(value,type) - returns a row if type name matches none if not

MATCH (n:Person)
CALL apoc.meta.isType(n.age,"INTEGER")
RETURN n LIMIT 5

Locking

call apoc.lock.nodes([nodes]) acquires a write lock on the given nodes
call apoc.lock.rels([relationships]) acquires a write lock on the given relationship
call apoc.lock.all([nodes],[relationships]) acquires a write lock on the given nodes and relationships

from/toJson

CALL apoc.convert.toJson([1,2,3])
CALL apoc.convert.toJson({a:42,b:\"foo\",c:[1,2,3]})
CALL apoc.convert.fromJsonList('[1,2,3]')
CALL apoc.convert.fromJsonMap('{\"a\":42,\"b\":\"foo\",\"c\":[1,2,3]}')

Loading Data from RDBMS

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name}) load from relational database, either a full table or a sql statement
CALL apoc.load.jdbc('jdbc:derby:derbyDB','SELECT * FROM PERSON WHERE AGE > 18') load from relational database, either a full table or a sql statement
CALL apoc.load.driver('org.apache.derby.jdbc.EmbeddedDriver') register JDBC driver of source database

Loading Data from Web-APIs (JSON, XML, CSV)

CALL apoc.load.json('http://example.com/map.json') YIELD value as person CREATE (p:Person) SET p = person load from JSON URL (e.g. web-api) to import JSON as stream of values if the JSON was an array or a single value if it was a map
CALL apoc.load.xml('http://example.com/test.xml') YIELD value as doc CREATE (p:Person) SET p.name = doc.name load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and `_children`x fields.
CALL apoc.load.csv('url',{sep:";"}) YIELD lineNo, list, map - load CSV fom URL as stream of values
- config contains any of: {skip:1,limit:5,header:false,sep:'TAB',ignore:['tmp'],arraySep:';',mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false}}

Creating Data

CALL apoc.create.node(['Label'], {key:value,…}) create node with dynamic labels
CALL apoc.create.nodes(['Label'], [{key:value,…}]) create multiple nodes with dynamic labels
CALL apoc.create.addLabels( [node,id,ids,nodes], ['Label',…]) - adds the given labels to the node or nodes
CALL apoc.create.removeLabels( [node,id,ids,nodes], ['Label',…]) - removes the given labels from the node or nodes
CALL apoc.create.relationship(person1,'KNOWS',{key:value,…}, person2) create relationship with dynamic rel-type
CALL apoc.create.uuid YIELD uuid - creates an UUID
CALL apoc.create.uuids(count) YIELD uuid - creates count UUIDs

Virtual Nodes/Rels

Virtual Nodes and Relationships don’t exist in the graph, they are only returned to the UI/user for representing a graph projection. They can be visualized or processed otherwise. Please note that they have negative id’s.

CALL apoc.create.vNode(['Label'], {key:value,…}) returns a virtual node
CALL apoc.create.vNodes(['Label'], [{key:value,…}]) returns virtual nodes
CALL apoc.create.vRelationship(nodeFrom,'KNOWS',{key:value,…}, nodeTo) returns a virtual relationship
CALL apoc.create.vPattern({_labels:['LabelA'],key:value},'KNOWS',{key:value,…}, {_labels:['LabelB'],key:value}) returns a virtual pattern
CALL apoc.create.vPatternFull(['LabelA'],{key:value},'KNOWS',{key:value,…},['LabelB'],{key:value}) returns a virtual pattern
TODO `CALL apoc.create.vGraph([nodes, {_labels:[],… prop:value,…}], [rels,{_from:keyValueFrom,_to:{_label:,_key:,_value:value}, _type:'KNOWS', prop:value,…}],['pk1','Label2:pk2'])

Example

MATCH (a)-[r]->(b)
WITH head(labels(a)) AS l, head(labels(b)) AS l2, type(r) AS rel_type, count(*) as count
CALL apoc.create.vNode(['Meta_Node'],{name:l}) yield node as a
CALL apoc.create.vNode(['Meta_Node'],{name:l2}) yield node as b
CALL apoc.create.vRelationship(a,'META_RELATIONSHIP',{name:rel_type, count:count},b) yield rel
RETURN *;

Monitoring (thanks @ikwattro)

apoc.monitor.ids - node and relationships-ids in total and in use
apoc.monitor.kernel - store information such as kernel version, start time, read-only, database-name, store-log-version etc.
apoc.monitor.store - store size information for the different types of stores
apoc.monitor.tx - number of transactions total,opened,committed,concurrent,rolled-back,last-tx-id

Job Management

CALL apoc.periodic.commit(statement, params) - repeats an batch update statement until it returns 0, this procedure is blocking
CALL apoc.periodic.list() - list all jobs
CALL apoc.periodic.submit('name',statement) - submit a one-off background statement
CALL apoc.periodic.schedule('name',statement,repeat-time-in-seconds) - submit a repeatedly-called background statement
CALL apoc.periodic.countdown('name',statement,delay-in-seconds) - submit a repeatedly-called background statement until it returns 0
there are also static methods Jobs.submit, and Jobs.schedule to be used from other procedures
jobs list is checked / cleared every 10s for finished jobs
CALL apoc.periodic.rock_n_roll(statementIteration, statementAction, batchSize) YIELD batches, total - iterate over first statement and apply action statement with given transaction batch size. Returns to numeric values holding the number of batches and the number of total processed rows. E.g.

CALL apoc.periodic.rock_n_roll('match (p:Person) return p', 'MATCH (p) where p={p} SET p.lastname =p.name', 20000)

copies over the name property of each person to lastname.

Graph Refactoring

√ call apoc.refactor.cloneNodes([node1,node2,…]) clone nodes with their labels and properties
√ call apoc.refactor.cloneNodesWithRelationships([node1,node2,…]) clone nodes with their labels, properties and relationships
√ call apoc.refactor.mergeNodes([node1,node2]) merge nodes onto first in list
√ call apoc.refactor.to(rel, endNode) redirect relationship to use new end-node
√ call apoc.refactor.from(rel, startNode) redirect relationship to use new start-node
√ call apoc.refactor.setType(rel, 'NEW-TYPE') change relationship-type
merge nodes by label + property
merge relationships
√ call apoc.refactor.extractNode([rel1,rel2,…], [labels], 'OUT','IN') extract node from relationships
√ `call apoc.refactor.collapseNode([node1,node2],'TYPE') ` - collapse node to relationship, node with one rel becomes self-relationship

Helpers

apoc.map.fromPairs([[key,value],[key2,value2],…])
apoc.map.fromLists([keys],[values])
apoc.map.fromValues([key,value,key1,value1])
apoc.map.setKey(map,key,value)
apoc.coll.sum([0.5,1,2.3])
apoc.coll.min([0.5,1,2.3])
apoc.coll.max([0.5,1,2.3])
apoc.coll.sumLongs([1,3,3])
apoc.coll.partition(list,batchSize)
apoc.coll.zip([list1],[list2])
apoc.coll.pairs([list]) returns `[first,second],[second,third], …
apoc.coll.toSet([list]) returns a unique list backed by a set
apoc.coll.sort(coll) sort on Collections
apoc.coll.sortNodes([nodes], 'name') sort nodes by property
apoc.coll.contains(coll, value) optimized contains operation (using a HashSet) (returns single row or not)
apoc.coll.containsAll(coll, values) optimized contains-all operation (using a HashSet) (returns single row or not)
apoc.coll.containsSorted(coll, value) optimized contains on a sorted list operation (Collections.binarySearch) (returns single row or not)
apoc.coll.containsAllSorted(coll, value) optimized contains-all on a sorted list operation (Collections.binarySearch) (returns single row or not)
apoc.get.nodes(node|id|[ids]) yield node quickly returns all nodes with these id’s
apoc.get.rels(rels|id|[ids]) yield rel quickly returns all relationships with these id’s

Date/time Support (thanks @tkroman)

Conversion between formatted dates and timestamps

apoc.date.toSeconds('2015-03-25 03:15:59') get Unix time equivalent of given date (in seconds)
apoc.date.toSecondsFormatted('2015/03/25 03-15-59', 'yyyy/MM/dd HH/mm/ss') same as previous, but accepts custom datetime format
apoc.date.fromSeconds(12345) get string representation of date corresponding to given Unix time (in seconds)
apoc.date.fromSecondsFormatted(12345, 'yyyy/MM/dd HH/mm/ss') the same as previous, but accepts custom datetime format
apoc.date.toMillis('2015-03-25 03:15:59') get Unix time equivalent of given date (in milliseconds)
apoc.date.toMillisFormatted('2015/03/25 03-15-59', 'yyyy/MM/dd HH/mm/ss') same as previous, but accepts custom datetime format
apoc.date.fromMillis(12345) get string representation of date corresponding to given time in milliseconds
apoc.date.fromMillisFormatted(12345, 'yyyy/MM/dd HH/mm/ss') the same as previous, but accepts custom datetime format

Reading separate datetime fields:

Splits date (optionally, using given custom format) into fields returning a map from field name to its value.

apoc.date.fields('2015-03-25 03:15:59')
apoc.date.fieldsFormatted('2015-01-02 03:04:05 EET', 'yyyy-MM-dd HH:mm:ss zzz')

Following fields are supported:

Result field	Represents
'years'	year
'months'	month of year
'days'	day of month
'hours'	hour of day
'minutes'	minute of hour
'seconds'	second of minute
'zone'	time zone

Examples

  apoc.date.fields('2015-03-25 03:15:59') =>
    {
      'Months': 1,
      'Days': 2,
      'Hours': 3,
      'Minutes': 4,
      'Seconds': 5,
      'Years': 2015
    }

apoc.date.fieldsFormatted('2015-01-02 03:04:05 EET', 'yyyy-MM-dd HH:mm:ss zzz') =>
  {
    'ZoneId': 'Europe/Bucharest',
    'Months': 1,
    'Days': 2,
    'Hours': 3,
    'Minutes': 4,
    'Seconds': 5,
    'Years': 2015
  }

apoc.date.fieldsFormatted('2015/01/02_EET', 'yyyy/MM/dd_z') =>
  {
    'Years': 2015,
    'ZoneId': 'Europe/Bucharest',
    'Months': 1,
    'Days': 2
  }

Notes on formats:

the default format is yyyy-MM-dd HH:mm:ss
if the format pattern doesn’t specify timezone, formatter considers dates to belong to the UTC timezone
if the timezone pattern is specified, the timezone is extracted from the date string, otherwise an error will be reported
the to/fromSeconds timestamp values are in POSIX (Unix time) system, i.e. timestamps represent the number of seconds elapsed since 00:00:00 UTC, Thursday, 1 January 1970
the full list of supported formats is described in SimpleDateFormat JavaDoc

Path Expander (thanks @keesvegter)

The apoc.path.expand procedure makes it possible to do variable length path traversals where you can specify the direction of the relationship per relationship type and a list of Label names which act as a "whitelist" or a "blacklist". The procedure will return a list of Paths in a variable name called "path".

call apoc.path.expand(startNode <id>|Node, relationshipFilter, labelFilter, minLevel, maxLevel ) yield path as <identifier>
- startnode <id> |Node
- relationshipFilter: RELATIONSHIP_TYPE1{<,>,}|RELATIONSHIP_TYPE2{<,>,}|…
  - RELATIONSHIP_TYPE> only direction Outgoing
  - RELATIONSHIP_TYPE< only direction Incoming
  - RELATIONSHIP_TYPE both directions
- labelFilter: {+.-} LABEL1|LABEL2|…
  - + include label list (white list)
  - - exclude label list (black list)
- minLevel minimum path level
- maxLevel maximum path level

Examples

call apoc.path.expand(1,"ACTED_IN>|PRODUCED<|FOLLOWS<","+Movie|Person",0,3)
call apoc.path.expand(1,"ACTED_IN>|PRODUCED<|FOLLOWS<","-BigBrother",0,3)
call apoc.path.expand(1,"ACTED_IN>|PRODUCED<|FOLLOWS<","",0,3)

combined with cypher:

match (tom:Person {name :"Tom Hanks"})
call apoc.path.expand(tom,"ACTED_IN>|PRODUCED<|FOLLOWS<","+Movie|Person",0,3) yield path as pp
return pp;

or

match (p:Person) with p limit 3
call apoc.path.expand(p,"ACTED_IN>|PRODUCED<|FOLLOWS<","+Movie|Person",1,2) yield path as pp
return p, pp

Graph Alorithms (work in progress)

Provides a wrapper around GraphAlgoFactory.

CALL apoc.algo.dijkstra(startNode, endNode, relAndDirections, costProperty) - run dijkstra with a relationship property as cost function, relAndDirections is a path expander specification from above.
CALL apoc.algo.dijkstraWithDefaultWeight(startNode, endNode, relAndDirections, costProperty, defaultCost) - run dijkstra with a relationship property as cost function. If the relationship property does not exist, use the specified default value instead.

Example: find the weighted shortest path based on relationship property d from A to B following just :ROAD relationships

MATCH (from:Loc{name:'A'}), (to:Loc{name:'D'})
CALL apoc.algo.dijkstra(from, to, 'ROAD', 'd') yield path as path, weight as weight
RETURN path, weight
MATCH (n:Person)

Plans

move apoc.get to apoc.nodes and apoc.rels
add apoc.nodes.delete(id|ids|node|nodes)
(√) add weight/score to manual index operations, expose it, TODO add Sort.RELEVANCE sorter conditionally or unconditionally
pass in last count to rundown so you can also do batch-creates
warmup procedures that load nodes / rels by skipping one page at a time (8kb/15bytes) (8kb/35bytes)
conversions for type-system of "objects" to map, list, node etc. to avoid the semantic errors we sometimes get
in browser guide as apoc-help-page
(√) optimized collection functions (WIP)
Time Conversion Functions (ISO<→ts, padded long representation)
ordered, limited retrieval from index (both manual and schema index)
json to graph (mapping)
virtual graph from collection of nodes and rels, handle node-uniqueness with pk
RDF / Ontology loader
Encryption / decryption of single properties or a subset or all properties (provide decryption key as param or config)
(in progress) Graph Algorithms (Stefan, Max?)
custom expanders, e.g. with dynamic rel-type suffixes and prefixes
√ Path Finding / Expansion (Kees)
Use Cypher as scripting language {cypher:"RETURN a*10+b",params:{a:3,b:5}} for algorithms, parallelization and custom expansion
parallel(fragment, params-list, result list)
(√) Graph Refactorings (WIP)
(√) Job Queue (WIP) See BatchedWriter from Jake/Max
run/load shell scripts apoc.load.shell(path)
apox.save.dump() whole database, dump("statement"), dump("", "data/import/file") dump("", "URL TO PUT"), formats - binary(packstream), human readable(graphml, graphjson), compression
store arbitrary objects in properties with kryo/packstream or similar serialization
Procedures in other languages (e.g. JS, JSR-223 scripting → apoc-unsafe project)
eval javascript
apoc.meta.validate(metagraph) validate a metagraph against the current graph and report violations
√ apoc.monitor.{ids,tx,store} simplar calls for the JMX info with tabular output
apoc.run.register(name, query[,params]), apoc.run.named(name,[params])
apoc.create.graph(nodes,rels,data-map) → {nodes:[], rels:[], data:{}} a graph data structure, e.g. for rendering, export, validation, …

License

Apache License 2.0

"APOC" Name history

Apoc was the technician and driver on board of the Nebuchadnezzar in the Matrix movie. He was killed by Cypher.

APOC was also the first bundled A Package Of Components for Neo4j in 2009.

APOC also stands for "Awesome Procedures On Cypher"

smartcaveman/neo4j-apoc-procedures

Collection of useful Procedures for Neo4j 3.x

Calling Procedures within Cypher

Included Procedures

Built in Help

Manual Indexes

Index Queries

Index Management

Meta Graph

Locking

from/toJson

Loading Data from RDBMS

Loading Data from Web-APIs (JSON, XML, CSV)

Creating Data

Virtual Nodes/Rels

Monitoring (thanks @ikwattro)

Job Management

Graph Refactoring

Helpers

Date/time Support (thanks @tkroman)

Conversion between formatted dates and timestamps

Reading separate datetime fields:

Examples

Notes on formats:

Path Expander (thanks @keesvegter)

Examples

Graph Alorithms (work in progress)

Plans

License

"APOC" Name history