This is a Proof of concept (PoC) for a Neo4j schema introspector that produces output in JSON format validating against graph-schema-json-js-utils. It is packaged as a Neo4j user-defined procedure under the name
experimental.introspect.asJson({})
.
You need Java 17 for this project to compile and package. It is build against Neo4j 5 and thus requires Neo4j 5 to run.
Assuming Neo4j is installed in a NEO4J_HOME
, please execute the following steps
$NEO4J_HOME/bin/neo4j stop
./mvnw -Dfast clean package
cp -v target/graph-schema-introspector-*.jar $NEO4J_HOME/plugins
$NEO4J_HOME/bin/neo4j start
After a bit, the instance will have started. For the sake of a functional readme, we assume a database password verysecret
.
You call the procedure like this:
CALL experimental.introspect.asJson({prettyPrint: true})
This will return a single column named value
containing the schema of the database as valid JSON for further processing.
The following command will use Cypher-Shell to pipe the JSON (with some post-processing) through jq
and eventually to standard out:
$NEO4J_HOME/bin/cypher-shell -uneo4j -pverysecret --format plain --non-interactive 'CALL experimental.introspect.asJson({}) YIELD value RETURN value AS _json_' | sed -e 's/\\"/"/g' -e 's/^"//g' -e 's/"\$//g' -e 's/_json_//g' -e 's/"{/{/' -e 's/}"/}/' | jq
Note
|
In the script above, jq is used to format the JSON. While the UDF can pretty print, Cypher-Shell always quotes quotes (turning " into \" ), thus making the output useless. Therefor it is processed first with sed and then piped through jq .
|
A visualization of that schema can be generated with the following statement:
CALL experimental.introspect.asGraph({})
It will visualize the schema document, not the graph itself. The result for the Movie example graph will look like this (you may have to configure the corresponding properties to be used as display names):
A graphy result that reassembles the current db.schema.visualization
can be achieved with
CALL experimental.introspect.asGraph({flat:true})
The same schema as above will look like this:
In both cases you’ll find a properties
field on the properties of node- and relationship object types containing the property information in the same format as the JSON field. In the latter visualization nodes that have multiple labels will only spot the first one as a name
property and the full list will be available as standard labels or in the $id
field in case you did use constant ids.
The following options can be passed as arguments:
Name | Type | Meaning | Default |
---|---|---|---|
|
Boolean |
Pretty prints the generated JSON |
|
|
Boolean |
Uses constant ids for the tokens, derived from their names. Setting this to false generates unique ids |
|
|
Boolean |
Tokens that would need quotation and sanitization when used in Cypher statements will be treated as such by default |
|
|
Boolean |
By default, only 100 distinct relationships between two nodes are sampled to determine the concrete relationships (read: not only the type, but with start and end) owning a set of properties |
|