This is tool for database migration from Postgres to Neo4j.
This application can map tables from Postgres to nodes from Neo4j.
User can:
- choose what tables from database to migrate
- choose what columns from table to migrate
- rename columns
- add labels to generated nodes
- migrate relationships by migrating foreign keys
- reformat timestamp to custom time format
- migrate tables as inner fields
- Fill
.env
file with your connection data - Configure XML script under
XML_CONFIG_LOCATION
path. - Run application
Here is an example of XML configuration file for node and relationship migration.
You can validate your schema with default schema.xsd
schema. You need to
pass XML_VALIDATION_ENABLED
property as true
.
<migration>
<node>
<tables>
<table name="users">
<configuration>
<excludedColumns>
<column>surname</column>
</excludedColumns>
<renamedColumns>
<columns>
<previousName>name</previousName>
<newName>newName</newName>
</columns>
</renamedColumns>
<follow>
<column value="OneType">dtype</column>
</follow>
<skip>
<column value="AnotherType">dtype</column>
</skip>
<timeFormat>yyyy-MM-dd'T'HH:mm:ss.SSS'Z'</timeFormat>
</configuration>
<labels>
<label>User</label>
<label>BaseEntity</label>
</labels>
</table>
<table name="tasks">
<labels>
<label>Task</label>
<label>BaseEntity</label>
</labels>
</table>
</tables>
</node>
<relationship>
<tables>
<table name="users_tasks">
<configuration>
<sourceColumn>user_id</sourceColumn>
<sourceLabel>User</sourceLabel>
<targetColumn>task_id</targetColumn>
<targetLabel>Task</targetLabel>
<follow>
<column value="OneType">dtype</column>
</follow>
<skip>
<column value="AnotherType">dtype</column>
</skip>
</configuration>
<type>HAS_TASK</type>
</table>
</tables>
</relationship>
<innerField>
<tables>
<table name="users_roles">
<configuration>
<sourceColumn>user_id</sourceColumn>
<sourceLabel>User</sourceLabel>
<valueColumn>role</valueColumn>
<fieldName>userRole</fieldName>
<unique>true</unique>
</configuration>
</table>
</tables>
</innerField>
</migration>
<migration>
- main tag, contains what Node or/and Relationship is going to be created.<node>
- (optional) node migration description tag.<relationship>
- (optional) relationship migration description tag.<tables>
- collection of tables to be migrated.<table>
- table tag, defines table name inname
attribute, its configuration and labels.<configuration>
- (optional fornode
migration) configuration for columns.<excludedColumns>
- (optional fornode
migration) columns to be excluded from migration. It means after migration in Neo4j no data from these columns will be stored.<column>
- column tag, contains table name.<renamedColumns>
- (optional fornode
migration) columns to be renamed during migration. It means data from column with<previousName>
will be stored as<newName>
property.<follow>
- (optional fornode
,relationship
migration) follows only that rows which cells in these columns are equal tovalue
attribute of<column>
tag. If multiple columns are provided, all columns match is required to migrate it.<skip>
- (optional fornode
,relationship
migration) skips only that rows which cells in these columns are equal tovalue
attribute of<column>
tag. If multiple columns are provided, at least one match is required to skip it.<timeFormat>
- (optional fornode
migration) format of timestamp to store in Neo4j. It is needed to store LocalDateTime and access it without converters in code.<labels>
- (optional fornode
migration) collection of labels to be added to Nodes.<label>
- label tag, defines its name.<sourceColumn>
- column with foreign key to entity table. Relationship will be started from Node from that table by this foreign key. Inner field will be added to node with this primary key.<targetColumn>
- column with foreign key to entity table. Relationship will be ended with Node from that table by this foreign key.<sourceLabel>
- (optional forrelationship
,innerField
migration) specifies label of start node to find it by foreign key.<targetLabel>
- (optional forrelationship
migration) specifies label of end node to find it by foreign key.<type>
- type of the relationship.<valueColumn>
- name of column with value for inner field migration.<fieldName>
- name of inner field of node to set value to.<unique>
- (optional forinnerField
migration) specify whether values in inner field must be unique. False if not present.
You can safely omit <labels>
and / or <configuration>
tags for node
migration - then all
columns will be migrated and no labels will be added to generated nodes.
If you want to migrate relationships, you need to add labels to ensure type of nodes to be connected. You can omit this tag if you sure that all of your nodes have unique id.
Note that at first we exclude columns and only after rename them. So if you will rename excluded columns, it was excluded and no columns with this name will be renamed.
We first handle <skip>
rows, it means if row matches <skip>
rule and it
matches <follow>
rule, it won`t be migrated.
By providing several values for one column, they are considered as array of
available values. If this array contains cell value, <skip>
rule will skip
this row, <follow>
rule will follow this row.
We handle Postgres types in generated JSON the following way:
integer
,bigserial
,biginteger
are considered numeric valuesbool
,boolean
are considered boolean valuestimestamp
,timestamp without time zone
as timestamp- other types - strings.
If there are null
in cell, we store this as null
too, so it won`t be saved
to
node.
We recommend to fill up all tags to be sure that correct data will be saved to Neo4j.
Migration is working by generating and executing script.
It dumps Postgres database to *.csv
file in /dump
folder, so app must have
access to write in current directory.
After it, these files are read and uploaded to Neo4j.
These migration files are not deleted after script execution. So you can see what data was dumped and uploaded to Neo4j.
Relationship migration is provided by matching nodes with provided primary key.
So if some of your nodes have similar id, relationship will be added to each of
them. It can be avoided but providing <sourceLabel>
and <targetLabel>
tags.
We parse timestamp from database then format it to provided time format (
from optional <timeFormat>
tag).
If no exceptions were thrown, you will see messages in logs with amount of created nodes and relationships.