feat: as an ADMIN of `datashare-extension-neo4j`, I should be able to run bulk imports for documents and named entities
ClemDoum commented
Before merging
- merge #29
- test the `neo4j-admin import` CLI on a Docker image
PR description
Fixes #25
This PR lets admins run bulk imports using the `neo4j-admin import` CLI. This CLI is optimized for large imports, but it requires the target DB to be empty.
Admin import flow
Export documents and named entities from Elasticsearch (ES) into Neo4j-formatted CSVs:
curl -X POST "<datashare>/api/neo4j/admin/neo4j-csvs?project=my-project" -d '{"query": {}}' > export.tar.gz
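A minimal sketch of the export step as reusable bash functions, assuming the endpoint behaves as in the command above. The function names `neo4j_csvs_url` and `export_neo4j_csvs` are hypothetical; only the endpoint path and the default `{"query": {}}` body come from this PR. Quoting the URL stops the shell from reading `<`/`>` in the placeholder as redirections and `?` as a glob, and `--fail` makes curl exit with an error on an HTTP error status instead of saving the error response as the archive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the export endpoint; only the path
# /api/neo4j/admin/neo4j-csvs comes from the PR description.

# Print the export URL for a project, dropping a trailing "/" from the base URL.
neo4j_csvs_url() {
  local base_url="$1" project="$2"
  printf '%s/api/neo4j/admin/neo4j-csvs?project=%s\n' "${base_url%/}" "$project"
}

# Download the export archive, then check that it is a readable .tar.gz.
# Usage: export_neo4j_csvs <base_url> <project> <output_file> [query_json]
export_neo4j_csvs() {
  local base_url="$1" project="$2" out="$3"
  local query='{"query": {}}'
  if [ "$#" -ge 4 ]; then query="$4"; fi
  # --fail: exit non-zero on HTTP errors instead of writing the error body to $out.
  curl --fail --silent --show-error -X POST \
    "$(neo4j_csvs_url "$base_url" "$project")" \
    -d "$query" -o "$out" || return
  # List the archive contents without extracting them.
  tar tzf "$out" > /dev/null
}
```

For example, with a hypothetical `DATASHARE_URL` variable pointing at the Datashare server: `export_neo4j_csvs "$DATASHARE_URL" my-project export.tar.gz`.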
A query can also be provided to restrict the scope of the export:
curl -X POST "<datashare>/api/neo4j/admin/neo4j-csvs?project=my-project" -d '{"query": {"ids": ["doc-0", "doc-1"]}}' > export.tar.gz
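When the scope is a list of document IDs, the request body can be generated instead of typed by hand. A small sketch that reuses the `{"query": {"ids": [...]}}` shape of the example above; the helper name `build_ids_query` is hypothetical, and it assumes the IDs contain no characters that need JSON escaping (double quotes, backslashes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the scoped-export body from document IDs.
# Assumes IDs need no JSON escaping (no double quotes or backslashes).
build_ids_query() {
  local ids="" id
  for id in "$@"; do
    # Prefix ", " before every ID except the first one.
    ids+="${ids:+, }\"${id}\""
  done
  printf '{"query": {"ids": [%s]}}\n' "$ids"
}
```

It can then be passed to the same request: `curl -X POST "<datashare>/api/neo4j/admin/neo4j-csvs?project=my-project" -d "$(build_ids_query doc-0 doc-1)" > export.tar.gz`.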
Then decompress the archive and perform a dry run to review the command that will be executed against the DB:
tar xzvf export.tar.gz
./bulk-import.sh --dry-run
It should print something like:
./bin/neo4j-admin import full \
--skip-bad-relationships \
--database some-specific-db \
--nodes=Document="docs-header.csv,docs.csv" \
--nodes="entities-header.csv,entities.csv" \
--relationships=HAS_PARENT="doc-roots-header.csv,doc-roots.csv" \
--relationships=APPEARS_IN="entity-docs-header.csv,entity-docs.csv"
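The `bulk-import.sh` shipped in the archive is not shown here. As a rough, hypothetical sketch of how such a `--dry-run` toggle can work, the snippet below keeps the `neo4j-admin` arguments in a bash array and either prints or executes them. The function names `build_import_cmd` and `bulk_import`, the default database name `neo4j` and the hardcoded file names (copied from the sample output above) are assumptions, not the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a --dry-run toggle; NOT the bulk-import.sh from the archive.
# Arguments are copied from the sample dry-run output in the PR description.

# Fill the global IMPORT_CMD array for the given database name.
build_import_cmd() {
  local database="$1"
  IMPORT_CMD=(
    ./bin/neo4j-admin import full
    --skip-bad-relationships
    --database "$database"
    --nodes=Document="docs-header.csv,docs.csv"
    --nodes="entities-header.csv,entities.csv"
    --relationships=HAS_PARENT="doc-roots-header.csv,doc-roots.csv"
    --relationships=APPEARS_IN="entity-docs-header.csv,entity-docs.csv"
  )
}

# Usage: bulk_import [--dry-run] [database]  (database defaults to "neo4j", an assumption)
bulk_import() {
  local dry_run=false
  if [ "${1:-}" = "--dry-run" ]; then
    dry_run=true
    shift
  fi
  build_import_cmd "${1:-neo4j}"
  if [ "$dry_run" = true ]; then
    # Only print the command (without shell quotes); nothing touches the DB.
    printf '%s\n' "${IMPORT_CMD[*]}"
  else
    "${IMPORT_CMD[@]}"
  fi
}
```

`bulk_import --dry-run some-specific-db` prints the command on one line, while `bulk_import some-specific-db` runs it, which only works from the Neo4j home directory (because of the relative `./bin/neo4j-admin` path) and against an empty DB. Keeping the arguments in an array rather than a string means each argument stays a single word, so what is printed is exactly what gets executed. In each `--nodes`/`--relationships` value the `*-header.csv` file comes first because `neo4j-admin import` treats the listed files as one logical file whose first line is the header.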