This repository contains the code of the Is a knowledge graph capable of capturing human knowledge article.
In order to run the examples in this code repository you need to have Neo4j installed on your machine. Refer to the section Neo4j Installation for some suggestion on how to find the guides for properly install the right version.
All the code has been tested with Python 3 (although they may work with Python 2 as well) - in all examples python
refers to the Python 3 binary - use python3
if you have both Python 2 & 3 installed.
As best practice in Python, it is better to use a virtual environment where all the necessary dependencies will be installed
without affecting the system installation. So before starting the code review create and activate your virtual environment
by running the following command in the project directory after the git clone
(feel free to rename the virtual environment directory as you wish):
python -m venv .venv
Once created it needs to be activated
source .venv/bin/activate
You also need to set the PYTHONPATH
environment variable, so Python will able to find auxiliary libraries (in the top-level directory of the project):
export PYTHONPATH="$(pwd):$PYTHONPATH"
The commands above run on Linux/Unix sheels for more details and for other operating systems refers to the Python documentation available here.
As reminder the environment must be activated everytime a new shel is opened.
In order to install all the dependencies just run the following command:
make
All the necessary dependencies will be installed. If you don't have the make
command just import the dependencies manually.
For further details on how to install modules on python refers to the related documentation.
The import script requires to specify the username (default: neo4j
) and the password (default: password
), URI (default: bolt://localhost:7687
) to connect to Neo4j and database (default: neo4j
).
These & other parameters could be specified in different ways (in order of priority, if some parameter is missing, then the next method will be tried, or default value will be used)
- by specifying them in the command line. Currently following parameters are supported (execute script with
-h
option):-u
is used to specify Neo4j username-p
is used to specify Neo4j password-b
is used to specify Neo4j URI-s
is used to specify location of the source dataset (if it's used)-d
is used to specify Neo4j database
In this case invocation will look like this:
python import_drkg.py [INSERT ARGUMENTS HERE]
python evaluate_embedding.py [INSERT ARGUMENTS HERE]
-
by specifying them as environment variables:
NEO4J_USER
is used to specify Neo4j usernameNEO4J_PASSWORD
is used to specify Neo4j passwordNEO4J_URI
is used to specify Neo4j URI
-
by specifying parameters in the
neo4j
section of theconfig.ini
file at the top of the project. Besides username, password, URI, it may also contain other parameters that will be passed asconfig
parameter of the GraphDatabase.driver call.
Some scripts, in particular the importing scripts, require also the path where the source dataset resides - this could be done by specifying the path via -s
command-line parameter.
Whenever possible the make
command will also download the datasets, in other cases it is necessary to download the dataset manually and then specify the path during the run.
There are some defaults that make this not necessary but in any case it is possible to specify the path in the following way:
python name_of_the_script.py [INSERT ARGUMENTS HERE]
All the scripts work perfectly with the community and the enterprise edition of Neo4j. Moreover, it is possible to use the Neo4j desktop to manage the Neo4j instances. The book has been written during the transition from 3.5.x to 4.x so the code has been all updated to run on the version 4.x (all code was tested on 4.2.3). It introduced some changes, like the variable binding and the multi relationships syntax in Cypher that make the code incompatible with the previous version 3.5.x.
You can find all the instruction for downloading and installing Neo4j in the way you prefer here:
You can get completely working setup in the matter of seconds by using the Docker & docker-compose. After it's installed, change to the docker-compose
folder, and execute (add -d
if you want to start it in background):
docker-compose up
this will bring you fully functional Neo4j setup, with all plugins installed. After that you can point browser to the http://localhost:7474 and login into Neo4j browser using the name neo4j
and password: password
.
Sometimes you may need to completely cleanup database. You can do it using the functions available in the APOC library that you can install into your database (via UI, or other way):
- Delete everything:
CALL apoc.periodic.iterate('MATCH (n) RETURN n', 'DETACH DELETE n', {batchSize:1000})
- Drop all constraints:
CALL apoc.schema.assert({}, {})