
🦠Neo4j database to support contact tracing


Systems and Methods for Big and Unstructured Data - Delivery #1 - AA 2021/2022 - Prof. Marco Brambilla

🦠Neo4j Covid Tracing Database

Neo4jReport

This project addresses the data layer of a system for managing the COVID-19 pandemic in a given country. We designed and implemented a Neo4j graph database that supports contact tracing, so that the spread of the virus can be monitored.


⚙ System requirements

Required software

  • Python 3.8 or higher (only if you want to perform manual load from CSVs)
  • Neo4j database
  • Python modules in requirements.txt (only if you want to perform manual load from CSVs)

🚀 Setup instructions

Clone the repo

git clone https://github.com/pablogiaccaglia/neo4j-covid-tracing
cd neo4j-covid-tracing/

Install required packages

From the project's directory, run the following command:

pip install -r requirements.txt

👨‍💻 Usage

Load from CSV

This operation is advised only if you want full control over data collection and generation, since populating the database takes a long time (approximately 6 hours, as stated in the About database population scripts section).

The first step is to move the CSV files from the repository's import folder into the Neo4j import folder, whose location depends on the installation:

  • Linux / macOS / Docker: <neo4j-home>/import
  • Windows: <neo4j-home>\import
  • Debian / RPM: /var/lib/neo4j/import
  • Neo4j Desktop: from the Open dropdown menu of your Neo4j instance, select Terminal, and navigate to <installation-version>/import.
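The copy step can also be scripted. The following is a minimal sketch, assuming the CSV files sit directly in a single source folder; the function name copy_csv_files is hypothetical and is not part of the repository.

```python
import shutil
from pathlib import Path


def copy_csv_files(source_dir, neo4j_import_dir):
    """Copy every .csv file from source_dir into neo4j_import_dir.

    Hypothetical helper, not part of the repository.
    Returns the sorted list of file names that were copied.
    """
    source = Path(source_dir)
    target = Path(neo4j_import_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the target folder if missing

    copied = []
    for csv_file in sorted(source.glob("*.csv")):
        # copy2 overwrites an older file with the same name in the target folder
        shutil.copy2(csv_file, target / csv_file.name)
        copied.append(csv_file.name)
    return copied
```

For a Debian / RPM installation, for example, the call would be copy_csv_files("import", "/var/lib/neo4j/import"), run with sufficient permissions to write to that folder.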


Next, you need the connection details of your Neo4j database. As you can see in the main method of the main.py file, a CovidGraphHandler object is created in the following way:

   handler = CovidGraphHandler("URI", "USER", "PASSWORD")

The arguments passed to the class constructor are used in the __init__ method to establish a connection through a driver:

   self.driver = GraphDatabase.driver(uri, auth = (user, password), max_connection_lifetime = 1000)

Different settings can be specified by changing that line of code. More information is available in the Neo4j Python driver documentation.
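Before starting the long-running load, it can help to check that the connection details actually work. The sketch below is one possible way to do so and is not taken from main.py: the environment variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and both function names are assumptions made for illustration.

```python
import os


def connection_settings(env=None):
    """Return (uri, user, password) read from environment variables.

    The variable names are hypothetical; the fallbacks are the default
    Bolt address and the default user name of a local Neo4j instance.
    """
    env = os.environ if env is None else env
    return (
        env.get("NEO4J_URI", "bolt://localhost:7687"),
        env.get("NEO4J_USER", "neo4j"),
        env.get("NEO4J_PASSWORD", ""),
    )


def check_connection(uri, user, password):
    """Open a driver with the same arguments used in main.py and test it."""
    # Imported here so the settings helper works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        uri, auth=(user, password), max_connection_lifetime=1000
    )
    try:
        # Raises an exception if the server is unreachable or the login fails.
        driver.verify_connectivity()
    finally:
        driver.close()
```

The resulting tuple can be passed on unchanged, for instance as CovidGraphHandler(*connection_settings()), which keeps credentials out of the source file.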

After this step, all you need to do is execute the main method and wait for the routine to complete.

The Python code manipulates several CSV files, which can be found in different versions inside the datasets folders. If you make further changes to them, make sure to replace the older version inside the Neo4j import folder with the new one. Detailed information about the manipulation process that led to the final state of the database can be found in the Report.

Load DB Dump

If you don't want to use Python or install the required dependencies, you can quickly start using the database by loading the dump available here. The following section shows how to do so.


This section describes how to restore a database dump in a live Neo4j deployment.

A database dump can be loaded to a Neo4j instance using the load command of neo4j-admin.

1. Command

The neo4j-admin load command loads a database from an archive created with the neo4j-admin dump command. Alternatively, neo4j-admin load can read a dump from standard input, which lets it accept input piped from neo4j-admin dump or another source.

The command can be run from an online or an offline Neo4j DBMS.

If you are replacing an existing database, you have to shut it down before running the command. If you are not replacing an existing database, you must create the database (using CREATE DATABASE against the system database) after the load operation finishes.

neo4j-admin load must be invoked as the neo4j user to ensure the appropriate file permissions.

1.1. Syntax

neo4j-admin load --from=<archive-path>
                 [--verbose]
                 [--expand-commands]
                 [--database=<database>]
                 [--force]
                 [--info]

1.2. Options

Options (the default value, where one exists, is given in parentheses):

  • --from: path to an archive created with the neo4j-admin dump command, or - to use standard input.
  • --verbose: enable verbose output.
  • --expand-commands: allow command expansion in config value evaluation.
  • --database (default: neo4j): name for the loaded database.
  • --force: replace an existing database.
  • --info: print metadata about the archive file, such as file count, byte count, and format of the load file.
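If you prefer to drive the load from Python, the options listed above map directly onto a command line. The following is a minimal sketch assuming the syntax of section 1.1; the function names build_load_command and load_dump are hypothetical, and the default binary path bin/neo4j-admin assumes the script runs from the Neo4j home directory.

```python
import subprocess


def build_load_command(archive_path, database="neo4j", force=False,
                       verbose=False, admin_binary="bin/neo4j-admin"):
    """Assemble the argument list for neo4j-admin load (hypothetical helper)."""
    command = [
        admin_binary,
        "load",
        f"--from={archive_path}",
        f"--database={database}",
    ]
    if force:
        command.append("--force")    # replace an existing database
    if verbose:
        command.append("--verbose")  # enable verbose output
    return command


def load_dump(archive_path, **options):
    """Run the command; check=True raises CalledProcessError on failure."""
    return subprocess.run(build_load_command(archive_path, **options), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the archive path contains spaces.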

2. Example

The following is an example of how to load the dump of the neo4j database created in the section Back up an offline database, using the neo4j-admin load command. When replacing an existing database, you have to shut it down before running the command.

bin/neo4j-admin load --from=/dumps/neo4j/neo4j-<timestamp>.dump --database=neo4j --force

Unless you are replacing an existing database, you must create the database (using CREATE DATABASE against the system database) after the load operation finishes.
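The CREATE DATABASE step can be issued from the Neo4j Python driver as well as from a Cypher shell. The sketch below is an illustration under assumptions: both function names are hypothetical, and the name check is deliberately stricter than Neo4j's own naming rules so that the name can be placed in the statement without quoting.

```python
import re


def create_database_statement(name):
    """Build the Cypher statement for a database name (hypothetical helper).

    Only lowercase letters and digits are accepted here, which is a
    conservative subset of the names Neo4j allows.
    """
    if not re.fullmatch(r"[a-z][a-z0-9]{2,62}", name):
        raise ValueError(f"unsupported database name: {name!r}")
    return f"CREATE DATABASE {name} IF NOT EXISTS"


def create_database(uri, user, password, name):
    """Run the statement against the system database."""
    # Imported here so the statement builder works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        # Administrative commands must be sent to the system database.
        with driver.session(database="system") as session:
            session.run(create_database_statement(name)).consume()
    finally:
        driver.close()
```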


When using the load command to seed a Causal Cluster, and a previous version of the database exists, you must delete it (using DROP DATABASE) first. Alternatively, you can stop the Neo4j instance and unbind it from the cluster using neo4j-admin unbind to remove its cluster state data. If you fail to DROP or unbind before loading the dump, that database’s store files will be out of sync with its cluster state, potentially leading to logical corruptions. For more information, see Seed a cluster from a database backup (online).


📊 Diagrams

ER Diagram

[Image: ER Diagram]

📷 Relationships Visualizations

  • WENT TO
  • TOOK
  • RECEIVED
  • PART OF
  • MET
  • LOCATED
  • LIVES WITH
  • LIVES IN

💡 About database population scripts

The creation script, which can be executed by invoking the populateDatabase method of the CovidGraphHandler class located in main.py, takes approximately 6 hours to complete.
It creates the following in the Neo4j database:

  • 12014 nodes:

    • 5000 Person nodes
    • 4883 Place nodes
    • 2123 City nodes
    • 4 Vaccine nodes
    • 3 Test nodes
    • 1 Country node
  • 296682 directed (593364 undirected) relationships:

    • 2123 directed (4246 undirected) PART OF relationships
    • 1139 directed (2278 undirected) LOCATED relationships
    • 3752 directed (7504 undirected) RECEIVED relationships
    • 6537 directed (13074 undirected) TOOK relationships
    • 8441 directed (16882 undirected) LIVES WITH relationships
    • 5000 directed (10000 undirected) LIVES IN relationships
    • 119651 directed (239302 undirected) MET relationships
    • 150040 directed (300080 undirected) WENT TO relationships

Information on how the data was produced can be found in the report.
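After a load from CSV or from the dump, the node counts listed above can be used as a quick sanity check. The following is a minimal sketch, assuming the node labels match the names used in the list (Person, Place, City, Vaccine, Test, Country); the function names are hypothetical and not part of the repository.

```python
# Node counts as listed in this README; the label spellings are an assumption.
EXPECTED_NODE_COUNTS = {
    "Person": 5000,
    "Place": 4883,
    "City": 2123,
    "Vaccine": 4,
    "Test": 3,
    "Country": 1,
}

# Counts the nodes carrying each label.
COUNT_QUERY = "MATCH (n) UNWIND labels(n) AS label RETURN label, count(*) AS total"


def find_mismatches(actual, expected=EXPECTED_NODE_COUNTS):
    """Return {label: (expected, actual)} for every label whose count differs."""
    return {
        label: (count, actual.get(label, 0))
        for label, count in expected.items()
        if actual.get(label, 0) != count
    }


def fetch_node_counts(uri, user, password):
    """Query a running database and return {label: count}."""
    # Imported here so the comparison helper works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return {r["label"]: r["total"] for r in session.run(COUNT_QUERY)}
    finally:
        driver.close()
```

An empty result from find_mismatches(fetch_node_counts(uri, user, password)) indicates that all six node counts match the figures above.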

📝 License

This file is part of "Neo4j Covid Tracing Database".

"Neo4j Covid Tracing Database" is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

"Neo4j Covid Tracing Database" is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program (LICENSE.txt). If not, see http://www.gnu.org/licenses/