The goal of the project is to carry out the data and architecture design for a given case study, a Bibliography Database, from the problem specification through to data queries in Neo4j, MongoDB and PySpark. The reference dataset for the project was the dblp database. The project was divided into 3 deliveries and a final presentation:
- MySQL: from the problem specification and assumptions to the ER schema.
- Neo4j: from data upload to data queries (creation, update, aggregations, shortest path, conditions, ...) with explanations and complexity / execution-time checks.
- MongoDB: from the design of the dataset structure, data upload and transformation to data queries (creation, upload, multiple filtering conditions, aggregations, unwind, joins, ...) with explanations and complexity / execution-time checks.
- PySpark: from the design of the dataset structure and data upload to data queries (creation, update, join, nested query, group by, ...) with explanations.
Leonardo Giusti
Emmanuël Caputo
Carlo Sgaravatti
Fateme Hajizadekiakalaye
Alireza Yahyanejad