
Scalable data management system with AI-powered profiling and metadata enrichment

Primary language: Vue | License: Apache License 2.0 (Apache-2.0)

DIVA - Data Inventory and Valuation Approach

An awesome data catalog application
Developed to evaluate the newest data management technologies in the context of data transparency, data insight, and data networking




Motivation

This is an ongoing project of the Digitization in Service Industries department of Fraunhofer ISST. Data is becoming more and more important to companies. By using the right data, companies can become more productive and outperform their competitors. We therefore believe it is time for a data management solution that evaluates innovative new approaches to support companies in their daily work with data. This tool grows day by day, and we do our best to tackle the data management challenges that companies face.

We also use this tool as a playground for our students, where they can develop topics for their bachelor's or master's theses. PhD students also benefit from it as a platform for their doctoral theses.

Features

  • 🏛️ microservice architecture: lets us choose the best technology for each problem and makes scaling easier

  • 💻 client application: an easy-to-use web application for handling all kinds of data management tasks

  • 🖥️ portal application: a simple search for relevant files across different devices (WIP)

  • 🐳 Docker-ready: all microservices and core components are Docker-ready, so you can start them right out of the box.

Core Technologies and Frameworks used

Technology | Description
Kong | our API gateway, which routes requests to our microservices
Kafka | message log for communication between microservices
node.js | nice JavaScript platform for running server apps
Express Framework | helps us build simple microservices
Docker | building and publishing images
Kubernetes | production-grade container orchestration
Airflow | author, schedule, and monitor workflows
OpenAPI | specification language to describe the HTTP APIs of our microservices
AsyncAPI | specification language to describe what Kafka and WebSocket messages look like
JSON Schema | specification language to describe how an entity is structured
MongoDB | our main document store and the single source of truth for metadata
Elasticsearch | our search index, used to search for entities and compute interesting aggregations
Keycloak | open-source identity and access management
MinIO | our object store for files uploaded via the browser (aka diva-lake)
neo4j | our graph database for storing relations between entities more efficiently

Other Technologies and Frameworks used

Technology | Description
VueJS 2 | component-based frontend framework for building robust apps
Vuetify | makes the frontend beautiful
Apache Tika | if you need to look into heterogeneous data, Tika is your solution
Python3 | helps us with data science and NLP (natural language processing)
Kibana | our window into Elasticsearch for debugging
Filebeat | ships the logs produced by our microservices into Elasticsearch

Quick start

The complete system can be quickly bootstrapped with Docker:

cd docker
# create .env and copy contents from .env.default to it
cp .env.default .env
# execute the script to boot all necessary components
./up_core.sh

To prepare for a production environment, some system settings must be adjusted. See our documentation to learn more about the configuration, concepts, and underlying architecture of DIVA.

Credits

This project is developed by employees of Fraunhofer ISST. They put all their ❤ into this project and use it to try out the latest cutting-edge technologies.

Active People

Daniel Tebernum (Lead)
Sergej Atamantschuk (Lead)
Anatoly Novohatny
Janis Büse

Dustin Chabrowski (Alumni)
Marcel Altendeitering (Alumni)
Julia Pampus (Alumni)

License

Copyright © Fraunhofer ISST 2023