ChessDataManagement: A Go repository from Cornell High Energy Synchrotron Source

Chess Data Management service

Introduction

The CHESS data flow has been discussed in this document.

Here we propose a possible architecture for CHESS data management based on gradual enhancement of the existing infrastructure.

In particular, we propose to introduce the following components:

MetaData DB, based on MongoDB or a similar document-oriented database. Such a solution should provide the following features:
- be able to handle free-structured text documents
- provide a rich query language (QL)
Files DB, based on any relational database, e.g. MySQL or its free alternative MariaDB. The purpose of this database is to provide data bookkeeping capabilities and organize meta-data in the following form:
- a dataset is a collection of files (or blocks)
- each dataset name may carry an Experiment name and additional meta-data information
- organize files in specific data-tiers, e.g. RAW for raw data, AOD for processed data, etc.
- as such, each dataset name will take the form of a path: /Experiment/Processing/Tier
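
The dataset naming convention above can be illustrated with a minimal Go sketch. The `Dataset` type, its field names, and the `ParseDataset` function are hypothetical and are not taken from the repository; the sketch only assumes that a dataset name has exactly three non-empty components.

```go
package main

import (
	"fmt"
	"strings"
)

// Dataset holds the three components of a dataset path.
// The type and field names are illustrative assumptions.
type Dataset struct {
	Experiment string
	Processing string
	Tier       string
}

// ParseDataset splits a path of the form /Experiment/Processing/Tier
// and rejects paths with a missing leading slash or empty components.
func ParseDataset(path string) (Dataset, error) {
	if !strings.HasPrefix(path, "/") {
		return Dataset{}, fmt.Errorf("dataset path %q must start with /", path)
	}
	parts := strings.Split(strings.TrimPrefix(path, "/"), "/")
	if len(parts) != 3 {
		return Dataset{}, fmt.Errorf("dataset path %q must have exactly three components", path)
	}
	for _, p := range parts {
		if p == "" {
			return Dataset{}, fmt.Errorf("dataset path %q has an empty component", path)
		}
	}
	return Dataset{Experiment: parts[0], Processing: parts[1], Tier: parts[2]}, nil
}

// String renders the dataset back into its path form.
func (d Dataset) String() string {
	return "/" + d.Experiment + "/" + d.Processing + "/" + d.Tier
}

func main() {
	d, err := ParseDataset("/ExpA/run2019/RAW")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Experiment, d.Processing, d.Tier)
}
```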

Both databases may reside in their own data service, called MetaData Service. Such a service can provide RESTful APIs for end-users, such as:

- inject data into DBs
- fetch results
- update data in DBs
- delete data in DBs
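
As a rough sketch of how the four operations could map onto HTTP methods, the Go example below uses an in-memory map in place of the real databases. The `/meta` endpoint, the `dataset` query parameter, and the `Store`, `Record`, and `call` names are all assumptions made for illustration; the actual service API may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Record is a free-structured meta-data document.
type Record map[string]interface{}

// Store is an in-memory stand-in for the databases behind the service.
type Store struct {
	mu   sync.Mutex
	docs map[string]Record // keyed by dataset path
}

// NewStore returns an empty store.
func NewStore() *Store { return &Store{docs: make(map[string]Record)} }

// ServeHTTP maps HTTP methods onto the four operations:
// POST injects, GET fetches, PUT updates, DELETE deletes.
func (s *Store) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("dataset") // hypothetical query parameter
	if name == "" {
		http.Error(w, "missing dataset parameter", http.StatusBadRequest)
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	_, exists := s.docs[name]
	switch r.Method {
	case http.MethodPost:
		if exists {
			http.Error(w, "dataset already exists", http.StatusConflict)
			return
		}
		s.save(w, r, name)
	case http.MethodPut:
		if !exists {
			http.NotFound(w, r)
			return
		}
		s.save(w, r, name)
	case http.MethodGet:
		if !exists {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(s.docs[name])
	case http.MethodDelete:
		if !exists {
			http.NotFound(w, r)
			return
		}
		delete(s.docs, name)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// save decodes the JSON body and stores it; the caller holds the lock.
func (s *Store) save(w http.ResponseWriter, r *http.Request, name string) {
	var rec Record
	if err := json.NewDecoder(r.Body).Decode(&rec); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	s.docs[name] = rec
}

// call sends one request to the handler in-process and returns status and body.
func call(h http.Handler, method, target, body string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, strings.NewReader(body)))
	return rec.Code, rec.Body.String()
}

func main() {
	s := NewStore()
	url := "/meta?dataset=/ExpA/run2019/RAW"
	code, _ := call(s, http.MethodPost, url, `{"beamline":"3A"}`)
	fmt.Println("inject:", code)
	code, body := call(s, http.MethodGet, url, "")
	fmt.Println("fetch:", code, body)
	code, _ = call(s, http.MethodDelete, url, "")
	fmt.Println("delete:", code)
}
```

The same handler could be served over the network with `http.ListenAndServe`; the in-process `call` helper is used here only to keep the sketch self-contained.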

In addition, we suggest introducing an Input Data Service, which can take care of standardizing user inputs, e.g. key-value pairs, tagging, etc. It is not required initially, but in the long run it will help provide a uniform data representation for the MetaData Service.
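
One possible reading of this standardization step is sketched below in Go. The normalization rules (lower-cased keys with underscores, trimmed values, de-duplicated and sorted tags) and the function names `NormalizeKeys` and `NormalizeTags` are assumptions chosen for illustration, not rules defined by the project.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// NormalizeKeys returns a copy of the input with lower-cased keys whose
// whitespace runs are replaced by underscores, and with trimmed values.
// Entries with an empty key are dropped. If two input keys normalize to
// the same key, one of them wins arbitrarily.
func NormalizeKeys(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		key := strings.Join(strings.Fields(strings.ToLower(k)), "_")
		if key == "" {
			continue
		}
		out[key] = strings.TrimSpace(v)
	}
	return out
}

// NormalizeTags lower-cases and trims tags, drops empty and duplicate
// entries, and returns the result in sorted order.
func NormalizeTags(in []string) []string {
	seen := make(map[string]bool)
	out := []string{}
	for _, t := range in {
		tag := strings.ToLower(strings.TrimSpace(t))
		if tag == "" || seen[tag] {
			continue
		}
		seen[tag] = true
		out = append(out, tag)
	}
	sort.Strings(out)
	return out
}

func main() {
	kv := NormalizeKeys(map[string]string{" Beam Line ": " 3A ", "Sample": "Si"})
	tags := NormalizeTags([]string{"RAW", " raw", "Calib"})
	fmt.Println(kv)
	fmt.Println(tags)
}
```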

Finally, data access can be organized via an XRootD service.
