mandaten-download-generator-service

Microservice that generates the dump files (CSV, TTL) of mandatendatabank asynchronously. A cron job is embedded in the service to trigger an export at the preconfigured frequency.

Installation

To add the service to your stack, add the following snippet to docker-compose.yml:

services:
  export:
    image: lblod/mandaten-download-generator-service:0.3.3
    volumes:
      - ./data/files:/share
      - ./config/export:/config

Don't forget to update the dispatcher configuration to route requests to the export service. The generated files may then be served by the mu-file-service.
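
For example, when using the Elixir mu-dispatcher, a rule along these lines could be added to dispatcher.ex (this is a sketch; it assumes the service is named export as in the snippet above):

  # Forward export task requests to the export service
  match "/export-tasks/*path" do
    Proxy.forward conn, path, "http://export/export-tasks/"
  end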

Model

The tasks are modelled in agreement with cogs:Job and task:Task. The full description should be available on data.gift (TODO). See also e.g. the jobs-controller-service for more information on the model.

Prefixes

  PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
  PREFIX task: <http://redpencil.data.gift/vocabularies/tasks/>
  PREFIX dct: <http://purl.org/dc/terms/>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
  PREFIX ext: <http://mu.semte.ch/vocabularies/ext/>
  PREFIX oslc: <http://open-services.net/ns/core#>
  PREFIX cogs: <http://vocab.deri.ie/cogs#>
  PREFIX adms: <http://www.w3.org/ns/adms#>
  PREFIX export: <http://redpencil.data.gift/vocabularies/exports/>
  PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
  PREFIX dbpedia: <http://dbpedia.org/ontology/>

Export

A file resulting from an export task.

Class

export:Export

Properties

  Name            Predicate              Range         Definition
  uuid            mu:uuid                xsd:string
  classification  export:classification  skos:Concept
  fileName        nfo:fileName           xsd:string
  format          dct:format             xsd:string
  created         dct:created            xsd:dateTime
  fileSize        nfo:fileSize           xsd:integer
  extension       dbpedia:fileExtension  xsd:string
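
For illustration, a resulting export:Export resource could look like the following Turtle snippet (the URI, uuid and values are hypothetical; the prefixes are the ones listed above):

  <http://data.lblod.info/exports/a1b2c3> a export:Export ;
    mu:uuid "a1b2c3" ;
    export:classification <http://redpencil.data.gift/id/exports/concept/GenericExport> ;
    nfo:fileName "mandaten-20230101T000000.ttl" ;
    dct:format "text/turtle" ;
    dct:created "2023-01-01T00:00:00Z"^^xsd:dateTime ;
    nfo:fileSize 123456 ;
    dbpedia:fileExtension "ttl" .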

Configuration

CSV export

The SPARQL query to execute for the CSV export must be specified in /config/csv-export.sparql. Note that the variable names in the SELECT clause will be used as column headers in the export.
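
For example, a minimal query using the mandaat vocabulary from the TTL example below (the selected variables and properties are illustrative):

  PREFIX mandaat: <http://data.vlaanderen.be/ns/mandaat#>

  # The variables ?mandataris, ?start and ?eind become the CSV column headers
  SELECT ?mandataris ?start ?eind
  WHERE {
    ?mandataris a mandaat:Mandataris ;
      mandaat:start ?start .
    OPTIONAL { ?mandataris mandaat:eind ?eind . }
  }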

TTL export

The Turtle export must be specified in /config/type-export.json. This JSON specifies a prefix mapping and a list of RDF types with a set of required and optional properties that must be exported per type. An additional filter for the WHERE clause can be specified per type.

E.g.

{
  "prefixes": {
    "mandaat": "http://data.vlaanderen.be/ns/mandaat#",
    "person": "http://www.w3.org/ns/person#",
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "types": [
    {
      "type": "mandaat:Mandataris",
      "requiredProperties": [
        "mandaat:start",
        "mandaat:eind"
      ],
      "optionalProperties": [
        "mandaat:status"
      ],
      "additionalFilter": ""
    },
    {
      "type": "person:Person",
      "optionalProperties": [
        "foaf:name"
      ],
      "additionalFilter": ""
    }
  ]
}

Environment variables

The following environment variables can be configured:

  • EXPORT_CRON_PATTERN: cron pattern to configure the frequency of the cron job. The pattern follows the format specified in node-cron. Defaults to 0 0 */2 * * *, i.e. every 2 hours.
  • EXPORT_FILE_BASE: base name of the export file. Defaults to mandaten. The export file will be named {EXPORT_FILE_BASE}-{timestamp}.{csv|ttl}.
  • EXPORT_TTL_BATCH_SIZE: batch size used as LIMIT in the CONSTRUCT SPARQL queries per type. Defaults to 1000. To have a complete export, make sure EXPORT_TTL_BATCH_SIZE * number_of_matching_triples doesn't exceed the maximum number of triples returned by the database (e.g. ResultSetMaxRows in Virtuoso).
  • RETRY_CRON_PATTERN: cron pattern to configure the frequency of the function that retries failed tasks. The pattern follows the format specified in node-cron. Defaults to 0 */10 * * * *, i.e. every 10 minutes.
  • NUMBER_OF_RETRIES: defines the number of times a failed task will be retried.
  • FILES_GRAPH: graph where the files must be stored. Defaults to http://mu.semte.ch/graphs/system/jobs.
  • JOBS_GRAPH: graph where the jobs must be stored. Defaults to http://mu.semte.ch/graphs/system/jobs.
  • TASK_OPERATION_URI (required): the operation URI of this instance of the service (a resource you can attach a skos:prefLabel to), e.g. http://lblod.data.gift/id/jobs/concept/TaskOperation/exportMandatarissen.
  • EXPORT_CLASSIFICATION_URI: the classification of the export, to ease filtering. Defaults to http://redpencil.data.gift/id/exports/concept/GenericExport.
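
For example, extending the docker-compose snippet from the installation section with the required TASK_OPERATION_URI and an explicit cron pattern (the values shown are the ones documented above):

services:
  export:
    image: lblod/mandaten-download-generator-service:0.3.3
    environment:
      EXPORT_CRON_PATTERN: "0 0 */2 * * *"
      TASK_OPERATION_URI: "http://lblod.data.gift/id/jobs/concept/TaskOperation/exportMandatarissen"
    volumes:
      - ./data/files:/share
      - ./config/export:/config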

REST API

POST /export-tasks

Trigger a new export asynchronously.

Returns 202 Accepted if the export started successfully. The Location response header contains an endpoint to monitor the task status.

Returns 503 Service Unavailable if an export is already running.
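
For example (assuming the service is reachable through the dispatcher on localhost; the returned task id will differ):

  curl -i -X POST http://localhost/export-tasks
  # → 202 Accepted, with a Location header pointing to /export-tasks/:id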

GET /export-tasks/:id

Get the status of an export task.

Returns 200 OK with a task resource in the response body. Task status is one of ongoing, done, cancelled or failed.
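
For example, using the task id taken from the Location header of the POST call above (<task-id> is a placeholder):

  curl http://localhost/export-tasks/<task-id>
  # → 200 OK, with the task resource and its current status in the body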

Development

Add the following snippet to your stack during development:

services:
  export:
    image: semtech/mu-javascript-template:1.3.4
    ports:
      - 8888:80
    environment:
      NODE_ENV: "development"
    volumes:
      - /path/to/your/code:/app/
      - ./data/exports:/data/exports
      - ./config/export:/config

Caveats/TODOs

  • A migration is advisable if you previously used 0.x.x versions in your stack, to convert the old task model to cogs:Job.
  • The service needs to be linked directly to Virtuoso, as the current latest version (v0.6.0-beta.6) of mu-auth has no support for CONSTRUCT queries.
  • From a data model perspective, the retry behaviour might be confusing. In the current implementation a failed task does not necessarily stop; it only ends once the retry threshold is reached.
  • An option should be added to allow periodic cleanup of the jobs and related exports.
  • The name of the service could be more generic.