/puppet-graphdb

GraphDB Puppet module

Primary LanguageRubyOtherNOASSERTION

GraphDB Puppet module

Build Status Puppet Forge endorsed Puppet Forge Version Puppet Forge Downloads

Table of Contents

  1. Module description - What the module does and why it is useful
  2. Setup - The basics of getting started with GraphDB
  1. Usage - Configuration options and additional functionality
  2. Advanced features - Extra information on advanced usage
  3. Limitations - OS compatibility, etc.
  4. Development - Guide for contributing to the module
  5. Support - When you need help with this module
  6. License

Module description

This module sets up GraphDB instances with additional resource for repository creation, data loading, updates, backups, and more.

This module has been tested against all major versions of GraphDB 7.* and 8.*

Setup

The module manages the following

  • GraphDB repository files.
  • GraphDB distribution.
  • GraphDB configuration file.
  • GraphDB service.
  • GraphDB plugins.

Requirements

Beginning with GraphDB

Declare the top-level graphdb class and set up an instance:

class{ 'graphdb':
  version              => '7.1.0',
  edition              => 'SE',
}

graphdb::instance { 'graphdb-instance':
   license           => '/home/graphdb/graphdb.license',
}

Usage

Most top-level parameters in the graphdb class are set to reasonable defaults. The following are some parameters that may be useful to override:

Removal/Decommissioning

class { 'graphdb':
  ensure => 'absent'
}

Install everything but disable service(s) afterwards

class { 'graphdb':
  version              => '7.1.0',
  edition              => 'SE',
  status               => 'disabled'
}

Automatically restarting the service (default set to true)

By default, the module will restart GraphDB when the configuration file changed. This can be overridden globally with the following option:

class { 'graphdb':
  version              => '7.1.0',
  edition              => 'SE',
  restart_on_change    => false,
}

Instances

This module works with the concept of instances. For service to start you need to specify at least one instance.

Quick setup

graphdb::instance { 'graphdb-instance': license => '/home/graphdb/graphdb.license' }

This will set up its own data directory and set the service name to: graphdb-instance

Advanced options

Instance specific options can be given:

graphdb::instance { 'graphdb-instance':
  http_port          => 8080, # http port that GraphDB will use
  kill_timeout       => 180, # time before force kill of GraphDB process
  validator_timeout  => 60, # GraphDB repository validator timeout
  logback_config     => undef, # custom GraphDB logback log configuration
  extra_properties   => { }, # extra properties for graphdb.properties file
  external_url       => undef, # graphDB external URL if GraphDB instance is accessed via proxy, e.g. https://ontotext.com/graphdb
  heap_size          => '2g', # GraphDB  java heap size given by -Xmx parameter. Note heap_size parameter will also set xms=xmx
  java_opts          => [], # extra java opts for java process
  protocol           => 'http', # https or http protocol, defaults to http
}

Cluster

Optimum GraphDB EE cluster configuration

  1. Master worker linking parameters:
  • master_repository_id (required)
  • master_endpoint (required)
  • worker_repository_id (required)
  • worker_endpoint (required)
  • replication_port (optional; default to 0)
  1. Master master linking parameters:
  • master_repository_id (required)
  • master_endpoint (required)
  • peer_master_endpoint (required)
  • peer_master_repository_id (required)
  • peer_master_node_id (optional if you define graphdb_link on the same node as registered GraphDB master instance)

Quick setup

A master with one worker

 class { 'graphdb':
   version => '7.1.0',
   edition => 'ee',
 }

 graphdb::instance { 'master':
   license        => '/tmp/ee.license',
   http_port      => 8080,
 }

 graphdb::ee::master::repository { 'master':
   endpoint           => "http://${::ipaddress}:8080",
   repository_context => 'http://ontotext.com/pub/',
 }

 graphdb::instance { 'worker':
   license   => '/tmp/ee.license',
   http_port => 8082,
 }

 graphdb::ee::worker::repository { 'worker':
   endpoint           => "http://${::ipaddress}:8082",
   repository_context => 'http://ontotext.com/pub/',
 }

 graphdb_link { 'master-worker':
   master_repository_id => 'master',
   master_endpoint      => "http://${::ipaddress}:8080",
   worker_repository_id => 'worker',
   worker_endpoint      => "http://${::ipaddress}:8082",
 }

A master with one worker (on the same machine), security turned on and https:

class{ 'graphdb':
   version              => '8.6.0-RC9',
   edition              => 'ee',
}

graphdb::instance { 'master': #Brings up the master
    license           => '/tmp/ee.license',
    extra_properties  => { 'graphdb.connector.SSLEnabled' => 'true', 'graphdb.connector.scheme' => 'https', 'graphdb.connector.secure' => 'true', 'graphdb.connector.keyFile' => '/home/graphdb/.keystore', 'graphdb.connector.keystorePass' => 'password', 'graphdb.connector.keyAlias' => 'graphdb', 'graphdb.connector.keyPass' => 'password', 'graphdb.auth.token.secret' => 'secret' },
   http_port         => 8080,
   protocol	       => 'https',
}

graphdb::ee::master::repository { 'master': #Creating master repo with name “master” , of course you can choose different name
   endpoint            => "https://localhost:8080",
   repository_context  => 'http://ontotext.com/pub/',
   timeout             => 60,
}

graphdb::instance { 'worker': #Brings up the worker
   license           => '/tmp/ee.license',
   extra_properties  => { 'graphdb.connector.SSLEnabled' => 'true', 'graphdb.connector.scheme' => 'https', 'graphdb.connector.secure' => 'true', 'graphdb.connector.keyFile' => '/home/graphdb/.keystore', 'graphdb.connector.keystorePass' => 'password', 'graphdb.connector.keyAlias' => 'graphdb', 'graphdb.connector.keyPass' => 'password', 'graphdb.auth.token.secret' => 'secret' },
   http_port         => 8082,
   protocol	       => 'https',
}

graphdb::ee::worker::repository { 'worker':
   endpoint            => "https://localhost:8082",
   repository_context  => 'http://ontotext.com/pub/',
   timeout             => 60,
}

graphdb_link { 'master-worker':
   master_repository_id => 'master',
   master_endpoint      => "https://localhost:8080",
   worker_repository_id => 'worker',
   worker_endpoint      => "https://localhost:8082",
}

exec { 'enable-security':
  require => graphdb::ee::worker::repository['worker'],
  path => [ '/bin', '/usr/bin', '/usr/local/bin' ],
  command => "curl -k -X POST --header 'Content-Type: application/json' --header 'Accept: */*' -d 'true' 'https://localhost:8080/rest/security'",
  cwd  => '/',
  user => $graphdb::graphdb_user,
}

A two peered masters(split brain)

node 'master1' {

  class { 'graphdb':
    version => '#{graphdb_version}',
    edition => 'ee',
  }

  graphdb::instance { 'master1':
    license        => '/tmp/ee.license',
    http_port      => 8080,
  }

  graphdb::ee::master::repository { 'master1':
    repository_id      => 'master1',
    endpoint           => "http://${::ipaddress}:8080",
    repository_context => 'http://ontotext.com/pub/',
  }

  graphdb_link { 'master1-to-master2':
    master_repository_id      => 'master2',
    master_endpoint           => "http://${::ipaddress}:9090",
    peer_master_repository_id => 'master1',
    peer_master_endpoint      => "http://${::ipaddress}:8080",
  }
}

node 'master2' {
  graphdb::instance { 'master2':
    license        => '/tmp/ee.license',
    http_port      => 9090,
  }

  graphdb::ee::master::repository { 'master2':
    repository_id      => 'master2',
    endpoint           => "http://${::ipaddress}:9090",
    repository_context => 'http://ontotext.com/pub/',
  }

  graphdb_link { 'master2-to-master1':
    master_repository_id      => 'master1',
    master_endpoint           => "http://${::ipaddress}:8080",
    peer_master_repository_id => 'master2',
    peer_master_endpoint      => "http://${::ipaddress}:9090",
  }
}

Link Advanced options

GraphDB Master repository options can be given
 graphdb::ee::master::repository { 'master':
...
  $repository_template = "${module_name}/repository/master.ttl.erb", # ttl template to use as source for repository creation template
  $repository_label = 'GraphDB EE master repository', # repository label
  $node_id          = $title, # node id of master instance
  $timeout = 60, # timeout for repository creation operations
...
 }
GraphDB Worker repository options can be given
Link specific options can be given
 graphdb_link { 'master-worker':
    ...
    replication_port     => 0 # The port for replications that master and worker will use; default: 0
    ...
}
Setup backup cron job
graphdb::ee::backup_cron { 'backup-cronjob':
    master_endpoint   => "http://${::ipaddress}:8080",
    master_repository => 'master',
    hour              => '4',
    minute            => '20',
}

Advanced features

Perform SPARQL update

Example performs update(update_query) on the give repository(repository_id), but only if the ask query(exists_query) doesn't return true(exists_expected_response).

 graphdb_update { 'update':
    repository_id            => 'repository',
    endpoint                 => "http://${::ipaddress}:8080",
    update_query             => 'PREFIX geo-ont: <http://www.test.org/ontology#>
                                 INSERT DATA { <http://test> geo-ont:test "This is a test title" }',
    exists_query             =>  'ask { <http://test> ?p ?o . }',
    exists_expected_response => true,
}

Data import

GraphDB data define

Example triggers import of archive with data(archive), but only if ask query(exists_query) doesn't return true. You can include multiple files into archive in various formats, but keep file extension relative to data format. Also keep in mind that data import operation takes time, adjust timeout according to data size.

graphdb::data{ 'data-zip':
    repository          => 'test-repo',
    endpoint            => "http://${::ipaddress}:8080",
    archive             => 'puppet:///modules/test/test.ttl.zip',
    exists_query        =>  'ask { <http://test> ?p ?o . } ',
}
GraphDB data custom type

Example import data(data) with format(data_format) into repository(repository_id), but only if ask query(exists_query) doesn't return false. You can also provide data source(data_source) which can be a file or directory. If you keep the file extension relative to data format you data providing data format(data_format) is not required. Also keep in mind that data import operation takes time, adjust timeout according to data size.

graphdb_data { 'test-data':
    repository_id       => 'test-repo',
    endpoint            => "http://${::ipaddress}:8080",
    data                => '
        @base <http://test.com#>.
        @prefix test:   <http://test.com/ontologies/test#> .
        <http://test>
                a                test:good ;
                test:price       "5" .
    ',
    exists_query        =>  'ask { <http://test> ?p ?o . } ',
    data_format         => 'turtle',
}

For more information about syntax, please, check here.

Limitations

This module has been built on and tested against Puppet 3.2 and higher.

The module has been tested on:

  • Debian 7/8
  • CentOS 6/7
  • Ubuntu 12.04, 14.04

Because of init.d/systemd/upstart support the module may run on other platforms, but it's not guaranteed.

Development

Please see the CONTRIBUTING.md file for instructions regarding development environments and testing.

Support

Please, use email or open an issue.

License

Please see the LICENSE