- Module description - What the module does and why it is useful
- Setup - The basics of getting started with GraphDB
- Usage - Configuration options and additional functionality
- Advanced features - Extra information on advanced usage
- Limitations - OS compatibility, etc.
- Development - Guide for contributing to the module
- Support - When you need help with this module
- License
This module sets up GraphDB instances and provides additional resources for repository creation, data loading, updates, backups, and more.
This module has been tested against all major versions of GraphDB 7.* and 8.*
- GraphDB repository files.
- GraphDB distribution.
- GraphDB configuration file.
- GraphDB service.
- GraphDB plugins.
- The stdlib Puppet library.
Declare the top-level graphdb class and set up an instance:
class { 'graphdb':
  version => '7.1.0',
  edition => 'SE',
}
graphdb::instance { 'graphdb-instance':
  license => '/home/graphdb/graphdb.license',
}
Most top-level parameters in the graphdb class are set to reasonable defaults.
The following are some parameters that may be useful to override:
class { 'graphdb':
  ensure => 'absent',
}
class { 'graphdb':
  version => '7.1.0',
  edition => 'SE',
  status  => 'disabled',
}
By default, the module restarts GraphDB when the configuration file changes. This can be overridden globally with the following option:
class { 'graphdb':
  version           => '7.1.0',
  edition           => 'SE',
  restart_on_change => false,
}
This module works with the concept of instances. For the service to start, you need to define at least one instance.
graphdb::instance { 'graphdb-instance': license => '/home/graphdb/graphdb.license' }
This will set up its own data directory and set the service name to: graphdb-instance
Instance-specific options can be given:
graphdb::instance { 'graphdb-instance':
  http_port         => 8080,   # HTTP port that GraphDB will use
  kill_timeout      => 180,    # time before the GraphDB process is force-killed
  validator_timeout => 60,     # GraphDB repository validator timeout
  logback_config    => undef,  # custom GraphDB logback log configuration
  extra_properties  => { },    # extra properties for the graphdb.properties file
  external_url      => undef,  # GraphDB external URL if the instance is accessed via a proxy, e.g. https://ontotext.com/graphdb
  heap_size         => '2g',   # GraphDB Java heap size given by the -Xmx parameter. Note that heap_size also sets Xms equal to Xmx
  java_opts         => [],     # extra Java options for the Java process
  protocol          => 'http', # https or http protocol, defaults to http
}
Optimum GraphDB EE cluster configuration
- Master-worker linking parameters:
  - master_repository_id (required)
  - master_endpoint (required)
  - worker_repository_id (required)
  - worker_endpoint (required)
  - replication_port (optional; defaults to 0)
- Master-master linking parameters:
  - master_repository_id (required)
  - master_endpoint (required)
  - peer_master_endpoint (required)
  - peer_master_repository_id (required)
  - peer_master_node_id (optional if you define graphdb_link on the same node as the registered GraphDB master instance)
A master with one worker
class { 'graphdb':
  version => '7.1.0',
  edition => 'ee',
}
graphdb::instance { 'master':
  license   => '/tmp/ee.license',
  http_port => 8080,
}
graphdb::ee::master::repository { 'master':
  endpoint           => "http://${::ipaddress}:8080",
  repository_context => 'http://ontotext.com/pub/',
}
graphdb::instance { 'worker':
  license   => '/tmp/ee.license',
  http_port => 8082,
}
graphdb::ee::worker::repository { 'worker':
  endpoint           => "http://${::ipaddress}:8082",
  repository_context => 'http://ontotext.com/pub/',
}
graphdb_link { 'master-worker':
  master_repository_id => 'master',
  master_endpoint      => "http://${::ipaddress}:8080",
  worker_repository_id => 'worker',
  worker_endpoint      => "http://${::ipaddress}:8082",
}
A master with one worker (on the same machine), security turned on and https:
class { 'graphdb':
  version => '8.6.0-RC9',
  edition => 'ee',
}
graphdb::instance { 'master': # Brings up the master
  license          => '/tmp/ee.license',
  extra_properties => {
    'graphdb.connector.SSLEnabled'   => 'true',
    'graphdb.connector.scheme'       => 'https',
    'graphdb.connector.secure'       => 'true',
    'graphdb.connector.keyFile'      => '/home/graphdb/.keystore',
    'graphdb.connector.keystorePass' => 'password',
    'graphdb.connector.keyAlias'     => 'graphdb',
    'graphdb.connector.keyPass'      => 'password',
    'graphdb.auth.token.secret'      => 'secret',
  },
  http_port        => 8080,
  protocol         => 'https',
}
graphdb::ee::master::repository { 'master': # Creates a master repository named "master"; you can choose a different name
  endpoint           => "https://localhost:8080",
  repository_context => 'http://ontotext.com/pub/',
  timeout            => 60,
}
graphdb::instance { 'worker': # Brings up the worker
  license          => '/tmp/ee.license',
  extra_properties => {
    'graphdb.connector.SSLEnabled'   => 'true',
    'graphdb.connector.scheme'       => 'https',
    'graphdb.connector.secure'       => 'true',
    'graphdb.connector.keyFile'      => '/home/graphdb/.keystore',
    'graphdb.connector.keystorePass' => 'password',
    'graphdb.connector.keyAlias'     => 'graphdb',
    'graphdb.connector.keyPass'      => 'password',
    'graphdb.auth.token.secret'      => 'secret',
  },
  http_port        => 8082,
  protocol         => 'https',
}
graphdb::ee::worker::repository { 'worker':
  endpoint           => "https://localhost:8082",
  repository_context => 'http://ontotext.com/pub/',
  timeout            => 60,
}
graphdb_link { 'master-worker':
  master_repository_id => 'master',
  master_endpoint      => "https://localhost:8080",
  worker_repository_id => 'worker',
  worker_endpoint      => "https://localhost:8082",
}
exec { 'enable-security':
  # Resource references must use capitalized type names
  require => Graphdb::Ee::Worker::Repository['worker'],
  path    => [ '/bin', '/usr/bin', '/usr/local/bin' ],
  command => "curl -k -X POST --header 'Content-Type: application/json' --header 'Accept: */*' -d 'true' 'https://localhost:8080/rest/security'",
  cwd     => '/',
  user    => $graphdb::graphdb_user,
}
Two peered masters (split brain)
node 'master1' {
  class { 'graphdb':
    version => '#{graphdb_version}', # placeholder: substitute the GraphDB version you deploy
    edition => 'ee',
  }
  graphdb::instance { 'master1':
    license   => '/tmp/ee.license',
    http_port => 8080,
  }
  graphdb::ee::master::repository { 'master1':
    repository_id      => 'master1',
    endpoint           => "http://${::ipaddress}:8080",
    repository_context => 'http://ontotext.com/pub/',
  }
  graphdb_link { 'master1-to-master2':
    master_repository_id      => 'master2',
    master_endpoint           => "http://${::ipaddress}:9090",
    peer_master_repository_id => 'master1',
    peer_master_endpoint      => "http://${::ipaddress}:8080",
  }
}
node 'master2' {
  graphdb::instance { 'master2':
    license   => '/tmp/ee.license',
    http_port => 9090,
  }
  graphdb::ee::master::repository { 'master2':
    repository_id      => 'master2',
    endpoint           => "http://${::ipaddress}:9090",
    repository_context => 'http://ontotext.com/pub/',
  }
  graphdb_link { 'master2-to-master1':
    master_repository_id      => 'master1',
    master_endpoint           => "http://${::ipaddress}:8080",
    peer_master_repository_id => 'master2',
    peer_master_endpoint      => "http://${::ipaddress}:9090",
  }
}
graphdb::ee::master::repository { 'master':
  ...
  $repository_template = "${module_name}/repository/master.ttl.erb", # ttl template used as the source for the repository creation template
  $repository_label    = 'GraphDB EE master repository',             # repository label
  $node_id             = $title,                                     # node id of the master instance
  $timeout             = 60,                                         # timeout for repository creation operations
  ...
}
- For EE, please check here. Also, please check the GraphDB EE documentation.
- For SE, please check here. Also, please check the GraphDB SE documentation.
graphdb_link { 'master-worker':
  ...
  replication_port => 0, # The port that master and worker will use for replication; default: 0
  ...
}
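As a rough illustration of how replication_port fits alongside the required master-worker linking parameters, the sketch below sets a fixed port instead of the default 0. The port value 7000 and both endpoints are assumptions chosen for the example, not values recommended by the module.

```puppet
# Sketch: link a master and a worker over a fixed replication port.
# The port 7000 is an illustrative assumption; pick one that is free
# and reachable between the two instances in your environment.
graphdb_link { 'master-worker':
  master_repository_id => 'master',
  master_endpoint      => "http://${::ipaddress}:8080",
  worker_repository_id => 'worker',
  worker_endpoint      => "http://${::ipaddress}:8082",
  replication_port     => 7000,
}
```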
graphdb::ee::backup_cron { 'backup-cronjob':
  master_endpoint   => "http://${::ipaddress}:8080",
  master_repository => 'master',
  hour              => '4',
  minute            => '20',
}
This example performs the update (update_query) on the given repository (repository_id), but only if the ask query (exists_query) doesn't return true (exists_expected_response).
graphdb_update { 'update':
  repository_id            => 'repository',
  endpoint                 => "http://${::ipaddress}:8080",
  update_query             => 'PREFIX geo-ont: <http://www.test.org/ontology#>
INSERT DATA { <http://test> geo-ont:test "This is a test title" }',
  exists_query             => 'ask { <http://test> ?p ?o . }',
  exists_expected_response => true,
}
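The same guard can be used in the opposite direction. The sketch below assumes that an update runs whenever the ask query's answer differs from exists_expected_response, which is how the description above reads; under that assumption, setting it to false runs a delete only while matching triples still exist. The resource title and the DELETE query are illustrative.

```puppet
# Sketch: remove all triples about <http://test>, but only while some exist.
# Assumes the update is applied when the ask result differs from
# exists_expected_response (here: when the ask query returns true).
graphdb_update { 'remove-test-resource':
  repository_id            => 'repository',
  endpoint                 => "http://${::ipaddress}:8080",
  update_query             => 'DELETE WHERE { <http://test> ?p ?o }',
  exists_query             => 'ask { <http://test> ?p ?o . }',
  exists_expected_response => false,
}
```

Once the triples are gone, the ask query returns false, which matches the expected response, so later Puppet runs should leave the repository untouched.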
This example triggers the import of an archive with data (archive), but only if the ask query (exists_query) doesn't return true.
You can include multiple files in various formats in the archive, but keep each file's extension consistent with its data format.
Also keep in mind that data import takes time; adjust the timeout according to the data size.
graphdb::data { 'data-zip':
  repository   => 'test-repo',
  endpoint     => "http://${::ipaddress}:8080",
  archive      => 'puppet:///modules/test/test.ttl.zip',
  exists_query => 'ask { <http://test> ?p ?o . } ',
}
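For larger archives, the timeout advice above might look like the following sketch. The parameter name timeout and its unit (seconds) are assumptions here, mirroring the timeout parameter shown for repository creation; check the define's parameters before relying on it. The value 600 is purely illustrative.

```puppet
# Sketch: import a larger archive with a longer timeout.
# 'timeout' is an assumed parameter name and 600 an illustrative value;
# verify both against the graphdb::data define in your module version.
graphdb::data { 'large-data-zip':
  repository   => 'test-repo',
  endpoint     => "http://${::ipaddress}:8080",
  archive      => 'puppet:///modules/test/test.ttl.zip',
  exists_query => 'ask { <http://test> ?p ?o . }',
  timeout      => 600,
}
```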
This example imports data (data) with the given format (data_format) into the repository (repository_id), but only if the ask query (exists_query) doesn't return true.
You can also provide a data source (data_source), which can be a file or a directory.
If the file extension matches the data format, specifying the data format (data_format) is not required.
Also keep in mind that data import takes time; adjust the timeout according to the data size.
graphdb_data { 'test-data':
  repository_id => 'test-repo',
  endpoint      => "http://${::ipaddress}:8080",
  data          => '
@base <http://test.com#>.
@prefix test: <http://test.com/ontologies/test#> .
<http://test>
a test:good ;
test:price "5" .
',
  exists_query  => 'ask { <http://test> ?p ?o . } ',
  data_format   => 'turtle',
}
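The data_source variant described above could be sketched as follows. The path /tmp/test.ttl is hypothetical; because the .ttl extension indicates Turtle, data_format is left out on the assumption that the format is derived from the extension, as the text states.

```puppet
# Sketch: import from a file on the node instead of inline data.
# '/tmp/test.ttl' is a hypothetical path; data_format is omitted because
# the file extension is expected to identify the format.
graphdb_data { 'test-data-from-file':
  repository_id => 'test-repo',
  endpoint      => "http://${::ipaddress}:8080",
  data_source   => '/tmp/test.ttl',
  exists_query  => 'ask { <http://test> ?p ?o . }',
}
```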
For more information about the syntax, please check here.
This module has been built on and tested against Puppet 3.2 and higher.
The module has been tested on:
- Debian 7/8
- CentOS 6/7
- Ubuntu 12.04, 14.04
Because it supports init.d, systemd, and upstart, the module may run on other platforms, but this is not guaranteed.
Please see the CONTRIBUTING.md file for instructions regarding development environments and testing.
Please use email or open an issue.
Please see the LICENSE