This repository is deprecated in favour of the more powerful tool Catmandu.
easyM2R is a PHP-based attempt to easily convert MARC data to RDF.
It's easy because all you need is:
- your MARC data as a file or a string
- a valid JSON-LD file that shows what your data should look like in RDF
- PHP 5.3.x or higher
Version 0.2beta: easyM2R is now easier to use. It uses MarcSpec, a parser and validator for MARC specs given as strings, which makes many of the default callback functions deprecated. Custom callback functions now always receive the data referenced in the MARC record within the variable $_params, so you may not need to access the MARC record via File_MARC at all.
easyM2R is built upon the following software:
- File_MARC by Dan Scott, Copyright (c) 1991, 1999 Free Software Foundation, Inc.
- JsonLD processor for PHP and IRI Copyright (c) by Markus Lanthaler
- EasyRdf Copyright (c) by Nicholas J Humfrey
- forceutf8 by Sebastián Grignoli
- MarcSpec
+--------------+
| MARC data    +----+
| in a file    |    |
+--------------+    |    +--------------+
                    +--->| MARCFILE2RDF +-----+
+--------------+    |    +--------------+     |
| MARCXML data +----+                         |
| in a file    |                              |
+--------------+                              |    +-------------+
                                              +--->| data as RDF |
+--------------+                              |    +-------------+
| MARC data    +----+                         |
| as a string  |    |                         |
+--------------+    |    +----------------+   |
                    +--->| MARCSTRING2RDF +---+
+--------------+    |    +----------------+
| MARCXML data +----+
| as a string  |
+--------------+
The best way to download easyM2R is to clone the repository recursively, so that its submodules are fetched as well:
git clone --recursive https://github.com/cKlee/easyM2R.git
Alternatively, download a release and unpack it wherever you like.
Navigate to the easyM2R base directory, where you'll find the file 'tordf.php', and type at the command line:
php tordf.php -s <PATH_TO_YOUR_MARC_SOURCE> -i <MARC_INPUT_FORMAT> -o <RDF_OUTPUT_SERIALIZATION>
This outputs your MARC data as RDF in the desired serialization. See Using the command line interface for further explanation of the command line options.
The easyM2R command line interface only accepts MARC data from a file.
It is invoked via the script tordf.php with the command php tordf.php and accepts these options:
- -t Path to the JSON-LD template
- -o The output format; must be one of 'jsonld', 'json', 'php', 'ntriples', 'turtle', 'rdfxml', 'dot', 'n3', 'png', 'gif', 'svg'
- -s The path to your MARC source file
- -i The MARC input format; use 'xml' for a MARCXML source
- -c The path to your custom callback functions directory
- -b The base IRI for each MARC record in RDF
- -p 0 or false to merge all record graphs before output (the default, which may take a lot of memory), or 1 or true to output each record graph separately
All options are optional, but to convert your own MARC data you must at least set the -s option.
You can test easyM2R with the example MARC data using the command:
php tordf.php -s examples/marc/e-discover.mrc
For image formats, redirect the output to a file:
php tordf.php -s examples/marc/e-discover.mrc -o png > e-discover.png
You need to use easyM2R within a custom PHP script if you fetch the MARC data as a string, e.g. from a stream, and pass it on to easyM2R. This sample code shows what such a custom script could look like:
<?php
// use namespace
use CK\MARC2RDF as m2r;
// always include the autoload.php
include('path/to/easyM2R/autoload.php');
// include your custom callback scripts
foreach(glob('my_callback/callback_*.php') as $filename) include $filename;
// fetch your MARC data here (placeholder: replace '...' with your own code)
$xml_string = '...';
// instantiate easyM2R
$easyM2R = new m2r\MARCSTRING2RDF('../template/default.jsonld',$xml_string,'xml');
// print pretty RDF for browser
print '<pre>'.htmlspecialchars($easyM2R->output('turtle')).'</pre>';
easyM2R provides two main classes, MARCFILE2RDF and MARCSTRING2RDF. Use the MARCFILE2RDF class if your data resides in a file, and MARCSTRING2RDF if you want to pass your MARC data to easyM2R as a string.
The MARCFILE2RDF class accepts 5 parameters:
- @param string The local path or URL of the jsonld template file
- @param string Path to MARC data as a file
- @param null|string The MARC input format, e.g. 'xml' for MARCXML data
- @param null|string The base IRI for each MARC record in RDF
- @param bool $perRecord Do not merge, just make the current recordGraph available
The MARCFILE2RDF class has 3 public methods:
- CK\MARC2RDF\MARCFILE2RDF::__construct($jsonld_file,$marc_file,$marc_format = null,$base = null,$perRecord = false)
- CK\MARC2RDF\MARCFILE2RDF::next()
- CK\MARC2RDF\MARCFILE2RDF::output($format = 'jsonld',jld\GraphInterface $graph = null)
The MARCSTRING2RDF class accepts 5 parameters:
- @param string The local path or URL of the jsonld template file
- @param string MARC data as string
- @param null|string The MARC input format, e.g. 'xml' for MARCXML data
- @param null|string The base IRI for each MARC record in RDF
- @param bool $perRecord Do not merge, just make the current recordGraph available
The MARCSTRING2RDF class has 3 public methods:
- CK\MARC2RDF\MARCSTRING2RDF::__construct($jsonld_file,string $marc_string,$marc_format = null,$base = null,$perRecord = false)
- CK\MARC2RDF\MARCSTRING2RDF::next()
- CK\MARC2RDF\MARCSTRING2RDF::output($format = 'jsonld',jld\GraphInterface $graph = null)
/**
* @var string The local path or URL of the jsonld template file
*/
public $jsonld_file;
/**
* @var string The local path or URL of the MARC source
*/
public $marc_source;
/**
* @var string the base IRI for the resource
*/
public $base;
/**
* @var GraphInterface Merge of all record graphs
*/
public $newGraph;
/**
* @var GraphInterface The currently created graph
*/
public $recordGraph;
/**
* @var string|null The name of the template graph, which should be used
*/
public $graph_name = null;
/**
* @var string The desired output format
*/
public $format = 'jsonld';
Although MARCXML data can be transformed, namespaces within the XML cause a problem: File_MARC does not handle namespaces yet. This behaviour may change in the future.
The configuration is done via a JSON-LD document called the template. The template is the blueprint for every graph resulting from a MARC record. It has to follow some ground rules, but whatever you do in the template, remember that it must be a valid JSON-LD file. You can test the validity of your template with the JSON-LD Playground.
A default template, default.jsonld, is provided as an example.
In the template you want to access the MARC fields and subfields. This is done via a MARC spec. A MARC spec has a defined syntax (see http://cklee.github.io/marc-spec/marc-spec.html).
In short, a MARC spec has the form
fieldTag (["~" characterPositionOrRange] / [subfieldTags] ["_" indicators])
That is, if you want to access subfield 'a' in field '245', the MARC spec is '245a'. MARC control fields have no subfields; instead, you reference a character position or range prefixed with the character "~". The MARC spec 008~0-4 is a reference to characters 1 (the first character, index 0) to 5 (the fifth character, index 4) of MARC field 008. Character positions and ranges can also be applied to the Leader, which is referenced as 'LDR' in a MARC spec. The MARC spec LDR~0 is a reference to the first character of the Leader.
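To make the indexing concrete, here is a small PHP sketch of the simplified grammar above. The functions sketch_parse_spec() and sketch_char_range() are hypothetical illustrations, not part of easyM2R or the MarcSpec library (which implements the full specification); the sketch only handles tags, subfield codes, indicators and numeric positions or ranges.

```php
<?php
// Hypothetical sketch, not the MarcSpec library: parses the simplified grammar
//   fieldTag (["~" characterPositionOrRange] / [subfieldTags] ["_" indicators])
// into its parts. Only numeric positions and ranges are handled.
function sketch_parse_spec($spec)
{
    $parsed = array(
        'field'      => substr($spec, 0, 3), // e.g. "245", "008" or "LDR"
        'subfields'  => array(),
        'indicators' => null,
        'range'      => null,
    );
    // substr() may return false on PHP < 8 when nothing follows the tag
    $rest = (string) substr($spec, 3);
    if ($rest === '') {
        return $parsed;
    }
    if ($rest[0] === '~') {
        // "~0-4" or "~0": zero-based start index and inclusive end index
        $bounds = explode('-', substr($rest, 1));
        $start = (int) $bounds[0];
        $end = isset($bounds[1]) ? (int) $bounds[1] : $start;
        $parsed['range'] = array($start, $end);
        return $parsed;
    }
    // "a_01": subfield codes before "_", indicators after it
    $parts = explode('_', $rest, 2);
    $parsed['subfields'] = ($parts[0] === '') ? array() : str_split($parts[0]);
    if (isset($parts[1])) {
        $parsed['indicators'] = $parts[1];
    }
    return $parsed;
}

// Returns the referenced characters of a control field or Leader value.
function sketch_char_range($spec, $value)
{
    $parsed = sketch_parse_spec($spec);
    if ($parsed['range'] === null) {
        return null; // not a character position or range spec
    }
    list($start, $end) = $parsed['range'];
    // "0-4" covers end - start + 1 = 5 characters
    return substr($value, $start, $end - $start + 1);
}
```

For example, sketch_parse_spec('035a_01') yields the field '035', the subfield 'a' and the indicators '01', and sketch_char_range('LDR~5-7', '00714cam a2200205 a 4500') returns 'cam'.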
A MARC spec is only recognized if it is prefixed with the easyM2R namespace prefix (see @context).
If a field contains several subfields with the same code, multiple nodes will be created as well.
There is also a more powerful way to access MARC fields: callbacks.
In the template you must create a @context node. In the @context node the only mandatory entry is the easyM2R namespace declaration.
{
    "@context": {
        "marc2rdf": "http://my.arbitratynamespace.com#"
    }
}
The prefix of the easyM2R namespace must be 'marc2rdf'. The namespace IRI is also the base IRI of your RDF resources. Choose a custom IRI, which must end with '/' or '#'. You can override the base IRI of your RDF resources by setting the $base parameter of the MARCFILE2RDF or MARCSTRING2RDF class, or with the -b option of the command line tool.
The @graph is the template for each MARC record.
Within the @graph you must define the resource's @id. The value of the @id consists of the easyM2R namespace prefix and a MARC spec, for example:
{
    "@id": "marc2rdf:001"
}
In this example, if the data in control field '001' is '12345', your resource's IRI will be 'http://my.arbitratynamespace.com#12345'.
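The following sketch shows how the resource IRI in this example is put together. sketch_resource_iri() is a hypothetical helper, not easyM2R code; applying the '/' or '#' check to a base IRI passed via $base or -b as well is an assumption.

```php
<?php
// Hypothetical helper, not easyM2R code: builds a resource IRI from the
// easyM2R namespace IRI (or an overriding base IRI, as set via $base or -b)
// and the data referenced by the @id MARC spec, e.g. control field 001.
function sketch_resource_iri($namespaceIri, $data, $base = null)
{
    $prefix = ($base !== null) ? $base : $namespaceIri;
    // The namespace IRI must end with "/" or "#"; checking $base too is an assumption.
    $last = substr($prefix, -1);
    if ($last !== '/' && $last !== '#') {
        throw new InvalidArgumentException('The base IRI must end with "/" or "#".');
    }
    return $prefix . $data;
}
```

Called as sketch_resource_iri('http://my.arbitratynamespace.com#', '12345'), it returns 'http://my.arbitratynamespace.com#12345'.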
Now define your properties and objects. Regardless of the node type you create (resource, typed value or untyped value), always prefix a MARC spec with the easyM2R namespace prefix when you want to access a MARC field or subfield. Otherwise the MARC spec will not be recognized as one.
MARC data is not always easy to access. Sometimes you have to check the context or look up substrings, or you want to join data from subfields or reshape the data. Callbacks are a powerful way to do this.
Callbacks are functions that are called if you specify them in the template. There are some default callbacks (see default callbacks) but you can write your own custom callbacks very easily (see create custom callbacks).
To call a callback in the template, prefix the callback name with the easyM2R namespace prefix 'marc2rdf', as in this example:
"oclcnum":{"@value": "marc2rdf:callback_prefix_in_parentheses(035a,OCoLC)"}
See default callbacks for specific usage.
If you want to use a callback function to return a value for the rdf:type property, then you can't use the JSON-LD syntax token '@type'. The solution is to define the property 'type' within the @context node and use that instead of '@type'.
The predefined default callback functions are listed here. Each default callback function takes one or more parameters (the number is often fixed), which are either MARC specs or nonspecs. Nonspecs must always be URL-encoded.
- param 1: MARC spec
- param 2: regex replace pattern
Returns the data in the shape of the pattern in param 2: the first data element is filled in for '$0', the second for '$1', and so on.
Example
marc2rdf:callback_template(260abc,$0%20%3A%20$1%2C%20$2)
($0%20%3A%20$1%2C%20$2 is the URL-encoded form of $0 : $1, $2)
leads to something like
Detmold : Kreis Lippe, Der Landrat
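A sketch of how such a pattern could be encoded and filled in. Both functions are hypothetical illustrations of the behaviour described above, not easyM2R's implementation. Encoding presumably matters because characters such as commas would otherwise be read as parameter separators.

```php
<?php
// Hypothetical sketch of what callback_template does with its parameters
// (not easyM2R's actual code).

// Encode a pattern as a nonspec. rawurlencode() also encodes "$" as "%24";
// turning it back into "$" keeps the placeholders readable, which yields
// the same form as in the callback_template example.
function sketch_encode_nonspec($pattern)
{
    return str_replace('%24', '$', rawurlencode($pattern));
}

// Decode the nonspec and fill $0, $1, ... with the data elements in order.
function sketch_fill_template(array $data, $encodedPattern)
{
    $pattern = rawurldecode($encodedPattern);
    return preg_replace_callback('/\$(\d+)/', function ($m) use ($data) {
        $i = (int) $m[1];
        // placeholders without matching data become empty strings
        return isset($data[$i]) ? $data[$i] : '';
    }, $pattern);
}
```

With the data of 260abc, sketch_fill_template(array('Detmold', 'Kreis Lippe', 'Der Landrat'), '$0%20%3A%20$1%2C%20$2') returns 'Detmold : Kreis Lippe, Der Landrat'.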
- param 1: MARC spec
- param 2: substring
Returns the data that comes after the substring.
- param 1: MARC spec
- param 2: MARC spec
- param 3: context
Returns the data from param 1 if the context (param 3) is a substring of the data from param 2.
- param 1: MARC spec
- param 2: containing string
Returns the data from param 1 if it contains the string in param 2.
- param 1: MARC spec
- param 2: prefixed string
Returns the data from param 1 without its prefix if it is prefixed with param 2 in parentheses.
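For illustration, a hypothetical re-implementation of this behaviour (not easyM2R's own code), applied to the data elements referenced by the MARC spec.

```php
<?php
// Hypothetical re-implementation for illustration; easyM2R's own
// callback_prefix_in_parentheses may differ in its details.
function sketch_prefix_in_parentheses(array $data, $prefix)
{
    // the prefix arrives as a URL-encoded nonspec, e.g. "OCoLC"
    $needle = '(' . rawurldecode($prefix) . ')';
    $result = array();
    foreach ($data as $value) {
        // keep only values starting with "(PREFIX)" and strip that prefix
        if (strpos($value, $needle) === 0) {
            $result[] = (string) substr($value, strlen($needle));
        }
    }
    return $result;
}
```

Given the data '(OCoLC)723997824' and the nonspec 'OCoLC', the sketch returns '723997824'; values with a different prefix are dropped.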
- param 1: MARC spec
- param 2: IRI prefix
Returns an IRI consisting of the value of param 2 followed by the data from param 1.
- param 1: MARC spec
- param 2: join character
Returns the data from param 1 joined with the character in param 2.
- param 1: MARC spec
For each data element returned, a whole new blank node is created with all properties of the originally defined blank node.
The names of custom callback functions must start with 'callback', otherwise they cannot be called.
A callback function takes two parameters: the first is the MARC record, the second is an array containing the MARC specs, the nonspecs and the resulting data.
The first line of your custom callback function might look like this:
function callback_mycustom(File_MARC_Record $record, array $_params)
The variable $record is a File_MARC_Record object, so you can access the MARC data via its methods (see http://pear.php.net/package/File_MARC/docs for documentation). Instead of accessing the MARC record via File_MARC, you can also process the data provided under the key 'data' of the variable $_params.
For example, given the statement in the JSON-LD template
"oclcnum":{"@value": "marc2rdf:callback_prefix_in_parentheses(035a_01,OCoLC)"}
the variable $_params in the called default callback function callback_prefix_in_parentheses is an associative array that might look like this:
[specs] => Array
    (
        [0] => Array
            (
                [field] => 035
                [subfield] => Array
                    (
                        [a] => a
                    )
                [indicator1] => 0
                [indicator2] => 1
            )
    )
[nonspecs] => Array
    (
        [0] => OCoLC
    )
[rootId] => _:b0
[data] => Array
    (
        [0] => (OCoLC)723997824
    )
See usage of key 'rootId' under dealing with dynamic blank nodes.
Return the data at the end of the function. Then include your custom callbacks in your script like this:
foreach(glob('my_callback/path/callback_*.php') as $filename) include $filename;
or use the -c option for the command line interface.
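As a minimal sketch, here is a custom callback that works only with $_params['data'] and strips trailing ISBD punctuation from each value. The function name callback_trim_isbd and its file name are hypothetical, and the assumption that an array of values (like $_params['data']) is an acceptable return value is not confirmed by this documentation.

```php
<?php
// my_callback/callback_trim_isbd.php (hypothetical file and function name)
// Ignores $record and only processes the data easyM2R provides in
// $_params['data'], removing trailing ISBD punctuation such as " :", " /",
// " ;", "," or "." from each value.
// Assumption: returning an array of values is accepted by easyM2R.
function callback_trim_isbd(File_MARC_Record $record, array $_params)
{
    $result = array();
    if (!isset($_params['data']) || !is_array($_params['data'])) {
        return $result;
    }
    foreach ($_params['data'] as $key => $value) {
        // keep the original keys, strip trailing punctuation and whitespace
        $result[$key] = preg_replace('/[\s:;,\/.]+$/u', '', $value);
    }
    return $result;
}
```

Saved in your callback directory, it could be referenced in a template as "marc2rdf:callback_trim_isbd(260b)". The regex is deliberately crude: it would also remove the period of an abbreviation at the end of a value.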
For example, suppose you specified a blank node in your template
"@id": "marc2rdf:001_0",
"property1":
{
    "@id": "_:bnode_1",
    "@type": "Sometype",
    "property2": {"@value": "marc2rdf:866z"},
    "property3": {"@value": "marc2rdf:866y"}
}
and you want a blank node to be created for each data element of subfields 'z' and 'y' in field '866'. Then the array returned by your callback function might look like this:
// subfield z
[_:b0] => value 1
[_:b1] => value 2
[_:b2] => value 3
// subfield y
[_:b0] => value 4
[_:b1] => value 5
[_:b2] => value 6
This would result in something like:
<http://my.arbitratynamespace.com#12345>
    someprefix:property1 [
        a Sometype ;
        someprefix:property2 "value 1" ;
        someprefix:property3 "value 4"
    ], [
        a Sometype ;
        someprefix:property2 "value 2" ;
        someprefix:property3 "value 5"
    ], [
        a Sometype ;
        someprefix:property2 "value 3" ;
        someprefix:property3 "value 6"
    ] .
But how do you know which blank node identifiers to use? This is where you need the value of the key 'rootId' in $_params. It is the identifier of the node currently being created. Make sure that the first key of the returned array is this identifier and that all following keys have a higher count.
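A small helper can derive these keys from rootId. sketch_bnode_keys() is hypothetical, not part of easyM2R, and assumes that rootId always has the form '_:b<number>', as in the $_params example.

```php
<?php
// Hypothetical helper for a custom callback: assigns blank node identifiers
// to a list of values, starting with $_params['rootId'] (e.g. "_:b0") and
// counting upwards ("_:b1", "_:b2", ...).
// Assumption: rootId always has the form "_:b<number>".
function sketch_bnode_keys($rootId, array $values)
{
    if (!preg_match('/^(_:b)(\d+)$/', $rootId, $m)) {
        throw new InvalidArgumentException('Unexpected rootId: ' . $rootId);
    }
    $result = array();
    $n = (int) $m[2];
    foreach (array_values($values) as $offset => $value) {
        // first key is rootId itself, every following key has a higher count
        $result[$m[1] . ($n + $offset)] = $value;
    }
    return $result;
}
```

Inside a custom callback you could then write, for example, return sketch_bnode_keys($_params['rootId'], $_params['data']);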
Empty blank nodes, i.e. blank nodes that only have an rdf:type property, are removed from the resulting graph.