CDS/ISIS to JSON database converter (compatible with CouchDB and MongoDB)

Primary language: Python. License: GNU Lesser General Public License v2.1 (LGPL-2.1).

===============================================
isis2json: CDS/ISIS to JSON database converter
===============================================

isis2json.py is a Python/Jython script that exports ISIS (MST+XRF)
or ISO-2709 databases to JSON files, optionally compatible with CouchDB
and MongoDB.

Running under Jython, both MST+XRF and ISO-2709 files can be read,
thanks to the Bruma Java library found in the lib/ directory.

Running under Python, only ISO-2709 files can be read.

Usage
======

::

  $ ./isis2json.py -h
  usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-t ISIS_JSON_TYPE]
                      [-q QTY] [-s SKIP] [-i TAG_NUMBER] [-u] [-p PREFIX]
                      [-n] [-k TAG:VALUE]
                      INPUT.(mst|iso)

  Convert an ISIS .mst or .iso file to a JSON array

  positional arguments:
    INPUT.(mst|iso)     .mst or .iso file to read

  optional arguments:
    -h, --help          show this help message and exit
    -o OUTPUT.json, --out OUTPUT.json
                        the file where the JSON output should be written
                          (default: write to stdout)
    -c, --couch         output array within a "docs" item in a JSON document
                          for bulk insert to CouchDB via POST to db/_bulk_docs
    -m, --mongo         output individual records as separate JSON objects,
                          one per line for bulk insert to MongoDB via
                          mongoimport utility
    -t ISIS_JSON_TYPE, --type ISIS_JSON_TYPE
                        ISIS-JSON type, sets field structure:
                          1=string, 2=alist, 3=dict
    -q QTY, --qty QTY   maximum quantity of records to read (default=ALL)
    -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
    -i TAG_NUMBER, --id TAG_NUMBER
                        generate an "_id" from the given unique TAG field
                          number for each record
    -u, --uuid          generate an "_id" with a random UUID for each record
    -p PREFIX, --prefix PREFIX
                        concatenate prefix to every numeric field tag
                          (ex. 99 becomes "v99")
    -n, --mfn           generate an "_id" from the MFN of each record
                          (available only for .mst input)
    -k TAG:VALUE, --constant TAG:VALUE
                        Include a constant tag:value in every record
                          (ex. -k type:AS)
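
The three output layouts described by the default mode, ``-c`` and ``-m``
can be illustrated with a short Python sketch. This is not the code used by
isis2json.py: the function ``serialize`` and its ``mode`` parameter are
hypothetical names chosen for this example, and the sketch assumes Python 3.

.. code-block:: python

  import json

  def serialize(records, mode="array"):
      """Render records in one of the three layouts from the help text.

      ``mode`` is a hypothetical parameter for this sketch:
      "array" = plain JSON array (default output),
      "couch" = array wrapped in a "docs" item (as with -c),
      "mongo" = one JSON object per line (as with -m).
      """
      records = list(records)
      if mode == "array":
          return json.dumps(records)
      if mode == "couch":
          # Shape expected by a POST to db/_bulk_docs
          return json.dumps({"docs": records})
      if mode == "mongo":
          # One object per line, the shape read by mongoimport
          return "".join(json.dumps(rec) + "\n" for rec in records)
      raise ValueError("unknown mode: %r" % (mode,))

  if __name__ == "__main__":
      sample = [{"2": ["538886"]}, {"2": ["538887"]}]
      for mode in ("array", "couch", "mongo"):
          print(mode)
          print(serialize(sample, mode))

The sample records here use the type 1 field structure; the record
identifiers are made up for the illustration.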


ISIS-JSON Record Types
=======================

There are many ways to represent CDS/ISIS records in JSON [#]_. This
utility currently exports ISIS-JSON types 1, 2 and 3.

Given an ISIS record with this structure::

   2 «538886»
  10 «Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»
  10 «Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»

Below are the three supported representations of that record in JSON:

ISIS-JSON type 1
-----------------

::

  {"10":
      ["Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg",
       "Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"],
   "2":
      ["538886"]
  }

ISIS-JSON type 2
-----------------

::

    {"10":
        [
            [
                ["_", "Kanda, Paulo Afonso"],
                ["1", "USP"],
                ["2", "FMUSP"],
                ["3", "CRDC"],
                ["p", "Brasil"],
                ["c", "São Paulo"],
                ["r", "org"]
            ],
            [
                ["_", "Smidth, Magali Taino"],
                ["1", "USP"],
                ["2", "FMUSP"],
                ["3", "CRDC"],
                ["p", "Brasil"],
                ["c", "São Paulo"],
                ["r", "org"]
            ]
        ],
     "2":
        [
            [
                ["_", "538886"]
            ]
        ]
    }

ISIS-JSON type 3
-----------------

::

    {"10":
        [
            {
                "_": "Kanda, Paulo Afonso",
                "1": "USP",
                "2": "FMUSP",
                "3": "CRDC",
                "c": "São Paulo",
                "p": "Brasil",
                "r": "org"
            },
            {
                "_": "Smidth, Magali Taino",
                "1": "USP",
                "2": "FMUSP",
                "3": "CRDC",
                "c": "São Paulo",
                "p": "Brasil",
                "r": "org"
            }
        ],
     "2":
        [
            {
                "_": "538886"
            }
        ]
    }
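
The relationship between the three types can be sketched in Python. This is
an illustration only, not the code used by isis2json.py: the names
``parse_subfields`` and ``convert_record`` are hypothetical, the sketch
assumes Python 3, and it assumes that each ``^`` is followed by a
one-character subfield code, with any text before the first ``^`` stored
under ``"_"``.

.. code-block:: python

  import json

  def parse_subfields(occurrence):
      """Split 'main^1USP^2FMUSP' into ordered (code, value) pairs."""
      parts = occurrence.split("^")
      pairs = []
      if parts[0]:
          # Text before the first "^" goes under the "_" key
          pairs.append(("_", parts[0]))
      for part in parts[1:]:
          if part:  # ignore empty segments such as a trailing "^"
              pairs.append((part[0], part[1:]))
      return pairs

  def convert_record(record, isis_json_type):
      """Convert a type 1 record (tag -> list of strings) to type 1, 2 or 3."""
      if isis_json_type not in (1, 2, 3):
          raise ValueError("ISIS-JSON type must be 1, 2 or 3")
      converted = {}
      for tag, occurrences in record.items():
          if isis_json_type == 1:
              converted[tag] = list(occurrences)
          elif isis_json_type == 2:
              # Association list: ordered [code, value] pairs
              converted[tag] = [[list(pair) for pair in parse_subfields(occ)]
                                for occ in occurrences]
          else:
              # Dictionary: code -> value (a repeated code keeps its last value)
              converted[tag] = [dict(parse_subfields(occ))
                                for occ in occurrences]
      return converted

  if __name__ == "__main__":
      record = {
          "2": ["538886"],
          "10": ["Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"],
      }
      for isis_json_type in (1, 2, 3):
          print(json.dumps(convert_record(record, isis_json_type),
                           ensure_ascii=False, indent=2))

Type 2 preserves subfield order and repeated subfield codes, while type 3
gives direct access by code. In this sketch a repeated code within one
occurrence keeps only its last value in type 3, which may differ from what
the script itself does.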


.. [#] See section 4.1 of http://journal.code4lib.org/articles/4893


Dependencies
=============

Under Python, isis2json.py depends on:

- Python 2.6 or 2.7
- argparse.py (bundled; also part of the CPython 2.7 distribution)

Under Jython, isis2json.py depends on:

- Jython 2.5
- argparse.py (bundled)
- Bruma.jar on the CLASSPATH (bundled)
- jyson-1.0.1.jar on the CLASSPATH (bundled)

Example CLASSPATH::

  export CLASSPATH=/home/luciano/lib/Bruma.jar:/home/luciano/lib/jyson-1.0.1.jar
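
A CLASSPATH value like the one above can also be assembled
programmatically. The following is a minimal sketch, assuming Python 3 and
that both jars sit in the same directory; ``build_classpath`` is a
hypothetical helper, not part of isis2json.py.

.. code-block:: python

  import os

  def build_classpath(lib_dir, jar_names=("Bruma.jar", "jyson-1.0.1.jar")):
      """Join the jar paths with the platform's path separator."""
      return os.pathsep.join(os.path.join(lib_dir, name) for name in jar_names)

  if __name__ == "__main__":
      # The directory is the one used in the example above
      print("export CLASSPATH=" + build_classpath("/home/luciano/lib"))

Using ``os.pathsep`` gives ``:`` on Linux and macOS and ``;`` on Windows.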


Troubleshooting
================

SyntaxError on `yield fields` running isis2json.py under Jython
-------------------------------------------------------------------

If you see this::

  Traceback (innermost last):
    (no code object) at line 0
    File "./isis2json.py", line 84
          yield fields
              ^
  SyntaxError: invalid syntax

You are probably running Jython 2.2, an old version that is packaged
with several Linux distributions such as Debian and Ubuntu. To verify,
type::

  $ jython --version
  Jython 2.2.1 on java1.6.0_20

To fix, download and install Jython 2.5 or later from the Jython project
on SourceForge_.

.. _SourceForge: http://sourceforge.net/projects/jython/files/jython/
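
The version check can also be scripted. The sketch below parses a banner
such as the ones printed by ``jython --version`` and compares it with the
required minimum. It assumes Python 3 and that the banner always contains
``Jython`` followed by a dotted version number; ``parse_jython_version``
and ``is_supported`` are hypothetical helper names.

.. code-block:: python

  import re

  def parse_jython_version(banner):
      """Extract the version tuple from a banner like 'Jython 2.2.1 on java1.6.0_20'."""
      match = re.search(r"Jython (\d+(?:\.\d+)*)", banner)
      if match is None:
          raise ValueError("not a Jython version banner: %r" % (banner,))
      return tuple(int(number) for number in match.group(1).split("."))

  def is_supported(banner, minimum=(2, 5)):
      """Return True if the banner reports Jython 2.5 or later."""
      return parse_jython_version(banner) >= minimum

  if __name__ == "__main__":
      for banner in ("Jython 2.2.1 on java1.6.0_20", "Jython 2.5.2"):
          print(banner, "->", "ok" if is_supported(banner) else "too old")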

IMPORT ERROR: Jython 2.5 and Bruma.jar are required to read .mst files
-----------------------------------------------------------------------

Check if Jython 2.5 or later is installed::

  $ jython --version
  Jython 2.5.2

If it is not, see the issue above. If it is, add the path to Bruma.jar to
the CLASSPATH environment variable, or pass it via the `jython -J-cp`
command line option when running isis2json.py, like this::

  $ jython -J-cp lib/jyson-1.0.1.jar:lib/Bruma.jar isis2json.py fixtures/LILACS1.mst