/OpenShakespeareData

Scripts for converting Moby's XML-formatted Shakespeare works and Finals Club's annotation data to work with the Annotator plugin, using a MongoDB/Mongoose datastore.

OpenShakespeareData


This repo converts two datasets into a format compatible with AnnotateIt's jQuery plugin in order to recreate Open Shakespeare. It takes Moby's XML-formatted complete works of Shakespeare and the Finals Club annotation data stored on AnnotateIt.org, loads them into a MongoDB database and, most importantly, makes them work together with version 1.2.6 of the AnnotateIt plugin.

It can be customized to migrate this data to another URL, or to a site with a different DOM structure, either with the Annotator plugin or by using the xpath jQuery library to write your own custom mapping script.

Overview

All the resources for converting the raw data to work cohesively with the AnnotateIt Plugin.

Quick Guide/ReadMe

See the Wiki pages for an in-depth exploration of these topics:

About the Annotator Plugin
About the Data Sets

To recreate my process, see these Wiki pages in the following order:

1. Convert Works
Convert the works of Shakespeare into the expected format
2. Add Works
Add the works of Shakespeare to your MongoDB
3. Retrieve Annotations
Retrieve the old [Finals Club annotation data](http://annotateit.org/api/search_raw?q=_exists_:finalsclub_id&size=3100&from=0) from [AnnotateIt.org](http://annotateit.org)
4. Add Annotations
Add the old [Finals Club annotation data](http://annotateit.org/api/search_raw?q=_exists_:finalsclub_id&size=3100&from=0) from [AnnotateIt.org](http://annotateit.org) to your MongoDB
5. Convert Annotation Schema
Convert the [Finals Club annotation data](http://annotateit.org/api/search_raw?q=_exists_:finalsclub_id&size=3100&from=0) into the schema expected by the AnnotateIt plugin

Quick Guide/ReadMe

The quick and dirty way to get your database set up correctly

Follow in order:

  1. Add Works
  2. Add Annotations
  3. Edit Annotations URI/Ranges
  4. Convert Annotation Schema

1. Add Works of Shakespeare

See the directions below to convert the works into the expected HTML format and add them to your db.

Add the Shakespeare HTML to your MongoDB database

Download dependencies by running `npm install`.

Edit the importShakespeareHtml.js script to connect to your database:

importShakespeareHtml.js

    var mongoose = require('mongoose');
    var fs = require('fs');
    
    //edit the string to refer to your database location
    mongoose.connect('mongodb://localhost/open_shakespeare');
    
    //the script expects this schema
    var PlaySchema = new mongoose.Schema({
      title: String,
      uriTitle: String,
      html: String
    });
    
    var Play = mongoose.model('Play', PlaySchema);
Put the importShakespeareHtml.js script in the right directory. The script looks for any .html files in a folder named html in the same directory. I like to copy the html files out of the material_cache/moby/html directory and into a separate folder with the script, for organization purposes.

Run importShakespeareHtml.js:

    node importShakespeareHtml.js
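
The file-discovery part of this step can be sketched without a database, roughly as follows. This is an illustration only, not the actual body of importShakespeareHtml.js: the helper names `toPlayDoc` and `collectPlays`, and the rule for deriving `title` and `uriTitle` from the file name, are assumptions made for the example. Only the `html` folder convention and the three schema fields come from the script above.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: build an object matching PlaySchema from one file.
// Assumption: the file name (minus .html) doubles as the URI title, and
// underscores or hyphens in it separate the words of the display title.
function toPlayDoc(fileName, html) {
  var uriTitle = path.basename(fileName, '.html');
  var title = uriTitle
    .split(/[_-]+/)
    .filter(Boolean)
    .map(function (word) {
      return word.charAt(0).toUpperCase() + word.slice(1);
    })
    .join(' ');
  return { title: title, uriTitle: uriTitle, html: html };
}

// Hypothetical helper: read every .html file in <dir>/html and
// return one PlaySchema-shaped object per file.
function collectPlays(dir) {
  var htmlDir = path.join(dir, 'html');
  return fs.readdirSync(htmlDir)
    .filter(function (name) { return path.extname(name) === '.html'; })
    .map(function (name) {
      var html = fs.readFileSync(path.join(htmlDir, name), 'utf8');
      return toPlayDoc(name, html);
    });
}
```

Each object returned by `collectPlays(__dirname)` has the fields the `Play` model expects, so it could be saved through that model once the connection is open.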

2. Add Annotations into DB

    mongoimport --db dbname --collection annotations --file annotations.json --jsonArray

Because the data is very large, MongoDB returns an error unless you use --jsonArray. This puts all the JSON data into one big object, which then needs to be parsed out into individual db entries.

Parse MongoDB Array

This must be done before any of the annotation reformatting scripts can run.

Edit the db location in parseJsonArray.js: change the first parameter of the MongoClient.connect() function to the address of your database.

parseJsonArray.js

    // Retrieve
    var MongoClient = require('mongodb').MongoClient;
    
    // Connect to db, edit to match your db address
    MongoClient.connect("mongodb://localhost:27017/open_shakespeare", function(err, db) {
      if(!err) {
        console.log("connected successfully to mongodb://localhost:27017/open_shakespeare");
        //on successfully connecting, run updater function
        parseAnnotations(db);
      } else {
        console.error("Error connecting to mongodb://localhost:27017/open_shakespeare");
      }
    });

Run parseJsonArray.js in the console:

    node parseJsonArray.js
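
To illustrate what "parsing out" the single imported object means, here is a small sketch that runs without a database. It is not the code of parseJsonArray.js: the function name `splitWrapper` and the example wrapper shape (an Elasticsearch-style `hits.hits` array) are assumptions chosen for illustration. Adjust the path to wherever the array actually sits in your imported document.

```javascript
// Hypothetical helper: follow a dotted path (e.g. 'hits.hits') into a wrapper
// object and return the array found there, one element per future db entry.
function splitWrapper(wrapper, arrayPath) {
  var value = arrayPath.split('.').reduce(function (node, key) {
    return node == null ? undefined : node[key];
  }, wrapper);
  if (!Array.isArray(value)) {
    throw new Error('No array found at "' + arrayPath + '"');
  }
  // Shallow-copy each entry so the wrapper's own objects are left untouched.
  return value.map(function (entry) {
    return Object.assign({}, entry);
  });
}

// Assumed example shape; the real imported document may differ.
var imported = {
  hits: {
    hits: [
      { _id: 'a1', _source: { uri: 'http://example.org/work/hamlet' } },
      { _id: 'b2', _source: { uri: 'http://example.org/work/macbeth' } }
    ]
  }
};

var entries = splitWrapper(imported, 'hits.hits');
console.log(entries.length + ' individual entries');
```

Each element of `entries` could then be inserted into the annotations collection as its own document.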

3. Edit URI and annotation Ranges for annotation dataset

Change the dataset's URI and/or the annotation Ranges

It is really important that you run the updateAnnotationsRangesUri.js script before you run the updateAnnotationsSchema.js script. If you do not want to update the URI or ranges, skip to the edit schema step.
Edit the script to use your db: change the first parameter of the MongoClient.connect() function to the address of your database.

updateUriRanges.js

```javascript
// Retrieve
var MongoClient = require('mongodb').MongoClient;

// Connect to db, edit to match your db address
MongoClient.connect("mongodb://localhost:27017/open_shakespeare", function(err, db) {
  if(!err) {
    console.log("connected successfully to mongodb://localhost:27017/open_shakespeare");
    //on successfully connecting, run updater function
    updateAnnotationsRangesUri(db);
  } else {
    console.error("Error connecting to mongodb://localhost:27017/open_shakespeare");
  }
});
```

Edit the URI/Ranges

The AnnotateIt plugin relies directly on the URI and XPath ranges to map the annotation data to the works of Shakespeare. For more information on how this works, see the wiki page About the Annotator Plugin.

updateUriRanges.js

```javascript
  annotations.find().toArray(function(err, results) {
    if(!err) {
      results.forEach(function(annotation){
        if(annotation.ranges) {

          //extract title from URI to make a relative pathname that matches with the Annotorious router
          var titleStart = (annotation._source.uri).search('/work') + 6,
          title = (annotation._source.uri).slice(titleStart),
          uri = '/#works/' + title;

          //edit here to create a filepath relative to your DOM structure
          var start = '/div[2]/div[1]/div[2]/div[2]' + annotation.ranges[0].start;
          var end = '/div[2]/div[1]/div[2]/div[2]' + annotation.ranges[0].end;
          annotations.update(
            //update the changes in the db.
            {'_id': annotation._id},
            {
              $set: {
                'uri': uri,
                'ranges.0.start': start,
                'ranges.0.end': end
              }
            }, 
            {safe: true}, 
            function(err, result){
              if(!err) {
                console.log('Success!');
              } else {
                console.log('Error updating annotation ranges for %s', annotation._id);
              }
            }
          );
        }
      });
      console.log("Complete!");
    } else {
      console.error("Error querying annotations collection:", err );
    }
  });
```

Run updateRangesUri.js in the console with `node updateRangesUri.js`. It prints any errors and successes, and a message when it completes.
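
The URI and range rewrite performed inside the update loop can also be isolated into pure functions, which makes it easy to check the result for one annotation before touching the database. The function names `toRelativeUri` and `prefixRange` and the sample values are hypothetical; the slicing logic and the XPath prefix are copied from the script.

```javascript
// Same logic as the script: keep whatever follows '/work/' in the old URI
// and turn it into a relative, hash-based path.
function toRelativeUri(oldUri) {
  var titleStart = oldUri.search('/work') + 6; // 6 = length of '/work/'
  return '/#works/' + oldUri.slice(titleStart);
}

// Prepend the XPath of the container that holds the play's HTML in your DOM.
// Returns a new range object; offsets and other fields are carried over.
function prefixRange(range, prefix) {
  return Object.assign({}, range, {
    start: prefix + range.start,
    end: prefix + range.end
  });
}

// Sample values for illustration only.
var prefix = '/div[2]/div[1]/div[2]/div[2]';
var sample = {
  uri: 'http://example.org/work/hamlet',
  range: { start: '/p[3]', startOffset: 0, end: '/p[3]', endOffset: 12 }
};

console.log(toRelativeUri(sample.uri));
console.log(prefixRange(sample.range, prefix));
```

Note that if a URI does not contain `/work`, `search` returns -1 and the slice starts at index 5, so it may be worth checking for that case before updating.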

4. Edit Annotation Data Schema

If you intend to edit the ranges or URI for this data, you must complete step 3 before step 4.

Edit the script for your db: change the first parameter of the MongoClient.connect() function to the address of your database.

updateAnnotationsSchema.js

```javascript
// Retrieve
var MongoClient = require('mongodb').MongoClient;

// Connect to the db, edit this string to connect to your db: mongodb://localhost:27017/open_shakespeare
MongoClient.connect("mongodb://localhost:27017/open_shakespeare", function(err, db) {
  if(!err) {
    console.log("connected successfully to mongodb://localhost:27017/open_shakespeare");
    updateAnnotations(db);
  } else {
    console.error("Error connecting to mongodb://localhost:27017/open_shakespeare");
  }
});
```