OpenShakespeareData
This repo converts two datasets into a format compatible with AnnotateIt's jQuery plugin in order to recreate Open Shakespeare. It takes Moby's XML-formatted complete works of Shakespeare and the Finals Club annotation data stored on AnnotateIt.org, loads both into a MongoDB database and, most importantly, makes them work together with version 1.2.6 of the AnnotateIt Plugin.
It can easily be customized to migrate this data to another URL, or to a site with a different DOM structure, using the Annotator Plugin. You can also use the xpath jQuery library to write your own custom mapping script.
Overview
All the resources for converting the raw data to work cohesively with the AnnotateIt Plugin.
- Quick Guide/ReadMe
See Wiki pages for an in-depth exploration on these topics:
- About the Annotator Plugin
- About the Data Sets
To recreate my process, see these Wiki pages in the following order:
1. Convert Works
Convert the works of Shakespeare into the expected format
2. Add Works
Add the works of Shakespeare to your MongoDB
3. Retrieve Annotations
Retrieve the old [Finals Club annotation data](http://annotateit.org/api/search_raw?q=_exists_:finalsclub_id&size=3100&from=0) from [AnnotateIt.org](http://annotateit.org)
4. Add Annotations
Add the old [Finals Club annotation data](http://annotateit.org/api/search_raw?q=_exists_:finalsclub_id&size=3100&from=0) from [AnnotateIt.org](http://annotateit.org) to your MongoDB
5. Convert Annotation Schema
Convert the [Finals Club annotation data](http://annotateit.org/api/search_raw?q=_exists_:finalsclub_id&size=3100&from=0) into the schema expected by the AnnotateIt plugin
Quick Guide/ReadMe
The quick and dirty way to get your database set up correctly
Follow in order:
- Add Works
- Add Annotations
- Edit Annotations URI/Ranges
- Convert Annotation Schema
1. Add Works of Shakespeare
See the directions below to convert the works into the expected HTML format and add them to your db.
-- Add the Shakespeare HTML to your MongoDB database
Download dependencies: ``` npm install ```
Edit the importShakespeareHtml.js script to connect to your database
importShakespeareHtml.js
```javascript
var mongoose = require('mongoose');
var fs = require('fs');

// edit the string to refer to your database location
mongoose.connect('mongodb://localhost/open_shakespeare');

// the script expects this schema
var PlaySchema = new mongoose.Schema({
  title: String,
  uriTitle: String,
  html: String
});

var Play = mongoose.model('Play', PlaySchema);
```
Put the importShakespeareHtml.js script in the right directory
The script will look for any .html files in a folder named html in the same directory.
I like to copy the html files out of the material_cache/moby/html directory and into a separate folder with the script, for organization purposes.
Run importShakespeareHtml.js
node importShakespeareHtml.js
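To make the folder convention concrete, here is a minimal sketch of how the .html files in an html folder could be turned into objects matching the schema above. It is not the actual importShakespeareHtml.js: the helper names `toPlayDoc` and `loadPlays`, and the rule that derives `title` and `uriTitle` from the file name, are assumptions for illustration only.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: build one Play-shaped object from a file name and its contents.
// Assumption: the file name (minus .html) is the URI-friendly title, and
// underscores stand in for spaces in the display title.
function toPlayDoc(fileName, html) {
  var uriTitle = path.basename(fileName, '.html');
  return {
    title: uriTitle.replace(/_/g, ' '),
    uriTitle: uriTitle,
    html: html
  };
}

// Hypothetical helper: read every .html file in `dir` and return Play-shaped objects.
function loadPlays(dir) {
  return fs.readdirSync(dir)
    .filter(function (name) { return path.extname(name) === '.html'; })
    .map(function (name) {
      return toPlayDoc(name, fs.readFileSync(path.join(dir, name), 'utf8'));
    });
}
```

Each object returned by `loadPlays('html')` could then be handed to the Play model defined in the script and saved.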
2. Add Annotations into DB
mongoimport --db dbname --collection annotations --file annotations.json --jsonArray
Since the data is very large, mongoimport will return an error unless you use --jsonArray. This puts all the JSON data into one big object that will need to be parsed out into individual db entries.
Parse MongoDB Array
-- This must be done before any of the annotation reformatting scripts can run.
Edit DB location in parseJsonArray.js
Change the db location address to use your database by changing the first parameter of the MongoClient.connect() function.
parseJsonArray.js
```javascript
// Retrieve
var MongoClient = require('mongodb').MongoClient;

// Connect to db, edit to match your db address
MongoClient.connect("mongodb://localhost:27017/open_shakespeare", function(err, db) {
  if(!err) {
    console.log("connected successfully to mongodb://localhost:27017/open_shakespeare");
    //on successfully connecting, run updater function
    parseAnnotations(db);
  } else {
    console.error("Error connecting to mongodb://localhost:27017/open_shakespeare");
  }
});
```
Run parseJsonArray.js in the console
node parseJsonArray.js
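As an illustration of what "parsing out" means, the sketch below splits one imported document that holds many annotation records into one object per record. The helper name `splitRecords` and the field name `rows` are hypothetical, and the sample data is made up; check the imported document in your own database for the real layout before relying on this.

```javascript
// Hypothetical helper: given one big imported document, return one plain object
// per annotation record. Assumption: the records live in an array under `key`.
function splitRecords(bigDoc, key) {
  var records = bigDoc[key];
  if (!Array.isArray(records)) {
    return [];
  }
  return records.map(function (record) {
    // Shallow-copy so the original records are left untouched.
    return Object.assign({}, record);
  });
}

// Made-up sample data; `rows` is an assumed field name.
var bigDoc = {
  rows: [
    { _source: { uri: 'http://example.org/work/hamlet' } },
    { _source: { uri: 'http://example.org/work/macbeth' } }
  ]
};
var docs = splitRecords(bigDoc, 'rows');
console.log(docs.length); // 2
```

Each element of `docs` could then be inserted into the annotations collection as its own entry.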
3. Edit URI and annotation Ranges for annotation dataset
Change the dataset's URI and/or the annotation Ranges
It is really important that you run the updateAnnotationsRangesUri.js script before you run the updateAnnotationsSchema.js script. If you do not want to update the URI or ranges, skip to the edit schema step.
Edit the script to use your db
Change the db location address to use your database by changing the first parameter of the MongoClient.connect() function.
updateUriRanges.js
```javascript
// Retrieve
var MongoClient = require('mongodb').MongoClient;

// Connect to db, edit to match your db address
MongoClient.connect("mongodb://localhost:27017/open_shakespeare", function(err, db) {
  if(!err) {
    console.log("connected successfully to mongodb://localhost:27017/open_shakespeare");
    //on successfully connecting, run updater function
    updateAnnotationsRangesUri(db);
  } else {
    console.error("Error connecting to mongodb://localhost:27017/open_shakespeare");
  }
});
```
Edit the URI/Ranges
The AnnotateIt plugin relies directly on the URI and XPath ranges to map the annotation data to the works of Shakespeare. For more information on how this works, see the wiki page: About Annotation Plugin.
updateUriRanges.js
```javascript
// `annotations` is the collection handle (presumably db.collection('annotations'))
annotations.find().toArray(function(err, results) {
  if(!err) {
    results.forEach(function(annotation){
      if(annotation.ranges) {
        //extract title from URI to make a relative pathname that matches with the Annotorious router
        var titleStart = (annotation._source.uri).search('/work') + 6,
            title = (annotation._source.uri).slice(titleStart),
            uri = '/#works/' + title;
        //edit here to create a filepath relative to your DOM structure
        var start = '/div[2]/div[1]/div[2]/div[2]' + annotation.ranges[0].start;
        var end = '/div[2]/div[1]/div[2]/div[2]' + annotation.ranges[0].end;
        //update the changes in the db.
        annotations.update(
          {'_id': annotation._id},
          {
            $set: {
              'uri': uri,
              'ranges.0.start': start,
              'ranges.0.end': end
            }
          },
          {safe: true},
          function(err, result){
            if(!err) {
              console.log('Success!');
            } else {
              console.log('Error updating annotation ranges for %s', annotation._id);
            }
          }
        );
      }
    });
    //logged once every update has been issued; results print as each one finishes
    console.log("Complete!");
  } else {
    console.error("Error querying annotations collection:", err );
  }
});
```
Run updateRangesUri.js
In the console: ``` node updateRangesUri.js ```
It prints any errors and successes, and reports when it completes.
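To see what the rewrite does to a single record, here is the same string handling pulled out into a standalone function, as a sketch. The function name `rewriteUriAndRange`, the sample URI, and the sample XPath values are assumptions for illustration; the prefix is the one used in the script above.

```javascript
// Hypothetical helper mirroring the string handling in the update script.
// `prefix` is the XPath from the page root down to the element holding the play.
function rewriteUriAndRange(sourceUri, range, prefix) {
  // Skip past "/work" plus the one character that follows it.
  var titleStart = sourceUri.search('/work') + 6;
  var title = sourceUri.slice(titleStart);
  return {
    uri: '/#works/' + title,
    start: prefix + range.start,
    end: prefix + range.end
  };
}

// Made-up sample values.
var result = rewriteUriAndRange(
  'http://example.org/work/hamlet',
  { start: '/p[3]', end: '/p[4]' },
  '/div[2]/div[1]/div[2]/div[2]'
);
console.log(result.uri);   // /#works/hamlet
console.log(result.start); // /div[2]/div[1]/div[2]/div[2]/p[3]
```

Note that if a URI does not contain `/work`, `search` returns -1 and the slice starts at index 5, so it may be worth guarding against that case before updating the database.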
4. Edit Annotation Data Schema
If you intend to edit the ranges or URI for this data, you must complete step 3 before step 4.
Edit script for your db
Change the db location address to use your database by changing the first parameter of the MongoClient.connect() function.
updateAnnotationsSchema.js
```javascript
// Retrieve
var MongoClient = require('mongodb').MongoClient;

// Connect to the db; edit this string to connect to your db: mongodb://localhost:27017/open_shakespeare
MongoClient.connect("mongodb://localhost:27017/open_shakespeare", function(err, db) {
  if(!err) {
    console.log("connected successfully to mongodb://localhost:27017/open_shakespeare");
    updateAnnotations(db);
  } else {
    console.error("Error connecting to mongodb://localhost:27017/open_shakespeare");
  }
});
```
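The schema conversion itself is not shown here. As a rough sketch of the idea, the function below lifts fields out of a raw record's `_source` into the flat shape that Annotator 1.2.x uses for an annotation (`uri`, `text`, `quote`, `ranges`). The function name `toAnnotatorShape`, the sample record, and the assumption that `text` and `quote` sit under `_source` are illustrative guesses, not taken from updateAnnotationsSchema.js.

```javascript
// Hypothetical helper: flatten a raw record into an Annotator-style annotation.
// Assumption: `uri` and `ranges` were already set at the top level by step 3,
// and the remaining fields still live under `_source`.
function toAnnotatorShape(raw) {
  var source = raw._source || {};
  return {
    uri: raw.uri || source.uri,
    ranges: raw.ranges || source.ranges || [],
    text: source.text || '',
    quote: source.quote || ''
  };
}

// Made-up sample record.
var annotation = toAnnotatorShape({
  uri: '/#works/hamlet',
  ranges: [{ start: '/div[2]/p[3]', end: '/div[2]/p[3]', startOffset: 0, endOffset: 12 }],
  _source: { uri: 'http://example.org/work/hamlet', text: 'A note', quote: 'To be, or not' }
});
console.log(annotation.uri); // /#works/hamlet
```

Preferring the top-level `uri` and `ranges` keeps the values written in step 3 and only falls back to `_source` when that step was skipped.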