/sosol

The Son of Suda On Line

Primary LanguageRuby

Quick Start

Requirements

  • Java SE 8 (1.8+)
  • JRuby (1.7.12+) - tested through JRuby 1.7.22. Preferably managed through rbenv.
  • Bundler (1.10.6+) (gem install bundler)
  • Git (1.7.1+)

Initial setup steps for development

  • get source from GitHub: git clone https://github.com/sosol/sosol.git
  • copy in/create config/environments/*_secret.rb (where * is development, test, production - sets config.rpx_api_key and config.rpx_realm, can get these by creating an RPX account), e.g.:
Sosol::Application.configure do
  config.rpx_api_key = '0cdf56769f20a2b73a929ac3ba633152'
  config.rpx_realm = 'sosol-development'
end
rbenv install
gem install bundler
bundle install
bundle exec cap local externals:setup
bundle exec rake db:migrate
bundle exec rake git:db:canonical:clone
bundle exec rake test
bundle exec rails server

App Structure

File Structure

app/                  application code
  controllers/        controller code
  helpers/            shared helper code
  models/             model code
  views/              view code/templates
config/               application configuration
  boot.rb             Rails boot script
  database.yml        Rails database configuration
  deploy.rb           Capistrano deployment file
  environment.rb      Rails environment script (defines gem deps, etc.)
  environments/       environment-specific override scripts (i.e. for production, development, testing)
  externals.yml       Capistrano external definitions, used for freezing external dependencies
  hgv.yml
  initializers/       Rails initializer scripts
  locales/            Rails i18n locales
  routes.rb           Rails route configuration
  warble.rb           Warbler configuration, for packing app into .war file
data/                 application data
  lookup/             XML lookup files for HGV
  templates/          data templates, e.g. for new documents
  xslt/               XSLT data
    biblio/           XSLT for BiblioIdentifier
    common/           Shared XSLT files
    ddb/              XSLT for DDBIdentifier
    epidoc/           Capistrano-defined external XSLT from EpiDoc example-p5-xslt
    metadata/         XSLT for HGVMetaIdentifier
    pn/               PN XSLT
      navigator/      Capistrano-defined external XSLT from Navigator
    translation/      XSLT for HGVTransIdentifier
db/                   database data (SQLite for development/testing, Git repositories)
  git/                development/production Git repositories
  migrate/            Rails database migration scripts
  test/               test environment git repositories
doc/                  generated documentation
lib/                  shared application libraries
  git_conf.rb         allows setting per-branch Git repositories for development
  java/               Java JAR files, usable by JRuby
  jruby_xml.rb        JRubyXML class for abstracting Java XML libraries
  linking_info.rb     LinkingInfo class
  maintenance_mode.rb MaintenanceMode module, enables cap deploy:web:enable/disable
  numbers_rdf.rb      NumbersRDF module, for interaction with Numbers Server
  rpx.rb              Rpx module, for RPX authentication
  tasks/              Additional rake task definitions
log/                  running application log directory
public/               static web application data (error pages, JS, CSS, images)
  flash/              Flash files
  images/             image files
  javascripts/        JS files
  stylesheets/        CSS files
    sass/             SASS source files for generated CSS
test/                 test code
  fixtures/           test fixtures
  functional/         functional tests
  integration/        integration tests
  unit/               unit tests
tmp/                  temporary data
vendor/               3rd-party code (i.e. frozen gems)

Design Overview

The Son of Suda On Line (SoSOL) is one of the main components of the Integrating Digital Papyrology project (IDP), aiming to provide a repurposable web-based editor for the digital resources in the DDbDP and HGV. SoSOL integrates a number of technologies to provide a truly next-generation online editing environment. Using JRuby with the Rails web framework, it is able to take advantage of Rails’s wide support in the web development community, as well as Java’s excellent XML libraries and support. This includes the use of XSugar to define an alternate, lightweight syntax for EpiDoc XML markup, called Leiden+. Because XSugar uses a single grammar to define both syntaxes in a reversible and bidirectional manner, this is ideal for reducing side effects of transforming text in our version-controlled system. SoSOL uses the Git distributed version control system as its versioning backend, allowing it to use the powerful branching and merging strategies it provides, and enabling fully-auditable version control. SoSOL also provides for editorial control of changes to the main data repository, enabling the democracy of allowing anyone to change anything they choose while preserving the academic integrity of canonical published data.

Next-Generation Version Control

Many online editing environments, such as MediaWiki, use an SQL database as the sole mechanism for storing revisions. This can lead to a number of problems, such as scaling (most SQL servers are not performance optimized for large text fields) and distribution of data (see for example the database downloads of the English Wikipedia, which have been notoriously problematic for obtaining the full revision history). Most importantly, they typically impose a centralized, linear, single-branch version history. Because Git is a distributed version control system, it does not impose any centralized workflow. As a result, branching and merging have been given high priority in its development, allowing for much more concurrent editing activity while minimizing the difficulty of merging changes. SoSOL’s use of Git is to have one “canonical” Git repository for public, approved data and to which commits are restricted. Users and boards each get their own Git repositories which act as forks of the canonical repository. This allows them to freely make changes to their repository while preserving the version history as needed when these changes are merged back into the canonical repository. These repositories can also be easily mirrored, downloaded, and worked with offline and outside of SoSOL due to the distributed nature of Git. This enables a true democracy of data, wherein institutions still retain control and approval of the data which they put their names on, but any individual may easily obtain the full dataset and revision history to edit, contribute to, and republish under the terms of license.

Alternative Syntax for XML Editing

While XML encoding has many advantages, users inexperienced with its use may find its syntax difficult or verbose. It is still desirable to harness the expertise of these users in other areas and ease their ability to add content to the system, while retaining the semantically explicit nature of XML markup. To do this, we have used XSugar to allow the definition of a “tagless” syntax for EpiDoc XML, which resembles that of the traditional printed Leiden conventions for epigraphic and papyrological texts where possible. Structures which are semantically ambiguous or undefined in Leiden but available in EpiDoc (e.g. markup of numbers and their corresponding value) have been given additional text markup, referred to comprehensively as Leiden+. XSugar enables the definition of this syntax in a single, bidirectional grammar file which defines all components of both Leiden+ and EpiDoc XML as correspondences, which can be statically checked for reversibility and validity. This provides much more rigorous guarantees of these properties than alternatives such as using separate XSLT stylesheets for each direction of the transform, as well as encoding the relation between the components of each syntax in a single location.

Repurposable Design

Due to institutional requirements, the DDbDP and HGV datasets needed separate editorial control and publishing mechanisms. In addition, their control over different types of content necessitated different editing mechanisms for each component. These requirements informed the design of how SoSOL interacts with data under its control and how this design is repurposable for use in other projects. The two high-level abstractions of data made by SoSOL are “publications” and “identifiers”. Identifiers are unique strings which can be mapped to a specific file path in the repository, while publications are arbitrary aggregations of identifiers. By defining an identifier superclass which defines common functionality for interacting with the data repository, we can then subclass this to provide functionality specific to a given category of data. The SoSOL implementation for IDP2, for example, provides identifier subclasses for DDbDP transcriptions, HGV metadata, and HGV translations. Editorial boards consequently have editorial control for only certain subclasses of identifiers. Publications in turn allow representation and aggregation of the complex many-to-many relationships these components can have (for example, a document with two sides that may have one transcription and two metadata components). Packaging these related elements together both allows the user to switch between them and editorial boards to check related data which they may not have editorial control over but still require to make informed decisions about validity and approval. SoSOL can thus be integrated into other systems by implementing the identifier subclasses necessary for the given dataset as well as coherent means for aggregating these components into publications.

JRuby Deployment

WAR-based deployment

  • use warbler to bundle as a standard .war (includes JRuby, gems, etc.) to upload to an app server

    • initialize/migrate your production DB using RAILS_ENV=production jruby -S rake db:migrate (change production line in config/database.yml for your setup)
    • may want to add config.logger = Logger.new(STDOUT) to config/environment.rb to get Rails logging out to servlet container
    • run cap local externals:setup to fetch externals
    • use jruby -S warble war to build the war
    • need to change REPOSITORY_ROOT in config/environment.rb to an absolute shared path, because packing up the 500MB+ canonical repo in the war will grind things to a halt
  • war unpacking isn't guaranteed (and multiple deploys would overwrite changes to the unpacked dir structure anyway) so it seems like alternate config will be needed

    • need a persistent directory for git (overwrite REPOSITORY_ROOT and CANONICAL_REPOSITORY in production)
    • DB migrations seem impossible to run from the deployed war context, so you'll have to run them pre-deploy using the production environment on some checkout of the code (meaning it will also need access to your production DB server credentials)
    • not sure on logs, sessions (may need extra config for new vers of Rails), rollback (probably impossible to get easily in a war context due to also needing to reverse migrations)

Directory-based deployment

  • deploy an unpacked directory and run within JRuby using any variety of servers (e.g. jetty-rails, glassfish - N.B. glassfish gem must have all required jars invoked inside JRuby on the classpath e.g. lib/java/saxon9he.jar) which can be accessed directly or proxied

Repository Setup

Canonical Repository (from Git)

  • Initialize with:
    • git clone --bare git://github.com/papyri/idp.data.git CANONICAL_REPOSITORY
    • where CANONICAL_REPOSITORY is the path defined in config/environment.rb, defaulting to REPOSITORY_ROOT/canonical.git. For WAR-based deploy will want to change REPOSITORY_ROOT to an absolute path as noted.

Administration

  • jruby script/console production should give you a console for the production environment (keep in mind things like DB/Git access, you'll need to run this from a location that can access those resources, the source, and the correct configs for your production environment)
  • from here you can give users admin privileges by e.g.:
me = User.find_by_name('rfbaumann')
me.admin = true
me.save!
  • can completely reset the database with jruby -S rake RAILS_ENV=production db:reset
    • note that this ''won't'' run AR callbacks, so you need to clear user/board git repos:

      • if you plan on '''keeping''' canonical (aka repo 3) i.e. not deleting it and recloning from the Git SVN copy, you need to first ensure that it has all finalized objects and remove the user repos you'll delete from its alternates file:

        • cd REPOSITORY_ROOT/canonical.git && git repack && echo "" > objects/info/alternates
        • can check integrity afterwards with git fsck
      • rm -rf REPOSITORY_ROOT/boards/*.git REPOSITORY_ROOT/users/*.git

    • can also delete every model instance from the console with e.g. User.find(:all).each {|u| u.destroy} but this can be tedious if there are loose instances of other models that aren't dependent-destroyed

XSugar Standalone

  • to start a local XSugar standalone server:
    • in vendor/plugins/rxsugar/src/standalone run mvn:install and mvn:jetty

Tomcat setup

  • jruby -S rake RAILS_ENV=production db:migrate
  • jruby -S rake RAILS_ENV=production db:structure:dump
  • CREATE DATABASE sosol
  • GRANT all on database sosol to 'sosoladmin'@'localhost' identified by 's0s0ladm1n'
  • GRANT all on database sosol to 'sosoladmin'@'%' identified by 's0s0ladm1n'
  • cat production_structure.sql | mysql -u sosoladmin -p sosol