The Kazimirski web application facilitates crowdsourced transcription of scanned pages into digitized text when OCR does not produce reliable output (e.g., handwritten text, old typography, or a combination of both, as in the Kazimirski dictionary).
- Side-by-side transcription UI embedding Internet Archive's viewer
- Custom text input logic handling the specifics of bidirectional text in a dictionary
- Integration of the Trix WYSIWYG editor with minimal formatting and server-side markup sanitization
- Simple checkout-submit-review workflow for contributed pages
- Straightforward role system:
  - Transcribers can submit pages
  - Reviewers can review, correct and accept pages
  - Admins can access the management backend
- Dashboard showing overall progress
- Email notifications
- Custom CAPTCHA with bidirectional text
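
To make the role system and the checkout-submit-review workflow listed above more concrete, here is a minimal Ruby sketch. It is not taken from the application's code: the constant names, status names, event names, and the `allowed?` and `next_status` helpers are all hypothetical, and the sketch assumes that each role only has the permissions listed above and that an accepted page is final.

```ruby
# Hypothetical sketch: none of these names come from the application itself.

# Permissions per role, as described in the feature list.
ROLE_PERMISSIONS = {
  transcriber: [:submit],
  reviewer:    [:review, :correct, :accept],
  admin:       [:access_backend]
}.freeze

# Assumed page statuses and the events that move a page between them.
PAGE_TRANSITIONS = {
  available:   { checkout: :checked_out },
  checked_out: { submit: :submitted },
  submitted:   { accept: :accepted }
}.freeze

# Returns true when the given role is allowed to perform the action.
def allowed?(role, action)
  ROLE_PERMISSIONS.fetch(role, []).include?(action)
end

# Returns the status a page reaches after the event, or raises
# ArgumentError when the event is not valid for the current status.
def next_status(status, event)
  PAGE_TRANSITIONS.fetch(status, {}).fetch(event) do
    raise ArgumentError, "cannot #{event} a page that is #{status}"
  end
end
```

With these definitions, `allowed?(:reviewer, :accept)` is true, `allowed?(:transcriber, :accept)` is false, and `next_status(:checked_out, :submit)` returns `:submitted`.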
git clone git@github.com:francoisbruneau/kazimirski.git
Create a .env file in the project folder and add:
RACK_ENV=development
PORT=5000
CAPTCHA_9ad39f737b804013809c6945dbd23355=answer1
CAPTCHA_ffa971d9f127408b88c47c734b7ddfbd=answer2
CAPTCHA_71074c5ee3ee4b0485ef6a860f55828a=answer3
vagrant up
vagrant ssh
rake db:setup
mailcatcher --http-ip=0.0.0.0
foreman start
Then open http://localhost:5000 in your browser.
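
The `CAPTCHA_<id>=<answer>` entries in the .env file suggest that each CAPTCHA challenge is identified by an id and checked against an answer stored in the environment. How the application actually reads them is not documented here, so the following is only a sketch under that assumption; the `captcha_correct?` helper and its exact comparison rules (whitespace trimmed, case-sensitive) are hypothetical.

```ruby
# Hypothetical helper: looks up the expected answer for a challenge id in
# the environment and compares it with the user's response.
# `env` defaults to ENV but accepts any Hash-like object, which makes the
# helper easy to try out without touching the real environment.
def captcha_correct?(challenge_id, response, env = ENV)
  expected = env["CAPTCHA_#{challenge_id}"]
  return false if expected.nil? || response.nil?

  # Assumption: surrounding whitespace is ignored, case is significant.
  expected.strip == response.strip
end
```

For example, with the sample values above, `captcha_correct?("9ad39f737b804013809c6945dbd23355", "answer1")` would return true once the .env file has been loaded into the environment (foreman does this when it starts the application).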
Thanks to @noefroidevaux for the base Vagrant/Ansible/Ruby/Rails setup: https://github.com/noefroidevaux/rails-workshop