Turkle

Django-based clone of Amazon's Mechanical Turk service running in your local environment.

This tool is meant to be used as a web service running locally on your network or personal machine. It loads HIT template files generated by the Amazon Mechanical Turk web GUI that requesters use to create HITs. You also upload an input CSV file, and one HIT is created from the template for each row of values in that file.

The results of the HITs completed by the workers can be exported in CSV files.
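
As a rough illustration of how a template and a CSV file can yield one HIT per row, the sketch below fills a template with the values from each row. It assumes the Mechanical Turk style `${column_name}` placeholder syntax and uses Python's `string.Template`. The function name `render_hits` and the sample data are hypothetical, and this is not Turkle's actual implementation.

```python
import csv
import io
from string import Template


def render_hits(template_html, csv_text):
    """Return one rendered HTML string per CSV data row.

    Hypothetical helper for illustration only; not part of Turkle.
    """
    template = Template(template_html)
    rows = csv.DictReader(io.StringIO(csv_text))
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return [template.safe_substitute(row) for row in rows]


if __name__ == "__main__":
    # Made-up template and data, used only to show the row-to-HIT mapping.
    template_html = "<p>Is this a photo of a ${animal}?</p><img src='${image_url}'>"
    csv_text = (
        "animal,image_url\n"
        "cat,http://example.com/1.jpg\n"
        "dog,http://example.com/2.jpg\n"
    )
    for hit in render_hits(template_html, csv_text):
        print(hit)
```

Each column header in the CSV file acts as a placeholder name, so a file with two data rows produces two rendered HITs.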

Installation

Docker approach

Instead of installing Turkle and dependencies directly, you can run Turkle as a Docker container, using scripts to manage your HIT templates and data. Either build a Turkle image:

docker build --force-rm -t hltcoe/turkle .

or pull the latest from the Docker registry:

docker pull hltcoe/turkle

Then start a container with an easy-to-remember name, mapping container port 8080 to a port on the Docker host (e.g. 18080):

docker run -d --name container_name -p 18080:8080 hltcoe/turkle

Your annotator can now browse to that port on the Docker host. To give them something to do, upload an Amazon Mechanical Turk HIT template and data:

scripts/upload_hit.sh container_name data.csv template.html

At any point, you can download the current state of annotations:

scripts/download_annotations.sh container_name annotation_state.csv

You can upload new data to be annotated, without changing the template:

scripts/upload_hit.sh container_name new_data.csv

Or replace both:

scripts/upload_hit.sh container_name new_data.csv new_template.html

Initial setup

git clone https://github.com/hltcoe/turkle.git
cd turkle

Make sure that the dependencies listed below are met, and then run the commands:

python manage.py migrate
python manage.py runserver

TODO: instructions for installing from an extracted bundle that is distributed along with the required eggs.

Dependencies

  • The packages listed in requirements.txt. If the packages are not already installed in your environment, and you have an internet connection, then you can run the following commands to install the required Python packages.

    cd /path/to/clone/of/turkle
    virtualenv venv
    source venv/bin/activate
    pip install -r requirements.txt

Using it

Worker instructions

Load the URL of the tool (perhaps http://localhost:8000) in your browser. Click on List of HITs, and then start completing the HITs listed under Unfinished HITs.

Requester instructions

Publish HITs

To publish new HITs, cd to the root directory of this server's code repository and run the command:

python manage.py publish_hits <template_file_path> <csv_file_path>

with <template_file_path> replaced with the absolute path to the HIT template file and <csv_file_path> replaced with the path to the CSV file containing the data for the individual HITs.
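
If you publish HITs from a script, a small wrapper can make sure the template path is absolute before calling the command. The following is a minimal sketch: `build_publish_command` and the example file paths are hypothetical, and only the `publish_hits` command name and its argument order are taken from the instructions above.

```python
import os
import shlex
import sys


def build_publish_command(template_path, csv_path):
    """Build the argument list for `python manage.py publish_hits`.

    Hypothetical helper for illustration only; not part of Turkle.
    """
    return [
        sys.executable,                  # the Python interpreter currently in use
        "manage.py",
        "publish_hits",
        os.path.abspath(template_path),  # the template path should be absolute
        csv_path,
    ]


if __name__ == "__main__":
    # Example paths are made up; substitute your own files.
    cmd = build_publish_command("templates/my_hit.html", "data/batch1.csv")
    print(shlex.join(cmd))
    # To actually publish, run this from the repository root:
    #     import subprocess
    #     subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.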

Get results

To get the results of the completed HITs, cd to the root directory of this server's code repository and run the command:

python manage.py dump_results <template_file_path> <results_csv_file_path>

with:

  • <template_file_path> replaced with the absolute path to where the template file was located when the HITs were published. This argument acts as a filter so that only completed HITs from the same template are dumped.
  • <results_csv_file_path> replaced with the desired path to where the results will be saved. The format is:
      • UTF-8 encoding
      • a header row for the first line
      • one HIT result per line
      • values in each line are comma-delimited in the Excel style.
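
A results file in this format can be read with Python's standard `csv` module, whose `excel` dialect matches comma-delimited, double-quoted values. The sketch below is only an illustration: `read_results` is a hypothetical helper, and the column names in the sample are invented, since the real header depends on your template and input data.

```python
import csv
import io


def read_results(stream):
    """Parse dumped results into a list of dicts keyed by the header row.

    Hypothetical helper for illustration only; not part of Turkle.
    """
    # The "excel" dialect is comma-delimited and uses double quotes for quoting.
    return list(csv.DictReader(stream, dialect="excel"))


if __name__ == "__main__":
    # Invented sample; a real file would be opened with
    #     open(path, newline="", encoding="utf-8")
    sample = 'Input.animal,Answer.label\r\ncat,"yes, a cat"\r\ndog,no\r\n'
    for row in read_results(io.StringIO(sample)):
        print(row)
```

Opening the file with `newline=""` lets the `csv` module handle line endings inside quoted values itself.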