Run a clone of Amazon's Mechanical Turk service in your local environment.
This tool is meant to be used as a web service running locally on your network or personal machine. It loads HIT template files generated by the Amazon Mechanical Turk web GUI that requesters use to create HITs. You also upload input CSV files; each row of values in a CSV file creates one HIT based on the template.
The results of the HITs completed by the workers can be exported in CSV files.
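The template-plus-CSV mechanism described above can be sketched in a few lines of Python. This is only an illustration, not Turkle's actual implementation: it assumes the template uses Mechanical Turk's `${variable}` placeholder syntax and that CSV column names match the variable names. The function name `render_hits` and the sample template and data are hypothetical.

```python
import csv
import io
from string import Template


def render_hits(template_html, csv_text):
    """Return one rendered HTML string per CSV data row (illustrative only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    template = Template(template_html)
    # safe_substitute leaves unknown ${placeholders} untouched instead of raising.
    return [template.safe_substitute(row) for row in reader]


# Hypothetical template and input data, made up for this example.
template_html = "<p>Is this a picture of a ${animal}?</p><img src='${image_url}'>"
csv_text = (
    "animal,image_url\n"
    "cat,http://example.com/cat.jpg\n"
    "dog,http://example.com/dog.jpg\n"
)

hits = render_hits(template_html, csv_text)
for html in hits:
    print(html)
```

Each data row yields one rendered page, which mirrors the "one HIT per CSV row" behavior described above.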
Instead of installing Turkle and its dependencies directly, you can run Turkle as a Docker container, using scripts to manage your HIT templates and data. Either build a Turkle image:
docker build --force-rm -t hltcoe/turkle .
or pull the latest from the Docker registry:
docker pull hltcoe/turkle
and start a container with an easy-to-remember name, mapping container port 8080 to a port on the Docker host (e.g. 18080):
docker run -d --name container_name -p 18080:8080 hltcoe/turkle
Your annotators can now browse to that port on the Docker host. To give them something to do, upload an Amazon Mechanical Turk HIT template and data:
scripts/upload_hit.sh container_name data.csv template.html
At any point, you can download the current state of annotations:
scripts/download_annotations.sh container_name annotation_state.csv
You can upload new data to be annotated, without changing the template:
scripts/upload_hit.sh container_name new_data.csv
Or replace both:
scripts/upload_hit.sh container_name new_data.csv new_template.html
git clone https://github.com/hltcoe/turkle.git
cd turkle
Make sure that the dependencies listed below are met, and then run the commands:
python manage.py migrate
python manage.py runserver
TODO: instructions for installing from an extracted bundle that is distributed along with the required eggs.
- The packages listed in requirements.txt. If the packages are not already installed in your environment, and you have an internet connection, you can run the following commands to install the required Python packages:
cd /path/to/clone/of/turkle
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Load the URL of the tool (perhaps http://localhost:8000) in your browser. Click on List of HITs, and then start completing the HITs listed under Unfinished HITs.
To publish new HITs, cd to the root directory of this server's code repository and run the command:
python manage.py publish_hits <template_file_path> <csv_file_path>
with <template_file_path> replaced with the absolute path to the HIT template file and <csv_file_path> replaced with the path to the CSV file containing the data for the individual HITs.
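Before publishing, it can help to check that the CSV file supplies a column for every variable in the template. The sketch below is a hypothetical helper, not part of Turkle: it assumes the template uses Mechanical Turk's `${variable}` placeholder syntax and that the first CSV row is a header whose column names are meant to match those variables.

```python
import csv
import io
import re

# Matches ${name} placeholders (assumed Mechanical Turk template syntax).
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def missing_columns(template_html, csv_text):
    """Return template variables that have no matching CSV header column."""
    variables = set(PLACEHOLDER.findall(template_html))
    # The first CSV row is treated as the header; an empty file has no columns.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return sorted(variables - set(header))


# Hypothetical template and data, made up for this example.
example_template = "<p>${question}</p><img src='${image_url}'>"
example_csv = "question,photo\nIs this a cat?,cat.jpg\n"

missing = missing_columns(example_template, example_csv)
print(missing)
```

To run the check on real files, read each one into a string first (for example with `open(path, encoding="utf-8").read()`) and pass the contents to `missing_columns`.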
To get the results of the completed HITs, cd to the root directory of this server's code repository and run the command:
python manage.py dump_results <template_file_path> <results_csv_file_path>
with:
- <template_file_path> replaced with the absolute path to where the template file was located when the HITs were published. This argument acts as a filter so that only completed HITs from the same template are dumped.
- <results_csv_file_path> replaced with the desired path to where the results will be saved.
The format of the results file is:
- UTF-8 encoding
- a header row for the first line
- one HIT result per line
- values in each line are comma-delimited in the Excel style.
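A results file in the format listed above can be read with Python's standard `csv` module, whose `excel` dialect handles Excel-style comma delimiting and quoting. The following is a minimal sketch; the function name `read_results` and the column names in the sample data are hypothetical, since the real columns depend on your template.

```python
import csv
import io


def read_results(stream):
    """Parse a results file: header row first, then one HIT result per row."""
    reader = csv.DictReader(stream, dialect="excel")
    return list(reader)


# Made-up sample data standing in for a real results file.
sample = io.StringIO('col_a,col_b\r\n"hello, world",1\r\n')
rows = read_results(sample)
for row in rows:
    print(row["col_a"], row["col_b"])
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_results`; `newline=""` lets the `csv` module handle line endings inside quoted fields itself.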