Student: Shreyasvi Natraj
This project can be used to launch a Mechanical Turk web instance externally. It offers a customizable front end, in which a polygon can be drawn on top of a shelter image, and a Python back end integrated with Amazon Mechanical Turk for deploying the task as a Human Intelligence Task (HIT) on the Amazon MTurk website. The project makes it possible to generate polygon data rapidly over many different refugee shelter satellite images, in order to provide accurate figures regarding the amount of resources required in those refugee camps.
Presently, this project has been tested on Ubuntu 16.04 and should be compatible with Windows 10.
1) Install all dependencies
pip install boto==2.46.1 boto3==1.4.4 botocore==1.5.88 matplotlib==2.0.2 numpy==1.12.1 numpydoc==0.6.0 Pillow==4.1.1
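Equivalently, the same pinned versions can be kept in a `requirements.txt` file and installed with `pip install -r requirements.txt` (a convenience alternative, not a file shipped with this repo):

```text
boto==2.46.1
boto3==1.4.4
botocore==1.5.88
matplotlib==2.0.2
numpy==1.12.1
numpydoc==0.6.0
Pillow==4.1.1
```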
2) Get the Site deployed on Firebase and set up Amazon MTurk
- Install the Firebase CLI using `npm install -g firebase-tools`
- Log in to the Firebase console with `firebase login`
- Create a Firebase project by going to the Firebase website.
- Open a terminal and move into the `/toWeb` directory
- Run `firebase list` in the terminal and note the project-id of the project you just created
- Open the script `ASCRIPT_INSTALL.py`
- Once you have hosted the page on Firebase, copy and paste the firebaseProjectID into `ASCRIPT_INSTALL.py` and `config.ini` under the right field in each file.
- Make a folder at `/toWeb/public/images/<your folder name>` and store all the satellite images there
- Run `sudo python ASCRIPT_INSTALL.py`
- Set up your MTurk Requester account following these instructions. Make sure you complete all steps through Step 5 (setting up the developer sandbox). When making an IAM user, save your AWS Access and Secret Access keys somewhere safe.
- Place your preferred username (it doesn't matter what it is, so long as you are consistent in your code), AWS Access Key, and AWS Secret Access Key inside `config.ini`. For example, to create access for a user named 'Student' with AWS Key 'ABCD' and AWS Secret Access Key '1234',

```ini
[User name]
awskey = Your AWS Key Here
awssakey = Your AWS Secret Access Key
```

would become

```ini
[Student]
awskey = ABCD
awssakey = 1234
```
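A minimal sketch of reading these credentials back with Python's `configparser` module (`ConfigParser` on Python 2); the section and key names follow the example above, but check the actual scripts for how `config.ini` is parsed:

```python
import configparser

# Load the credentials written to config.ini above.
config = configparser.ConfigParser()
config.read('config.ini')

user = 'Student'  # must match the section name you chose
awskey = config[user]['awskey']
awssakey = config[user]['awssakey']
print('Loaded credentials for', user)
```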
3) Launching MTurk Campaign
- Open the `ASCRIPT_begin.py` and `ASCRIPT_finish.py` files and change the values according to the table below
| to change | value |
|---|---|
| folderToPublish | name of `<your folder name>` |
| user | the same username as in `config.ini` |
| serverType | 'production' for a live event or 'developer' for sandbox mode |
| imagesPerPerson | number of images to be classified as one HIT |
| hitBatch | folder in which results will be stored |
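As a hypothetical illustration of what those assignments might look like inside `ASCRIPT_begin.py` (the exact variable layout in the script may differ):

```python
# Hypothetical example values; adjust them to your own campaign.
folderToPublish = 'muna_camp'   # folder under /toWeb/public/images/
user = 'Student'                # must match the section name in config.ini
serverType = 'developer'        # 'production' for live, 'developer' for sandbox
imagesPerPerson = 10            # number of images bundled into one HIT
hitBatch = 'batch_001'          # folder in which results will be stored
```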
- The front end of the website can be changed by editing the HTML and JavaScript code inside the `toWeb` folder
- When everything is done, run `sudo python ASCRIPT_begin.py`
- Go to the link printed by the begin script while it is publishing HITs. It will take you to the developer sandbox, where you can see exactly what your HIT would look like to an actual user without having to pay them for it. Do a couple of HITs so you can test with data.
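For reference, the `serverType` switch corresponds to MTurk's two documented endpoints. A minimal sketch of pointing a boto3 client at the sandbox versus production (the variable names mirror the table above and are assumptions about the scripts' internals):

```python
import boto3

serverType = 'developer'  # or 'production'

# MTurk's documented sandbox and production endpoints.
endpoints = {
    'developer': 'https://mturk-requester-sandbox.us-east-1.amazonaws.com',
    'production': 'https://mturk-requester.us-east-1.amazonaws.com',
}

mturk = boto3.client(
    'mturk',
    aws_access_key_id='ABCD',      # awskey from config.ini
    aws_secret_access_key='1234',  # awssakey from config.ini
    region_name='us-east-1',
    endpoint_url=endpoints[serverType],
)

# Sandbox accounts report a fake $10,000.00 balance.
print(mturk.get_account_balance()['AvailableBalance'])
```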
4) Getting Results from MTurk
- When the campaign time ends, run `ASCRIPT_finish.py` and see some statistics about your HIT.
- If no crashes occur, you have finished! Explore the images stored inside the data folder.
- The annotations for each image submitted to you will be stored in JSON format (one line per image) inside `all_submitted.txt`, a text file inside your specific hit batch folder. Learn more about the JSON format here.
- The JSON for the condensed images will also be stored in `condensed_all_submitted.txt`
The format of the .json file is as follows:

```json
{
  "assignmentID": "33SA9F9TRY0RTFNCJH46WGAOSCCWES",
  "objs": [
    {
      "data": [[[11, 1, 11, 16]], [[32, 18, 12, 23]]],
      "type": "polygon",
      "name": "building"
    }
  ],
  "annotations": ["building"],
  "fileName": "muna_camp/Muna_clipped338.JPG"
}
```
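A minimal sketch of reading these one-line records back in Python, assuming only that each line of `all_submitted.txt` is one JSON object as described above:

```python
import json

# Each line of all_submitted.txt is one JSON record for one image.
with open('all_submitted.txt') as f:
    for line in f:
        record = json.loads(line)
        print(record['fileName'], record['annotations'])
        for obj in record['objs']:
            # obj['data'] holds the coordinates for each annotated feature.
            print(' ', obj['name'], obj['type'], len(obj['data']), 'shape(s)')
```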
- A condensed image combines the annotations of everyone who annotated the same image into one line of JSON. For example, if three people independently annotated power plants on top of an image, the condensed image would have all three polygons in the same line of JSON.
- A visual representation of each condensed image, where each annotated feature is drawn, is stored in `allSubmittedCondensedImages`
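A minimal sketch of drawing one of these visual representations with Pillow (already a dependency). It assumes each entry in an object's `data` list wraps one flat `[x1, y1, x2, y2, ...]` coordinate list and that images live under `toWeb/public/images/`; both are inferences from the example JSON above rather than documented guarantees:

```python
import json
from PIL import Image, ImageDraw

# Draw every annotated feature from one line of condensed JSON.
with open('condensed_all_submitted.txt') as f:
    record = json.loads(f.readline())

img = Image.open('toWeb/public/images/' + record['fileName']).convert('RGB')
draw = ImageDraw.Draw(img)

for obj in record['objs']:
    for shape in obj['data']:
        # shape[0] is assumed to be a flat coordinate list, which
        # Pillow accepts directly as [x1, y1, x2, y2, ...].
        draw.polygon(shape[0], outline='red')

img.save('condensed_preview.jpg')
```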
5) Accepting/Rejecting HITs:
- After running `sudo python ASCRIPT_finish.py`:
- Run the `sudo python ASCRIPT_hit_checker.py` script with the proper variables and accept or reject annotations. The hit checker (a sketch of the underlying approve/reject calls follows this list):
  - Reads through each line of `all_submitted.txt`
  - Generates an image of the annotations given the JSON data
  - Displays the image in a window and asks you to accept or reject the annotation
  - Once you close out the hit_checker GUI, accepts all assignments containing an image that you accepted and rejects those that contained no image you accepted. Note: the only way to reject an assignment and not provide that Turker compensation is to reject every image that they annotated. If you offered them 10 images to annotate and you rejected 9 of them, you will still pay them the full compensation for the 1 image you accepted.
  - Writes the line of JSON for each accepted annotation into `accepted.txt`
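For context, a minimal sketch of the MTurk API calls that this acceptance and rejection plausibly reduces to with boto3; this is an assumption about the mechanism, not a copy of the script's code:

```python
import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

def review_assignment(assignment_id, any_image_accepted):
    """Approve the assignment if any of its images was accepted; otherwise reject it."""
    if any_image_accepted:
        mturk.approve_assignment(AssignmentId=assignment_id)
    else:
        mturk.reject_assignment(
            AssignmentId=assignment_id,
            RequesterFeedback='No annotated image in this assignment was accepted.',
        )

# e.g. using the assignmentID field from a line of all_submitted.txt
review_assignment('33SA9F9TRY0RTFNCJH46WGAOSCCWES', any_image_accepted=True)
```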
- Rerun `sudo python ASCRIPT_finish.py`. The program will:
  - Download new unreviewed data for the hit checker to process
  - Create condensed images using only accepted data
    - The JSON is stored in `condensed_accepted.txt`
    - The visual representations are stored in `acceptedCondensedImages`
  - Begin populating the data folder with confidence maps for each image and annotated object, using only accepted data. Inside data, a folder will be created for each annotated image, named the same as the image without the file extension.
- Inside the image's folder will be a raw and a normalized confidence map for each annotated object
  - The raw map contains, at every pixel, the number of people who thought that pixel was part of the feature
  - The normalized map contains, at every pixel, the relative confidence of that pixel being part of the feature: the highest relative confidence has a value of 255, visualized as white, and the lowest has 0, visualized as black. Note that this is a relative confidence: an image in which at most two people annotated any given pixel as a feature will show 2 as 255, while another image in which up to 20 people annotated a given pixel will show 20 as 255. The normalized map is meant as an easy visualization; for data processing purposes, we recommend using the raw map.
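A minimal sketch of the normalization described above, using NumPy (already a dependency); the array names are illustrative, not taken from the scripts:

```python
import numpy as np

# raw[y, x] = number of annotators who marked this pixel as part of the feature.
raw = np.array([
    [0, 1, 2],
    [1, 2, 2],
    [0, 0, 1],
])

# Scale so the most-agreed-upon pixel maps to 255 (white) and zero stays 0 (black).
normalized = (raw / raw.max() * 255).astype(np.uint8)
print(normalized)
```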