
HYPE-C: Evaluating Image Completion Models Through Standardized Crowdsourcing

This repository is the official implementation of HYPE-C: Evaluating Image Completion Models Through Standardized Crowdsourcing.

Requirements and Setup

Managing and collecting HYPE-C evaluations requires a working local MongoDB installation; see the official MongoDB documentation for installation instructions.

Clone the repository using:

git clone --recursive git@github.com:princeton-vl/HYPE-C.git

A HYPE-C Conda environment can be created via:

conda env create -f requirements.yaml

Finally, HYPE-C requires a Mechanical Turk requester account and uses Amazon S3 for data storage.

Configuration

Amazon

An Amazon AWS access key and secret key are required to manage HITs. New access keys can be created through the AWS console. If using an IAM user with limited permissions, the user must have access to both Amazon Mechanical Turk and Amazon S3.

Place both the access and secret keys in a file named config.json, written as:

{
  "aws_access_key": "XXXXXXXXX",
  "aws_secret_key": "XXXXXXXXX"
}
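
All of the scripts below read these credentials when talking to AWS. As a rough illustration (not the repository's exact code), a Boto3 Mechanical Turk client can be built from config.json as follows; the sandbox endpoint mirrors the scripts' default behavior:

import json
import boto3

SANDBOX_URL = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
PRODUCTION_URL = "https://mturk-requester.us-east-1.amazonaws.com"  # used with --prod

with open("config.json") as f:
    config = json.load(f)

# Sandbox endpoint by default, matching the scripts' behavior without --prod.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    aws_access_key_id=config["aws_access_key"],
    aws_secret_access_key=config["aws_secret_key"],
    endpoint_url=SANDBOX_URL,
)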

Qualification and HIT Properties

Qualification tests are configured using a .json file with the template:

{
  "dataset_properties": {
    "real_dir": "../datasets/qualification_sets/real",
    "fake_dir": "../datasets/qualification_sets/fake",
    "bucket_name": "mybucket",
    "bucket_region": "us-east-1"
  },
  "test_properties": {
    "Name": "Image Label Qualification Test",
    "Description": "Decide whether each image is a real photograph or a computer generated fake.",
    "Keywords": "image,images,label,labeling,classify,classification,test,qualification",
    "QualificationTypeStatus": "Active",
    "TestDurationInSeconds": 1200
  }
}
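
The test_properties keys match the parameters of the Mechanical Turk CreateQualificationType operation. A minimal sketch of how such a test might be registered with Boto3, assuming the mturk client from the sketch above; the two XML file names are hypothetical placeholders for the QuestionForm and AnswerKey documents built from the qualification images:

import json

with open("my_test.json") as f:
    qual_config = json.load(f)
props = qual_config["test_properties"]

# Hypothetical placeholders: QuestionForm and AnswerKey XML built from the
# images in real_dir / fake_dir (construction omitted here).
with open("qualification_test.xml") as f:
    test_xml = f.read()
with open("qualification_answerkey.xml") as f:
    answer_xml = f.read()

response = mturk.create_qualification_type(
    Name=props["Name"],
    Description=props["Description"],
    Keywords=props["Keywords"],
    QualificationTypeStatus=props["QualificationTypeStatus"],
    TestDurationInSeconds=props["TestDurationInSeconds"],
    Test=test_xml,
    AnswerKey=answer_xml,
)
print(response["QualificationType"]["QualificationTypeId"])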

Similarly, evaluations are configured using the template:

{
  "dataset_properties": {
    "real_dir": "../datasets/eval/real",
    "fake_dir": "../datasets/eval/fake",
    "bucket_name": "mybucket",
    "bucket_region": "us-east-1"
  },
  "hit_properties": {
    "Title": "Determine whether the images are real or fake",
    "Description": "Decide whether each image is real or a computer generated fake.",
    "Keywords": "image,images,label,labeling,classify,classification",
    "Reward": 0.05,
    "LifetimeInSeconds": 604800,
    "AssignmentDurationInSeconds": 900,
    "FrameHeight": 20000,
    "MaxAssignments": 1,
    "QualificationId": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "QualificationComparator": ">",
    "QualificationInteger": 64
  },
  "approval_properties": {
    "correct_bonus": 0.02
  }
}
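
The hit_properties keys largely mirror the parameters of the Mechanical Turk CreateHIT operation, with QualificationId, QualificationComparator, and QualificationInteger combining into a single QualificationRequirements entry. A hedged sketch of that mapping (eval_config and question_xml are assumed placeholders; note that Boto3 expects Reward as a string and comparator names such as "GreaterThan" rather than ">"):

hp = eval_config["hit_properties"]  # hit_properties from the template above

# Map the template's shorthand comparators onto the API's comparator names.
COMPARATORS = {">": "GreaterThan", ">=": "GreaterThanOrEqualTo", "=": "EqualTo"}

response = mturk.create_hit(
    Title=hp["Title"],
    Description=hp["Description"],
    Keywords=hp["Keywords"],
    Reward=str(hp["Reward"]),  # the API expects a string, e.g. "0.05"
    LifetimeInSeconds=hp["LifetimeInSeconds"],
    AssignmentDurationInSeconds=hp["AssignmentDurationInSeconds"],
    MaxAssignments=hp["MaxAssignments"],
    Question=question_xml,  # HTMLQuestion XML; see the HIT UI template sketch below
    QualificationRequirements=[{
        "QualificationTypeId": hp["QualificationId"],
        "Comparator": COMPARATORS[hp["QualificationComparator"]],
        "IntegerValues": [hp["QualificationInteger"]],
    }],
)
print(response["HIT"]["HITId"])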
Property          Description
real_dir          Directory containing real, unmodified images for use in a qualification test or evaluation.
fake_dir          Directory containing partially synthetic images for use in a qualification test or evaluation.
bucket_name       Name of the S3 bucket used to store anonymized images.
bucket_region     Region of the S3 bucket.
correct_bonus     Bonus paid to evaluators for each correct answer.
QualificationId   Must be set to the ID of a qualification launched using launch_qualification_test.py.

The remaining test_properties and hit_properties correspond directly to parameters of the Mechanical Turk CreateQualificationType and CreateHIT API operations, respectively; see the AWS Mechanical Turk API documentation for details.
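
The images in real_dir and fake_dir are served to workers from the S3 bucket named by bucket_name. A minimal upload sketch with Boto3, assuming anonymization simply means a random, label-free object key (the repository's actual scheme may differ):

import os
import uuid
import boto3

def upload_anonymized(local_dir, bucket, region):
    """Upload every image in local_dir under a random, label-free key."""
    s3 = boto3.client("s3", region_name=region)
    urls = {}
    for name in sorted(os.listdir(local_dir)):
        key = uuid.uuid4().hex + os.path.splitext(name)[1]
        s3.upload_file(os.path.join(local_dir, name), bucket, key)
        urls[name] = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
    return urls

image_urls = upload_anonymized("../datasets/eval/fake", "mybucket", "us-east-1")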

HIT UI Templates

HIT UI templates define the interface workers see for each AMT HIT. The basic UI used for the HYPE-C baseline evaluations is included in the hit_templates directory; for details on modifying the template, see the Mechanical Turk documentation on HTMLQuestion and HIT design.
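
When an evaluation is launched, the HTML template ultimately reaches Mechanical Turk as an HTMLQuestion XML document, which is where the FrameHeight property applies. A hedged sketch of that wrapping (the repository may perform this step differently):

HTML_QUESTION = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[{html}]]></HTMLContent>
  <FrameHeight>{frame_height}</FrameHeight>
</HTMLQuestion>"""

with open("hit_templates/hypec_label.html") as f:
    question_xml = HTML_QUESTION.format(html=f.read(), frame_height=20000)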

Using HYPE-C

launch_qualification_test.py
    Example: python launch_qualification_test.py --config=config.json --qual_properties=my_test.json
    Launches a new qualification test and returns the qualification ID. Launches to the sandbox by default; use the --prod flag to launch to production.

launch_evaluation.py
    Example: python launch_evaluation.py --config=config.json --eval_properties=my_eval.json --eval_name="My Evaluation" --html_template=hit_templates/hypec_label.html
    Launches a new evaluation. Launches to the sandbox by default; use the --prod flag to launch to production.

get_eval_results.py
    Example: python get_eval_results.py --config=config.json
    Fetches evaluation results and updates the local database. Should be run before approve_eval_hits.py. Collects results from the sandbox by default; use the --prod flag to collect production results.

approve_eval_hits.py
    Example: python approve_eval_hits.py --config=config.json
    Approves all valid evaluations and pays bonuses to workers who submitted correct answers. Invalid or duplicate evaluations are rejected. Approves results from the sandbox by default; use the --prod flag to approve production results.

auto_qual_workers.py
    Example: python auto_qual_workers.py --config=config.json
    Automatically disqualifies workers who have already submitted an evaluation for a given dataset from performing another evaluation on that dataset, preventing duplicate evaluations. Monitors sandbox workers by default; use the --prod flag to monitor production workers.

compute_scores.py
    Example: python compute_scores.py
    Reports HYPE-C scores for all evaluations. Reports sandbox results by default; use the --prod flag to report production results.

db_setup.py
    Example: python db_setup.py
    Creates indices for the HYPE-C MongoDB database.
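
In a typical run, launch_qualification_test.py comes first and its qualification ID is copied into the evaluation template as QualificationId; launch_evaluation.py follows, with auto_qual_workers.py running while HITs are live, then get_eval_results.py, approve_eval_hits.py, and finally compute_scores.py. For completeness, index creation of the kind db_setup.py performs looks roughly like the following in PyMongo; the database, collection, and field names here are illustrative guesses, not the repository's actual schema:

from pymongo import ASCENDING, MongoClient

client = MongoClient("localhost", 27017)
db = client["hypec"]  # illustrative database name

# Unique index so duplicate assignments are rejected on insert.
db.assignments.create_index([("AssignmentId", ASCENDING)], unique=True)
# Lookup indices for per-worker and per-HIT queries.
db.assignments.create_index([("WorkerId", ASCENDING)])
db.assignments.create_index([("HITId", ASCENDING)])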