Efficient Annotation of Scalar Labels (EASL)

Last update: June 5, 2018

preprint

Pre-requisites

Python 3.x
numpy (pip install numpy)

Procedure

Create your project by sh create_project.sh YOUR_PROJECT_NAME

Let's assume our project name is political.

(e.g., sh create_project.sh political)
Prepare data (csv file) in ./experiments/political/XYZ.csv

Your data should be formatted in a csv file, consisting of (at least) the columns of id, sent.
```
      e.g., 
      id,sent
      1,obama is a legend in his own mind
      2,conservatives are racists
      3,cruz is correct
      4,romney is president
      5,obama thinks there are 57 states
      ...
```
Let's assume the file name is political.csv

Note: You can add additional columns. For example, if you want to annotate on a pair of sentences such as premise and hypothesis, the columns will look like id, premise, hypothesis)

Run python initialize.py ./experiments/political/political.csv to set initial parameters (alpha, beta, mode, var)

The result csv file (political_0.csv) should be as follows.
```
  e.g., 
  id,sent,alpha,beta,mode,var
  1,obama is a legend in his own mind,1,1,0.5,0.0833
  2,conservatives are racists,1,1,0.5,0.0833
  3,cruz is correct,1,1,0.5,0.0833
  4,romney is president,1,1,0.5,0.0833
  5,obama thinks there are 57 states,1,1,0.5,0.0833
  ...
```
It has additional columns: alpha, beta, mode, var.

In this example, we place the file at ./experiments/political/political_0.csv.

Prepare your HIT template in ./templates/political/

See an example at ./templates/political/template_political.html.

Now, we are ready to start annotation with EASL!
Generate HITs

We generate our HITs by running the following command.
```
 python main.py --operation generate --model ./experiments/political/political_0.csv --hits 25
```
This generates political_hit_1.csv that has 25 HITs.

The number of HITs (per iteration) should depend on your data size. (See python main.py --help for more details.)
Publish the HITs (with the template file.)
Collect the result and name it ./experiments/political/political_result_1.csv.
Update the model
```
 python main.py --operation update --model ./experiments/political/political_0.csv
```
This takes political_0.csv, political_result_1.csv, and then generate political_1.csv.
Go back to the step 3 (Generate HITs) and iterate the procedure.

(e.g., If this is the first iteration with political_0.csv, use political_1.csv to generate HITs in the next iteration.)

ahoho/EASL

Efficient Annotation of Scalar Labels (EASL)

Pre-requisites

Procedure