HazyResearch/metal

Checking on Basic Test Case

CatalinVoss opened this issue · 3 comments

Now that I'm finally looking at using your gen. model again, I'm just running some basic sanity checks and am confused about the output:

import numpy as np
from metal.label_model import LabelModel

model = LabelModel(2)

# Label matrix: [documents x LFs]
label_matrix = np.array([[0, 0, 0],  # This doc should be labeled as 0? --> returns 2
                         [1, 1, 1],  # ... as 1?
                         [1, 0, 1]]) # ... as something in between?

model.train_model(label_matrix, cardinality=1)
model.predict(label_matrix)

This returns [1,1,1]. When I expand the label matrix to something larger, I get back int64s in the range [1, 2] that don't make any sense to me:

label_matrix = np.array([[0, 0, 0],  # This doc should be labeled as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 1
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 1
                         [0, 0, 0],  # ... as 0? --> returns 1
                         [0, 0, 0],  # ... as 0?
                         [0, 0, 0],  # ... as 0?
                         [0, 0, 0],  # ... as 0?
                         [0, 0, 0],  # ... as 0?
                         [1, 1, 1],  # ... as 1?
                         [1, 0, 1]]) # ... as something in between?

I was expecting probabilistic labels between 0 and 1.

I haven't dug any deeper yet. This is with your most recent version on pip (from sometime in December).

See the Basics tutorial (https://github.com/HazyResearch/metal/blob/master/tutorials/Basics.ipynb) or the MeTaL Commandments (https://github.com/HazyResearch/metal/blob/master/docs/metal_design.md#classifiers) for a description of the basic predicting methods of all MeTaL classifiers:

predict_proba() yields an [n,k] matrix of soft labels (floats)
predict() yields an [n,] vector of hard labels (ints)
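For concreteness, here is a minimal sketch of the two methods on a toy label matrix, assuming the pip-installed snorkel-metal package and its convention that 0 means abstain and classes run from 1 to k:

import numpy as np
from metal.label_model import LabelModel

# Toy label matrix: rows are documents, columns are labeling functions.
# Entries follow MeTaL's convention: 0 = abstain, real classes are 1..k.
L = np.array([[1, 1, 1],
              [2, 2, 2],
              [2, 0, 2]])

label_model = LabelModel(k=2)
label_model.train_model(L)

Y_proba = label_model.predict_proba(L)  # [n, k] matrix of floats (soft labels)
Y_pred = label_model.predict(L)         # [n,] vector of ints in {1, ..., k} (hard labels)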

So why does 2 get introduced as a label here? Is that just a rounding error?

In MeTaL, all labels are categorical (so a binary problem has class 1 and 2, not -1 and 1), and the label 0 is always reserved to mean an abstain vote or an unknown label, never a normal class.
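So if your labeling functions currently emit 0/1 where both values are real classes (a sketch, assuming none of those votes were meant as abstains), you would shift the votes up by one so that 0 is free to mean abstain, and shift the hard predictions back down afterward:

import numpy as np

# Original LF outputs in {0, 1}, where both values are real classes.
label_matrix = np.array([[0, 0, 0],
                         [1, 1, 1],
                         [1, 0, 1]])

# Shift into MeTaL's categorical convention: classes become {1, 2},
# leaving 0 available to mean "abstain / no vote".
L = label_matrix + 1

# ... train the LabelModel on L as usual ...

# Hard predictions come back in {1, 2}; map them back to {0, 1} if needed:
# Y_pred = label_model.predict(L) - 1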