Checking on Basic Test Case
CatalinVoss opened this issue · 3 comments
Now that I'm finally looking at using your generative model again, I'm running some basic sanity checks and am confused about the output:
import numpy as np
from metal.label_model import LabelModel

model = LabelModel(2)
# Label matrix: [documents x LFs]
label_matrix = np.array([[0, 0, 0],  # This doc should be labeled as 0? --> returns 2
                         [1, 1, 1],  # ... as 1?
                         [1, 0, 1]]) # ... as something in between?
model.train_model(label_matrix, cardinality=1)
model.predict(label_matrix)
This returns [1, 1, 1]. When I expand the label matrix to something larger, I get back int64s in the range [1, 2] that don't make sense to me:
label_matrix = np.array([[0, 0, 0],  # This doc should be labeled as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 1
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 2
                         [0, 0, 0],  # ... as 0? --> returns 1
                         [0, 0, 0],  # ... as 0? --> returns 1
                         [0, 0, 0],  # ... as 0?
                         [0, 0, 0],  # ... as 0?
                         [0, 0, 0],  # ... as 0?
                         [0, 0, 0],  # ... as 0?
                         [1, 1, 1],  # ... as 1?
                         [1, 0, 1]]) # ... as something in between?
I was expecting probabilistic labels between 0 and 1.
I haven't dug any deeper. This is using the most recent version on pip (from sometime in December).
See the Basics tutorial (https://github.com/HazyResearch/metal/blob/master/tutorials/Basics.ipynb) or the MeTaL Commandments (https://github.com/HazyResearch/metal/blob/master/docs/metal_design.md#classifiers) for a description of the basic prediction methods of all MeTaL classifiers:
predict_proba() yields an [n,k] matrix of soft labels (floats)
predict() yields an [n,] vector of hard labels (ints)
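For example, here is a minimal sketch of what those two calls return (assuming the metal package layout in the repo above; training keyword arguments are omitted and may differ between versions):

import numpy as np
from metal.label_model import LabelModel

L = np.array([[1, 1, 1],   # labels here are the classes 1 and 2;
              [2, 2, 2],   # 0 is reserved for abstains (see the note below)
              [2, 0, 2]])

model = LabelModel(k=2)
model.train_model(L)

model.predict_proba(L)  # [n, k] array of floats; each row sums to 1
model.predict(L)        # [n,] array of ints drawn from {1, ..., k}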
So why does 2 get introduced as a label here? Is that just a rounding error?
In MeTaL, all labels are categorical (so a binary problem has classes 1 and 2, not -1 and 1), and the label 0 is always reserved to mean an abstain vote or an unknown label, never a normal class.
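Concretely, a hypothetical fix for the snippet above is to shift the matrix into that convention before training, e.g.:

# Hypothetical remap: the 0/1 votes above become classes 1/2,
# leaving 0 free to mean "abstain".
L_metal = label_matrix + 1

model = LabelModel(2)
model.train_model(L_metal)
model.predict_proba(L_metal)  # soft labels: floats in [0, 1], one column per class
model.predict(L_metal)        # hard labels: ints in {1, 2}

The + 1 shift only works here because the original matrix never uses 0 to mean abstain; otherwise the abstains would need to stay at 0.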