util.py check_weights does not support all keyword averaging schemes from main.ipynb
cnsetzer opened this issue · 5 comments
In main.ipynb, during the investigation of the mock classifier systematics, the weight schemes

schemes = ['flat', 'up', 'down', 'per_class', 'per_item']

include some that are not supported by the util.py implementation of check_weights.
if type(avg_info) != str:
    # An explicit weight vector was passed: normalize it to sum to 1
    avg_info = np.asarray(avg_info)
    weights = avg_info / np.sum(avg_info)
    assert np.isclose(np.sum(weights), 1.)
elif avg_info == 'per_class':
    # Uniform weight on each of the M classes
    weights = np.ones(M) / float(M)
elif avg_info == 'per_item':
    # Weight each class by its prevalence among the truth labels
    classes, counts = np.unique(truth, return_counts=True)
    weights = np.zeros(M)
    weights[classes] = counts / float(len(truth))
assert len(weights) == M
return weights
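As a quick sanity check of the two supported keyword branches, here is what they produce on a toy example (the integer truth labels are an assumption; the snippet indexes weights by class label, so labels must run from 0 to M-1):

```python
import numpy as np

M = 3
truth = np.array([0, 0, 0, 1, 2, 2])  # toy integer class labels

# 'per_class': uniform weight over classes
per_class = np.ones(M) / float(M)

# 'per_item': classes weighted by their prevalence in truth,
# equivalent to weighting every item equally
classes, counts = np.unique(truth, return_counts=True)
per_item = np.zeros(M)
per_item[classes] = counts / float(len(truth))
```

Both vectors sum to 1, so they can be consumed identically by the averaging step downstream.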
Do we need to add cases for the other weighting schemes?
I thought of adding them, but because they're sort of contrived tests, I made them vectors of weights so they're covered by the first case. On the other hand, the proclam code isn't really for an audience beyond our contrived tests, since it's not optimized for the actual challenge, so it might make sense to add them in if they'll be used in the science-specific metrics paper after the challenge ends. Do you want to add support for them (and possibly give them more descriptive names)? Either way, I'm going to rename the issue.
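The workaround described above, expressing 'flat', 'up', and 'down' as explicit weight vectors so they fall through to the array branch of check_weights, might look like this (the helper name, the boosted-class index, and the boost factor are all hypothetical, not part of proclam):

```python
import numpy as np

def scheme_to_weights(scheme, M, boosted=0, factor=2.0):
    # Hypothetical helper: turn a named scheme into an explicit
    # weight vector handled by check_weights' array branch.
    w = np.ones(M)
    if scheme == 'up':
        w[boosted] *= factor   # up-weight one class
    elif scheme == 'down':
        w[boosted] /= factor   # down-weight one class
    # 'flat' keeps uniform weights
    return w / w.sum()
```

The resulting vector always normalizes to 1, so it satisfies the assertion in the first branch of check_weights.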
Ah, there was also a related bug in main.ipynb itself! I'm not sure if that was what you meant, but I fixed that as well.
Ahh ok, no that was not what I was referring to. I am going to look at this now.
I see the difficulty due to the contrived tests... However, shouldn't we then change how this is being used in the notebook itself?
I have added a basic implementation for the flat, up, and down schemes. However, in the absence of information about the "chosen" systematic, the class to up-weight or down-weight is selected randomly from the possible choices. This is my commit changing the util.py file: 51b65d6
I'm not sure what kinds of meaningful tests we can do with weights in the absence of any physics. The weights affect all metrics the same way because they're just constant factors on the terms contributing to them, i.e. other linear combinations of the same quantities, unless we include them in a more sophisticated way that I wanted to investigate but didn't get to in time for Kaggle.
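To make the "constant factors" point concrete: a weighted average metric is just the dot product of the per-class metric values with the weight vector, so changing the weights rescales each class's contribution linearly without changing the structure of the metric (the per-class values below are toy numbers, not results from the notebook):

```python
import numpy as np

per_class_metric = np.array([0.2, 0.5, 0.3])  # toy per-class metric values
w = np.array([0.5, 0.25, 0.25])               # some normalized weight vector

# Weighted metric = linear combination of per-class terms
weighted = np.dot(w, per_class_metric)
```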
I enabled a keyword argument for the "chosen" class in the function you modified so it only defaults to random if none is provided. I'm going to close the issue because I think it's resolved by these changes.
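The default-to-random behavior described here could be sketched as follows (the function name, signature, and boost factor are assumptions for illustration, not the actual proclam code):

```python
import numpy as np

def upweight(M, chosen=None, factor=2.0, rng=None):
    # Sketch: if no "chosen" class is given, pick one at random;
    # otherwise up-weight the requested class.
    rng = np.random.default_rng() if rng is None else rng
    if chosen is None:
        chosen = rng.integers(M)
    w = np.ones(M)
    w[chosen] *= factor
    return w / w.sum()
```

Passing an explicit rng (or seed) keeps the random fallback reproducible in tests.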