aimalz/proclam

RH Additional comments on papers/figures

reneehlozek opened this issue · 3 comments

  • Make the confusion matrix (CM) plots have more dynamic range
  • label the axes, since otherwise it isn't clear what is being shown
  • make sure to comment on the figures from snmachine, and discuss the fact that the templates didn't actually do too well on the final challenge
  • check that the metric also satisfies triangle inequality
  • ordering 'wrongness' of classifications?
  1. I'm open to another colormap but want to be careful not to suggest that the values of the cells are not scalars between 0 and 1 by having a gradient involving more than two colors, a pitfall of all my go-to colormaps.
  2. Rick said there were five classes, but Michelle's data has only three. I didn't want to assume that they were Ia/II/Ibc solely on the basis that I see those in that order most frequently. And are you asking for this for the synthetic classification results as well? I skipped it because the class labels are arbitrary and ordered, but I can do it if you think it helps.
  3. I have all the same questions as the reviewers when it comes to the SNPhotCC content. Help with that would be appreciated! Also, why do you say that the templates don't do well? Table 2 shows that some of the most successful classifiers are the template ones. If the assessment is based on the values Michelle provided that led to a backwards trend for the log-loss, I was unable to reproduce those numbers using the proclam code so recomputed them using the same pipeline as for the synthetic classification probabilities. (There was a sign error propagated from an earlier paper draft, but, based on the notebook she shared, I think something might have been wrong with the indicator variable.)
  4. Do you mean with respect to the weights or something else?
  5. I think Table 2 addresses this, though probably not in the best way. Are you suggesting doing the same for the synthetic classification results? Also, I attempted to do this visually in the DESC seminar (see slide below) and could show something like that (but a little less hastily thrown together), if you think it's more informative. (I usually find tables in papers to be uninformative, but I personally got even less out of the plot with classifiers on the x-axis so figured a table might be appropriate.)
    [slide: plasticc_pubboard_talk]

Hi @aimalz

  1. I'm open to another colormap but want to be careful not to suggest that the values of the cells are not scalars between 0 and 1 by having a gradient involving more than two colors, a pitfall of all my go-to colormaps.
    Yep, I'm making the colorbar scale logarithmic for the same colormap.
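Roughly what I have in mind, as a minimal sketch (the matrix, the 'viridis' colormap, and the 1e-8 floor here are placeholders, not the paper's actual figures or choices):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# placeholder confusion matrix whose entries span several decades
cm = np.eye(5) + 1.e-8
cm /= cm.sum(axis=1, keepdims=True)  # rows sum to 1

fig, ax = plt.subplots()
# same colormap, but a logarithmic color scale; vmin is the numerical floor
im = ax.imshow(cm, norm=LogNorm(vmin=1.e-8, vmax=1.), cmap='viridis')
fig.colorbar(im, label='probability')
plt.show()

The point is just that LogNorm stretches the tiny off-diagonal entries that get washed out on a linear scale.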

Actually, this question came up while looking at the classifications. The 'noisy' matrix doesn't look noisy at all, just slightly less than perfect. Why isn't it like this:

[screenshot: noisy confusion matrix generated with the snippet below]

which I generated with:

cm = np.eye(M_classes) + 0.1*np.random.rand(M_classes,M_classes) * np.ones((M_classes, M_classes))

  2. Rick said there were five classes, but Michelle's data has only three. I didn't want to assume that they were Ia/II/Ibc solely on the basis that I see those in that order most frequently. And are you asking for this for the synthetic classification results as well? I skipped it because the class labels are arbitrary and ordered, but I can do it if you think it helps.

Labels will help - even purely numerical. I'll implement this.
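Something along these lines, as a sketch (the matrix, the axis names, and the row/column convention are placeholders, not necessarily what the paper uses):

import numpy as np
import matplotlib.pyplot as plt

M_classes = 3  # placeholder number of classes
cm = np.eye(M_classes) + 0.1 * np.random.rand(M_classes, M_classes)

fig, ax = plt.subplots()
im = ax.imshow(cm, vmin=0., vmax=1.)
ax.set_xticks(range(M_classes))
ax.set_yticks(range(M_classes))
ax.set_xticklabels([str(m) for m in range(M_classes)])  # purely numerical labels
ax.set_yticklabels([str(m) for m in range(M_classes)])
ax.set_xlabel('predicted class')  # guessing at the convention here
ax.set_ylabel('true class')
fig.colorbar(im, label='probability')
plt.show()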

  3. I have all the same questions as the reviewers when it comes to the SNPhotCC content. Help with that would be appreciated! Also, why do you say that the templates don't do well?

Michelle's paper didn't actually take part in the challenge - the results she gives are for the representative subset of the SNPhotCC data post-challenge. Templates did do well (I mistyped); it's the wavelets that didn't handle non-representativity. But all of this needs to be included.

  4. Do you mean with respect to the weights or something else?
    Well, this came up when talking to a colleague. If the log-loss is described as a distance, we need to comment that it really is one - which would be required to ensure that the metric is meaningful for comparing classes. I think this is true, but language on this will help.

  5. I think Table 2 addresses this, though probably not in the best way. Are you suggesting doing the same for the synthetic classification results? Also, I attempted to do this visually in the DESC seminar (see slide below) and could show something like that (but a little less hastily thrown together), if you think it's more informative. (I usually find tables in papers to be uninformative, but I personally got even less out of the plot with classifiers on the x-axis so figured a table might be appropriate.)

The papers and figure captions need more description!

  1. So we had random numbers in them at some point back at the hack stage and determined that it creates a bias. The PMFs drawn from a conditional probability matrix with random components share covariances where the random numbers happened to be higher, so the result isn't just noisy but also subsumed, thereby not effectively isolating the systematics. (And is yours normalized properly?)

However, in hacks.ipynb, I did try plotting a handful of PMFs drawn from each conditional probability matrix, and that does look more like your plot. I'd like to add some version of that to the paper to hopefully address your point. My instinct was to add in one PMF drawn from each true class, but I don't want it to look too much like the conditional probability matrices. Do you have any ideas for a better way to show those?
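For concreteness, the kind of plot I mean is roughly the sketch below; the matrix, the number of draws, and the Dirichlet perturbation are all stand-ins here, not what hacks.ipynb or the proclam simulator actually does:

import numpy as np
import matplotlib.pyplot as plt

# placeholder conditional probability matrix: one row per true class, rows sum to 1
cpm = np.eye(3) + 0.1
cpm /= cpm.sum(axis=1, keepdims=True)

n_draws = 5  # a handful of PMFs per true class
fig, axes = plt.subplots(1, len(cpm), sharey=True)
for m, ax in enumerate(axes):
    # Dirichlet noise around the row is only a stand-in for however the PMFs are really drawn
    pmfs = np.random.dirichlet(100. * cpm[m], size=n_draws)
    for pmf in pmfs:
        ax.plot(pmf, alpha=0.5)
    ax.set_title('true class %d' % m)
    ax.set_xlabel('class')
axes[0].set_ylabel('probability')
plt.show()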

Also, I didn't think the log scaling would make much of a difference in dynamic range, but it might help with the SNPhotCC matrices. However, for our toy matrices, the scale would have to run from 10^-8 to 10^0, highlighting our awkward but unavoidable choice of a small non-zero value standing in for zero for the sake of numerical stability.

  2. I don't object to adding labels if they're helpful for anyone else's understanding; I only omitted them because I found them to be uninformative clutter, particularly since they're repeated on all panels (repetitive axis/tick labels are a pet peeve of mine) and have no physical meaning. I don't recall the class labels coming up in the review feedback, so I erroneously assumed everyone else agreed. Sorry!

  3. I think the core of my misunderstanding is about the relevance of which classification methods do well or not. Since the paper is about metrics, not classifiers, it should only matter whether the metrics agree or disagree (if these classifiers had entered SNPhotCC, would all metrics identify the same winner?) and how that relates to the mixes of the isolated-systematics conditional probability matrices. I did comment out the text about non-representativity of the training set because my understanding was that the probability vectors from those methods were included as representatives of real classifiers like those that might enter PLAsTiCC, in contrast with the isolated systematics taken one at a time, and not to highlight representativity of the training set, which, like the identity of the classification methods, seems unrelated to the point about metrics. Perhaps a comparison with non-representative training data, which should induce some different systematics, would be interpretable in the context of metrics. Is there even time for that?

  4. In what way is the log-loss a distance? When we introduced it, Rahul and I tried to emphasize that it's a measure of information, whereas the Brier score is more like a distance in the space of probabilities and thus somewhat less interpretable even before introducing weights. (I did mean to add a mention of how weights affect the interpretation of both, though, and will address that today.) In any case, the Brier score looks like it satisfies the triangle inequality, but I wouldn't know how to interpret that in the context of probabilities. It sounds like you understand how to interpret it, so maybe if you can walk me through that, I can flesh it out in the text. (I've put the definitions I have in mind at the end of this comment for reference.)

  5. Absolutely, I'll make that a priority today!
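For reference on the distance question, here are the shorthand, unweighted per-object forms I have in mind, with tau_{nm} the indicator that object n truly belongs to class m and p_{nm} the classifier's posterior probability (this is my paraphrase, ignoring the per-class weights and the averaging over objects, not verbatim from the paper):

\mathrm{LogLoss}_{n} = -\sum_{m=1}^{M} \tau_{nm} \ln p_{nm} ,
\qquad
\mathrm{Brier}_{n} = \sum_{m=1}^{M} \left( p_{nm} - \tau_{nm} \right)^{2} = \lVert \vec{p}_{n} - \vec{\tau}_{n} \rVert_{2}^{2} .

In this form the per-object Brier score is the squared Euclidean distance between the probability vector and the truth vector, and its square root is literally a Euclidean distance, which is where the triangle inequality comes in. The log-loss is the cross-entropy between the truth indicator and the prediction, which isn't symmetric, consistent with reading it as an information measure rather than a distance.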