Jingkang50/OpenOOD

Some questions about "Full-Spectrum Out-of-Distribution Detection"

Closed this issue · 2 comments

Hello, I noticed your recently published paper, "Full-Spectrum Out-of-Distribution Detection", and it has greatly inspired me.
However, I couldn't find pretrained weights for the COVID benchmark. I also observed that the FS-OOD detection introduced in OpenOOD v1.5 does not use the benchmarks proposed in that paper; instead, it is evaluated on ImageNet-200 and ImageNet-1K. I'm curious about the reasoning behind this choice.

Dear Esther,

Thank you for your great question!
FS-OOD actually dates back to mid-2021 (although it was only accepted recently). At that time, OOD detection benchmarking was neither unified nor comprehensive, so we spent time trying to build diverse, high-quality benchmarks. The COVID benchmark also took a lot of effort to create. You can get the data here. The good aspect of the dataset is that it clarifies what OOD detection means in the medical domain and makes the definition of OOD detection conceptually clear. The bad aspect is that the task (binary classification) seems easy, and the black-and-white images might be so simple (even simpler than MNIST) that modern networks can easily overfit.

The OpenOOD series is a more recent, ongoing effort to provide unified and meaningful OOD detection benchmarking. As you might have noticed, v1 still included MNIST benchmarks, but we have since dropped them and moved to larger, more 'mainstream' datasets like ImageNet. We want to send the message to the community that we should tackle difficult, more generalizable tasks (since good results on small datasets can be obtained with many non-generalizable tricks), so we decided to keep only the selected datasets and run extensive experiments on them.

In sum, although we spent a lot of time building the benchmarks in FS-OOD, we settled on the current benchmark selection to make the OOD benchmarks as useful as possible, while keeping the full-spectrum concept.

Thank you