/papers

Summaries of papers on machine learning, computer vision, etc.

About

I categorize, annotate and write comments for all research papers I read as a PhD student.

Index

  • All Papers
  • Uncertainty Estimation
  • Out-of-Distribution Detection
  • Theoretical Properties of Deep Learning
  • VAEs

All Papers:

Papers Read in 2022:

[22-03-03] [paper197]
Quite interesting and well-written paper. I did however find it difficult to properly understand everything; it feels like a lot of details are omitted (I wouldn't really know how to actually implement this in practice). It's difficult for me to judge how impressive the results are, how practically useful this approach might actually be, or what its limitations are. Overall though, it does indeed seem quite interesting.
[22-03-02] [paper196]
Somewhat interesting paper. They use a softmax model with MC-dropout to compute uncertainty estimates. The evaluation is not very extensive; they mostly just check that the classification accuracy improves as they reject more and more samples based on an uncertainty threshold.
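As a note-to-self, a minimal sketch of this kind of MC-dropout + rejection setup (the function name and the number of forward passes are my own assumptions, not from the paper):

```python
import torch

def mc_dropout_predict(model, x, T=20):
    # Average T stochastic forward passes with dropout kept active at test time;
    # use the predictive entropy of the averaged softmax as the uncertainty.
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()  # keep only the dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(T)]).mean(0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return probs, entropy  # reject the samples with the highest entropy first
```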
[22-02-26] [paper195]
Quite interesting and well-written paper. It seemed quite niche at first, but I think their analysis could potentially be useful.
[22-02-26] [paper194]
Quite interesting and well-written paper. Two simple modifications of the "maximum softmax score" baseline, and the performance is consistently improved. The input perturbation method is quite interesting. Intuitively, it's not entirely clear to me why it actually works.
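If I understand the two modifications correctly (temperature scaling plus a small gradient-based input perturbation, in the style of ODIN), the score computation would look roughly like this; the values of T and eps are assumptions on my part:

```python
import torch
import torch.nn.functional as F

def perturbed_max_softmax_score(model, x, T=1000.0, eps=0.0014):
    # Modification 1: temperature-scale the logits.
    # Modification 2: nudge the input in the direction that increases the
    # max softmax score, then compute the score on the perturbed input.
    x = x.clone().requires_grad_(True)
    loss = -F.log_softmax(model(x) / T, dim=1).max(dim=1).values.sum()
    loss.backward()
    x_pert = x - eps * x.grad.sign()  # gradient step that increases the score
    with torch.no_grad():
        score = F.softmax(model(x_pert) / T, dim=1).max(dim=1).values
    return score  # in-distribution inputs should receive higher scores than OOD
```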
[22-02-25] [paper193]
Interesting and well-written paper. Interesting that Mahalanobis works very well on CIFAR10 vs SVHN but not on the medical imaging dataset. I don't quite get how/why the ODIN method works; I'll probably have to read that paper.
[22-02-25] [paper192]
Quite interesting and well-written paper. The definition of "prediction depth" in Section 2.1 makes sense, and it definitely seems reasonable that this could correlate with example difficulty / prediction confidence in some way. Sections 3 and 4, and all the figures, seem to contain a lot of information; I'd probably need to read the paper again to properly understand/appreciate everything.
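My rough reading of the Section 2.1 definition, as a sketch (the probe type, the value of k, and the exact agreement rule are my assumptions about the details):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def prediction_depth(layer_feats, train_layer_feats, train_labels, final_pred, k=30):
    # Earliest layer from which k-NN probes on this example's intermediate
    # features all already agree with the network's final prediction.
    depth = len(layer_feats)
    for l in reversed(range(len(layer_feats))):
        probe = KNeighborsClassifier(n_neighbors=k)
        probe.fit(train_layer_feats[l], train_labels)
        if probe.predict(layer_feats[l][None])[0] == final_pred:
            depth = l  # probe agrees here; keep scanning towards earlier layers
        else:
            break
    return depth  # larger depth = "harder" example
```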
[22-02-24] [paper191]
Interesting and well-written paper. I wasn't very familiar with CT image reconstruction, but they do a good job explaining everything. Interesting that MC-dropout seems important for getting well-calibrated predictions.
[22-02-21] [paper190]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?
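For reference, the FPR95 metric itself is simple to compute; a sketch (assuming higher score = more in-distribution):

```python
import numpy as np

def fpr_at_95_tpr(scores_in, scores_out):
    # Pick the threshold at which 95% of in-distribution samples are kept,
    # then measure how many OOD samples (wrongly) pass that threshold.
    thresh = np.percentile(scores_in, 5)
    return float((scores_out >= thresh).mean())
```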
[22-02-21] [paper189]
Somewhat interesting paper. I didn't quite understand everything, so it could be more interesting than I think. The fact that their pseudo-input generation process "relies on the availability of a differentiable density estimate of the data" seems like a big limitation? For regression, they only applied their method to very low-dimensional input data (1D toy regression and UCI benchmarks), but would this work for image-based tasks?
[22-02-19] [paper188]
  • Contrastive Training for Improved Out-of-Distribution Detection [pdf] [annotated pdf]
  • Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger
  • 2020-07-10
  • [Out-of-Distribution Detection]
Quite interesting and very well-written paper. They take the method from the Mahalanobis paper ("A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks"), although they fit Gaussians only to the features at the second-to-last network layer and don't use the input pre-processing, and they consistently improve OOD detection performance by incorporating contrastive training. Specifically, they first train the network using just the SimCLR loss for a large number of epochs, and then also add the standard classification loss. I didn't quite get why the label smoothing is necessary, but according to Table 2 it's responsible for a large portion of the performance gain.
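For my own reference, the SimCLR loss used in the first training stage is the NT-Xent contrastive loss over paired augmentations; a minimal sketch (tau is an assumed hyperparameter):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.1):
    # z1, z2: (B, d) projections of two augmented views of the same B images.
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau  # (2B, 2B) scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * B, dtype=torch.bool), float('-inf'))
    # The positive for row i is its augmented partner at i + B (and vice versa).
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)
```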
[22-02-19] [paper187]
Well-written and interesting paper. The proposed method is simple and really neat: fit class-conditional Gaussians in the feature space of a pre-trained classifier (basically just LDA on the feature vectors), and then use the Mahalanobis distance to these Gaussians as the confidence score for input x. They then also do this for the features at multiple levels of the network and combine these confidence scores into one. I don't quite get why the "input pre-processing" in Section 2.2 (adding noise to test samples) works; in Table 1 it significantly improves the performance.
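The core of the method for a single feature level is small enough to sketch out (my own notation; the shared/tied covariance is what makes it "basically just LDA"):

```python
import numpy as np

def fit_class_gaussians(feats, labels):
    # Class-conditional Gaussians with a single shared covariance matrix,
    # fitted to feature vectors extracted from a pre-trained classifier.
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([feats[labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(centered.T @ centered / len(feats))
    return means, precision

def confidence_score(f, means, precision):
    # Negative Mahalanobis distance to the closest class-conditional Gaussian.
    dists = [(f - mu) @ precision @ (f - mu) for mu in means.values()]
    return -min(dists)
```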
[22-02-19] [paper186]
Quite interesting and well-written paper. Only experiments on a toy 1D regression problem, and flight delay prediction in which the input is 8D. The approach of just adding noise to the input x to get OOD samples would probably not work very well e.g. for image-based problems?
[22-02-18] [paper185]
  • Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions [pdf] [annotated pdf]
  • Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens
  • 2021-04-08, Medical Image Analysis (January 2022)
  • [Out-of-Distribution Detection] [Medical ML]
Well-written and interesting paper. Quite long, so it took a bit longer than usual to read. Sections 1 and 2 give a great overview of OOD detection in general, and of how it can be used specifically in this dermatology setting. I can definitely recommend reading Section 2 (Related work). They assume access to some outlier data during training, so their approach is similar to the "Outlier exposure" method (they argue that this is a fair assumption specifically in this dermatology setting). Their method improves on the "reject bucket" (adding an extra class which is assigned to all outlier training data points); in their proposed method, they also use fine-grained classification of the outlier skin conditions. They then also use an ensemble of 5 models, as well as a more diverse ensemble (combining models trained with different representation learning techniques). This diverse ensemble obtains the best performance.
[22-02-16] [paper184]
Interesting and well-written paper. The proposed method makes intuitive sense, trying to incorporate the "OOD training" method (i.e., to use some kind of OOD data during training, similar to e.g. the "Deep Anomaly Detection with Outlier Exposure" paper) into the Bayesian deep learning approach. The experimental results do seem quite promising.
[22-02-15] [paper183]
Well-written and interesting paper. Short paper of just 3 pages, but with an extensive appendix which I definitely recommend going through. The method, training an ensemble and then applying the Laplace approximation to each network, is very simple and intuitively makes a lot of sense. I didn't realize that this would have basically the same test-time speed as ensembling (since they utilize the probit approximation); that's very neat. It also seems to consistently outperform ensembling a bit across almost all tasks and metrics.
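The probit trick is, as far as I understand it, the standard approximation of the expected softmax under a Gaussian over the logits; a sketch:

```python
import torch

def probit_softmax(mean_logits, var_logits):
    # softmax(mu / sqrt(1 + pi/8 * sigma^2)) approximates E[softmax(z)] for
    # z ~ N(mu, diag(sigma^2)), so the predictive distribution costs no more
    # than a plain forward pass per ensemble member.
    kappa = torch.rsqrt(1.0 + (torch.pi / 8.0) * var_logits)
    return torch.softmax(kappa * mean_logits, dim=-1)
```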
[22-02-15] [paper182]
Well-written and somewhat interesting paper. I'm not overly familiar with RL, which makes it a bit difficult for me to properly evaluate the paper's contributions. They use standard ensembles for uncertainty estimation combined with an OOD sampling regularization. I thought that the OOD sampling could be interesting, but it seems very specific to RL. I'm sure this paper is quite interesting for people doing RL, but I don't think it's overly useful for me.
[22-02-15] [paper181]
Quite interesting and very well-written paper, I enjoyed reading it. Their analysis of fitting Gaussian regression models via the NLL is quite interesting, I didn't really expect to learn something new about this. I've seen Gaussian models outperform standard regression (L2 loss) w.r.t. accuracy in some applications/datasets, and the other way around in others. In the first case, I've then attributed the success of the Gaussian model to the "learned loss attenuation". The analysis in this paper could perhaps explain why you get this performance boost only in certain applications. Their beta-NLL loss could probably be quite useful; it seems like a convenient tool to have.
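The beta-NLL loss is easy to state; a sketch as I understand it (beta = 0 recovers the standard Gaussian NLL, while beta = 1 gives the mean an MSE-like gradient weighting):

```python
import torch

def beta_nll_loss(mean, var, target, beta=0.5):
    # Per-point Gaussian NLL, re-weighted by a stop-gradient factor var^beta.
    nll = 0.5 * (var.log() + (target - mean) ** 2 / var)
    weight = var.detach() ** beta  # treated as a constant during backprop
    return (weight * nll).mean()
```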
[22-02-15] [paper180]
Well-written and somewhat interesting paper. I'm not overly familiar with reinforcement learning, which makes it a bit difficult for me to properly evaluate the paper's contributions, but to me it seems like fairly straightforward method modifications? Using ensembles of Gaussian models (instead of ensembles of models trained using the L2 loss) makes sense. I didn't quite get the BIV method; it seems rather ad hoc? I also don't quite get exactly how it's used in equation (10): is the ensemble of Gaussian models trained _jointly_ using this loss? I don't really know if this could be useful outside of RL.
[22-02-14] [paper179]
Interesting and very well-written paper, I enjoyed reading it. I still think that ensembling probably is quite difficult to beat purely in terms of uncertainty estimation quality, but this definitely seems like a useful tool in many situations. It's not clear to me if the analytical expression for regression in "4. Approximate Predictive Distribution" is also applicable when the variance is input-dependent?
[22-02-12] [paper178]
Well-written and interesting paper. They synthetically create dataset shifts (e.g. by adding Gaussian noise to the data) of increasing intensity and study whether or not the uncertainty increases as the accuracy degrades. They compare regular softmax, temperature scaling, MC-dropout, ensembling and a simple variational inference method. Their conclusion is basically that ensembling slightly outperforms the other methods, but that no method performs overly well. I think these types of studies are really useful.
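This kind of study is also easy to replicate on one's own models; a minimal sketch of the protocol (the noise levels are placeholders):

```python
import numpy as np

def shift_study(predict_probs, x, y, sigmas=(0.0, 0.1, 0.25, 0.5, 1.0)):
    # For each corruption intensity, record accuracy and mean confidence;
    # ideally, confidence should drop as accuracy degrades.
    results = []
    for s in sigmas:
        x_shifted = x + s * np.random.randn(*x.shape)
        probs = predict_probs(x_shifted)  # (N, K) predictive probabilities
        acc = float((probs.argmax(axis=1) == y).mean())
        conf = float(probs.max(axis=1).mean())
        results.append((s, acc, conf))
    return results
```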
[22-02-12] [paper177]
Well-written and interesting paper. This is a good paper to read before "Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions". Their proposed method seems to have similar / slightly worse performance than a small ensemble, so the only real advantage is that it's faster at test-time? This is of course very important in many applications, but not in all. The performance also seems quite sensitive to the choice of lambda in the combined loss function (Equation (10)), according to Figure S2 in the appendix?
[22-02-11] [paper176]
Well-written and quite interesting paper. A short paper, just 4 pages. They don't study the method from the "Energy-based Out-of-distribution Detection" paper as I had expected, but it was still a quite interesting read. The results in Section 4.2 seem interesting, especially for experiment 3, but I'm not sure that I properly understand everything.
[22-02-10] [paper175]
Interesting and well-written paper. I didn't quite understand all the details, I'll have to read a couple of related/background papers to be able to properly appreciate and evaluate the proposed method. I definitely feel like I would like to read up on this family of methods. Extensive experimental evaluation, and the results seem promising overall.
[22-02-09] [paper174]
Interesting and well-written paper. The proposed method is quite clearly explained and makes intuitive sense (at least if you're familiar with EBMs). Compared to using the softmax score, the performance does seem to improve consistently. Seems like fine-tuning on an "auxiliary outlier dataset" is required to get really good performance though, which you can't really assume to have access to in real-world problems, I suppose?
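The energy score itself is a one-liner on the logits; a sketch of how I understand it:

```python
import torch

def energy_score(logits, T=1.0):
    # E(x) = -T * logsumexp(f(x) / T); lower energy = more in-distribution,
    # so -energy_score(logits) can be thresholded as a confidence score.
    return -T * torch.logsumexp(logits / T, dim=1)
```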
[22-02-09] [paper173]
Interesting and quite well-written paper. I did find it somewhat difficult to understand certain parts though, they could perhaps be explained more clearly. The results seem quite impressive (they do consistently outperform all baselines), but I find it interesting that the "Gaussian noise" baseline in Table 2 performs that well? I should probably have read "Energy-based Out-of-distribution Detection" before reading this paper.

Papers Read in 2021:

[21-12-16] [paper172]
Very interesting and quite well-written paper. Kind of neat/fun to see state-space models being used. The experimental results seem very impressive!? I didn't fully understand everything in Section 3. I had to read Section 3.4 a couple of times to understand how the parameterization actually works in practice: you have H state-space models, one for each feature dimension, so that you can map a sequence of feature vectors to another sequence of feature vectors, and you can then stack multiple such layers of state-space models (sequence --> sequence --> sequence --> ...); see the sketch below.
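To make the Section 3.4 parameterization concrete for myself: a naive recurrent sketch of one such layer of H independent (already discretized) linear state-space models. The real method evaluates this far more efficiently via convolution kernels, and all names and shapes here are my own assumptions:

```python
import numpy as np

def ssm_layer(u, A, B, C):
    # u: (L, H) input sequence of H-dimensional feature vectors.
    # A: (H, N, N), B: (H, N), C: (H, N) -- one small SSM per feature dimension.
    L, H = u.shape
    N = A.shape[1]
    x = np.zeros((H, N))  # per-dimension hidden states
    y = np.zeros((L, H))
    for k in range(L):
        x = np.einsum('hij,hj->hi', A, x) + B * u[k][:, None]  # x_k = A x_{k-1} + B u_k
        y[k] = np.einsum('hi,hi->h', C, x)                     # y_k = C x_k
    return y  # (L, H): another feature sequence, so layers can be stacked
```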
[21-12-09] [paper171]
Quite interesting and well-written paper. Quite a heavy read; you probably need to be rather familiar with GPs to properly understand/appreciate everything. Definitely check Appendix D, it gives a better understanding of how the proposed method is applied in practice. I'm not quite sure how strong/impressive the experimental results actually are. It also seems like the method could be a bit inconvenient to implement/use?
[21-12-03] [paper170]
Interesting and very well-written paper. Gives a good overview of the field and contains a lot of seemingly useful references. The evaluation is very comprehensive. The user study is quite neat.
[21-12-02] [paper169]
Quite well-written paper overall that seemed interesting, but I found it very difficult to properly understand everything. Thus, I can't really tell how interesting/significant their analysis actually is.
[21-11-25] [paper168]
Quite interesting and well-written paper. The experimental results do seem promising. However, I don't quite get why the proposed method intuitively makes sense, why is it better to only use the parameters of the final network layer?
[21-11-18] [paper167]
  • Masked Autoencoders Are Scalable Vision Learners [pdf] [annotated pdf]
  • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
  • 2021-11-11
Interesting and well-written paper. The proposed method is simple and makes a lot of intuitive sense, which is rather satisfying. After page 4, it's mostly detailed ablations and results.
[21-11-11] [paper166]
Quite well-written and somewhat interesting paper. I'm not very familiar with this area. I didn't spend too much time trying to properly evaluate the significance of the findings.
[21-10-28] [paper165]
  • Deep Classifiers with Label Noise Modeling and Distance Awareness [pdf] [annotated pdf]
  • Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
  • 2021-10-06
  • [Uncertainty Estimation]
Quite interesting and well-written paper. I find the distance-awareness property more interesting than modelling of input/class-dependent label noise, so the proposed method (HetSNGP) is perhaps not overly interesting compared to the SNGP baseline.
[21-10-21] [paper164]
Somewhat interesting paper. The phenomenon observed in Figure 1, that validation accuracy suddenly increases long after an almost perfect fit of the training data has been achieved, is quite interesting. I didn't quite understand the datasets they use (binary operation tables).
[21-10-14] [paper163]
  • Learning to Simulate Complex Physics with Graph Networks [pdf] [code] [annotated pdf]
  • Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, Peter W. Battaglia
  • 2020-02-21, ICML 2020
Quite well-written and somewhat interesting paper. Cool application and a bunch of neat videos. This is not really my area, so I didn't spend too much time/energy trying to fully understand everything.
[21-10-12] [paper162]
Interesting and very well-written paper, I really enjoyed reading it! The paper also gives a good understanding of neural implicit representations in general.
[21-10-08] [paper161]
Well-written and quite interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they predict a single Gaussian distribution for the pose (instead of hierarchical matrix-Fisher distributions). Also, they mainly focus on the body shape. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-08] [paper160]
Well-written and fairly interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they just use direct regression. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-07] [paper159]
Well-written and quite interesting paper. I didn't fully understand everything though, and it feels like I probably don't know this specific setting/problem well enough to fully appreciate the paper. 
[21-10-07] [paper158]
Well-written and very interesting paper, I enjoyed reading it. The hierarchical distribution prediction approach makes sense and consistently outperforms the independent baseline. Using matrix-Fisher distributions makes sense. The synthetic training framework and the input representation of edge-filters + 2D keypoint heatmaps are both interesting.
[21-10-06] [paper157]
Well-written and interesting paper. Quite easy to read and follow, the method is clearly explained and makes intuitive sense.
[21-10-04] [paper156]
Well-written and fairly interesting paper. The marker-based representation, instead of using skeleton joints, makes sense. The recursive projection scheme also makes sense, but seems very slow (2.27 sec/frame)? I didn't quite get all the details for their DCT representation of the latent space.
[21-10-03] [paper155]
Interesting and very well-written paper, I really enjoyed reading it. Interesting combination of implicit representations and 3D human modelling. The "inclusive human modelling" application is neat and important.
[21-10-03] [paper154]
Well-written and interesting paper, I enjoyed reading it. Neat application of implicit representations. The paper also gives a quite good overview of online 3D reconstruction in general.
[21-10-02] [paper153]
Well-written and quite interesting paper. The main idea, using a learned conditional prior p(z|c) instead of just p(z), makes sense and was shown beneficial also in "HuMoR: 3D Human Motion Model for Robust Pose Estimation". I'm however somewhat confused by their specific implementation in Section 4; it doesn't seem like a standard cVAE implementation?
[21-10-01] [paper152]
Well-written and quite interesting paper. Interesting application, being able to reconstruct full 3D scenes from sparse point clouds. I didn't fully understand everything, as I don't have a particularly strong graphics background.
[21-09-29] [paper151]
  • Information Dropout: Learning Optimal Representations Through Noisy Computation [pdf] [annotated pdf]
  • Alessandro Achille, Stefano Soatto
  • 2016-11-04
Well-written and somewhat interesting paper overall. I'm not overly familiar with the topics of the paper, and didn't fully understand everything. Some results and insights seem quite interesting/neat, but I'm not sure exactly what the main takeaways should be, or how significant they actually are.
[21-09-24] [paper150]
Well-written and fairly interesting paper. Quite a lot of details on the attention architecture, which I personally don't find overly interesting. The experimental results are quite impressive, but I would like to see a comparison in terms of computational cost at test-time. It sounds like their method is rather slow.
[21-09-23] [paper149]
Well-written and quite interesting paper. The general idea, refining frame-by-frame pose estimates via physical constraints, intuitively makes a lot of sense. I did however find it quite difficult to understand all the details in Section 3.
[21-09-21] [paper148]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.
[21-09-17] [paper147]
Quite interesting paper, but also quite strange/confusing. I don't think the proposed method is explained particularly well; at least, I found it quite difficult to properly understand what they are actually doing.

In the end it seems like they are learning a global loss function that is very similar to doing probabilistic regression with a Gauss/Laplace model of p(y|x) (with learned mean and variance)? See Figure 4 in the Appendix.

And while it's true that their performance is much better than for direct regression with an L2/L1 loss (see e.g. Table 1), they only compare with Gauss/Laplace probabilistic regression once (Table 7) and in that case the Laplace model is actually quite competitive?
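For comparison, the Laplace probabilistic-regression baseline they compare against amounts to minimizing the following NLL (up to an additive constant); a sketch with an assumed log-scale parameterization:

```python
import torch

def laplace_nll(mean, log_b, target):
    # NLL of p(y|x) = Laplace(mean(x), b(x)) with learned mean and scale;
    # predicting log b keeps the scale positive (constant log 2 dropped).
    return ((target - mean).abs() / log_b.exp() + log_b).mean()
```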
[21-09-15] [paper146]
Extremely well-written and interesting paper. I really enjoyed reading it, and would recommend it to anyone interested in computer vision.

All parts of the proposed method are clearly explained and relatively easy to understand, including the volume rendering techniques which I was unfamiliar with.
[21-09-08] [paper145]
  • Revisiting the Calibration of Modern Neural Networks [pdf] [code] [annotated pdf]
  • Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
  • 2021-06-15, NeurIPS 2021
  • [Uncertainty Estimation]
Well-written paper. Everything is quite clearly explained and easy to understand. Quite enjoyable to read overall. 

Thorough experimental evaluation. Quite interesting findings.
[21-09-02] [paper144]
  • Differentiable Particle Filtering via Entropy-Regularized Optimal Transport [pdf] [code] [annotated pdf]
  • Adrien Corenflos, James Thornton, George Deligiannidis, Arnaud Doucet
  • 2021-02-15, ICML 2021
[21-09-02] [paper143]
[21-08-27] [paper142]
[21-06-19] [paper141]
[21-06-19] [paper140]
  • Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [pdf] [code] [annotated pdf]
  • Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black
  • 2019-04-11, CVPR 2019
  • [3D Human Pose Estimation]
Very well-written and quite interesting paper. Gives a good understanding of the SMPL model and the SMPLify method.
[21-06-18] [paper139]
  • Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image [pdf] [annotated pdf]
  • Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, Michael J. Black
  • 2016-07-27, ECCV 2016
  • [3D Human Pose Estimation]
[21-06-18] [paper138]
[21-06-17] [paper137]
[21-06-17] [paper136]
[21-06-16] [paper135]
[21-06-16] [paper134]
[21-06-15] [paper133]
[21-06-14] [paper132]
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [pdf] [annotated pdf]
  • Benjamin Biggs, Sébastien Ehrhadt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny
  • 2020-11-02, NeurIPS 2020
  • [3D Human Pose Estimation]
[21-06-04] [paper131]
[21-05-07] [paper130]
[21-04-29] [paper129]
[21-04-16] [paper128]
  • Learning Mesh-Based Simulation with Graph Networks [pdf] [code] [annotated pdf]
  • Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, Peter W. Battaglia
  • 2020-10-07, ICLR 2021
[21-04-09] [paper127]
[21-04-01] [paper126]
[21-03-26] [paper125]
  • Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling [pdf] [pdf with comments]
  • Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio
  • 2020-03-12, NeurIPS 2020
  • [Energy-Based Models]
[21-03-19] [paper124]
[21-03-12] [paper123]
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [pdf] [code] [pdf with comments]
  • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
  • 2020-06-17, NeurIPS 2020
[21-03-04] [paper122]
[21-02-26] [paper121]
  • Neural Relational Inference for Interacting Systems [pdf] [code] [pdf with comments]
  • Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, Richard Zemel
  • 2018-02-13, ICML 2018
[21-02-19] [paper120]
  • Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [pdf] [pdf with comments]
  • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
  • 2021-02-11, ICML 2021
[21-02-12] [paper119]
[21-02-05] [paper118]
[21-01-29] [paper117]
  • No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models [pdf] [code] [pdf with comments]
  • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
  • 2020-10-08, ICLR 2021
  • [Energy-Based Models]
[21-01-22] [paper116]
[21-01-15] [paper115]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [pdf] [pdf with comments]
  • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • 2020-06-29, ICML 2020
  • [Transformers]

Papers Read in 2020:

[20-12-18] [paper114]
  • Score-Based Generative Modeling through Stochastic Differential Equations [pdf] [code] [pdf with comments]
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
  • 2020-11-26, ICLR 2021
  • [Neural ODEs]
[20-12-14] [paper113]
[20-11-27] [paper112]
  • Rethinking Attention with Performers [pdf] [pdf with comments]
  • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller
  • 2020-10-30, ICLR 2021
  • [Transformers]
[20-11-23] [paper111]
[20-11-13] [paper110]
[20-11-06] [paper109]
[20-10-16] [paper108]
[20-10-09] [paper107]
[20-09-24] [paper106]
[20-09-21] [paper105]
[20-09-11] [paper104]
  • Gated Linear Networks [pdf] [pdf with comments] [comments]
  • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
  • 2020-06-11
[20-09-04] [paper103]
[20-06-18] [paper102]
[20-06-12] [paper101]
[20-06-05] [paper100]
[20-05-27] [paper99]
[20-05-10] [paper98]
[20-04-17] [paper97]
[20-04-09] [paper96]
[20-04-03] [paper95]
[20-03-27] [paper94]
[20-03-26] [paper93]
[20-03-09] [paper92]
[20-02-28] [paper91]
[20-02-21] [paper90]
[20-02-18] [paper89]
[20-02-15] [paper88]
[20-02-14] [paper87]
[20-02-13] [paper86]
[20-02-08] [paper85]
[20-01-31] [paper84]
[20-01-24] [paper83]
[20-01-20] [paper82]
[20-01-17] [paper81]
[20-01-16] [paper80]
[20-01-15] [paper79]
[20-01-14] [paper78]
[20-01-10] [paper77]
[20-01-08] [paper76]
[20-01-06] [paper75]

Papers Read in 2019:

[19-12-22] [paper74]
[19-12-20] [paper73]
[19-12-20] [paper72]
[19-12-19] [paper71]
[19-12-15] [paper70]
[19-12-14] [paper69]
[19-12-13] [paper68]
[19-11-29] [paper67]
[19-11-26] [paper66]
[19-11-22] [paper65]
[19-10-28] [paper64]
[19-10-18] [paper63]
  • Improving Variational Inference with Inverse Autoregressive Flow [pdf] [code] [pdf with comments] [comments]
  • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
  • 2016-06-15, NeurIPS 2016
[19-10-11] [paper62]
[19-10-04] [paper61]
[19-07-11] [paper60]
  • Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud [pdf] [pdf with comments] [comments]
  • Shaoshuai Shi, Zhe Wang, Xiaogang Wang, Hongsheng Li
  • 2019-07-08
[19-07-10] [paper59]
[19-07-03] [paper58]
[19-06-12] [paper57]
[19-06-12] [paper56]
[19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [pdf with comments] [comments]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • 2019-03-20, CVPR 2019
[19-05-29] [paper54]
  • Attention Is All You Need [pdf] [pdf with comments] [comments]
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • 2017-06-12, NeurIPS 2017
[19-04-05] [paper53]
  • Stochastic Gradient Descent as Approximate Bayesian Inference [pdf] [pdf with comments] [comments]
  • Stephan Mandt, Matthew D. Hoffman, David M. Blei
  • 2017-04-13, Journal of Machine Learning Research 18 (2017)
[19-03-29] [paper52]
  • Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling [pdf] [pdf with comments] [comments]
  • Jacob Menick, Nal Kalchbrenner
  • 2018-12-04, ICLR 2019
[19-03-15] [paper51]
[19-03-11] [paper50]
[19-03-04] [paper49]
[19-03-01] [paper48]
[19-02-27] [paper47]
[19-02-25] [paper46]
  • Evaluating model calibration in classification [pdf] [code] [pdf with comments] [comments]
  • Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön
  • 2019-02-19, AISTATS 2019
[19-02-22] [paper45]
  • Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [pdf] [pdf with comments] [comments]
  • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
  • 2019-01-24
[19-02-17] [paper44]
[19-02-14] [paper43]
[19-02-13] [paper42]
[19-02-12] [paper41]
[19-02-07] [paper40]
[19-02-06] [paper39]
  • Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks [pdf] [pdf with comments] [comments]
  • José Miguel Hernández-Lobato, Ryan P. Adams
  • 2015-07-15, ICML 2015
[19-02-05] [paper38]
[19-01-28] [paper37]
[19-01-27] [paper36]
[19-01-26] [paper35]
  • Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification [pdf] [poster] [pdf with comments] [comments]
  • Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
  • CVPR 2016
[19-01-25] [paper34]
[19-01-25] [paper33]
[19-01-24] [paper32]
  • Tutorial: Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods [pdf] [pdf with comments]
  • Changyou Chen
  • 2016-08-10
[19-01-24] [paper31]
[19-01-23] [paper30]
[19-01-23] [paper29]
[19-01-17] [paper28]
[19-01-09] [paper27]
  • Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection [pdf] [poster] [pdf with comments] [summary]
  • Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
  • 2018-11-29, NeurIPS 2018 Workshop

Papers Read in 2018:

[18-12-12] [paper26]
[18-12-06] [paper25]
[18-12-05] [paper24]
[18-11-29] [paper23]
[18-11-22] [paper22]
  • A Probabilistic U-Net for Segmentation of Ambiguous Images [pdf] [code] [pdf with comments] [summary]
  • Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
  • 2018-10-29, NeurIPS 2018
[18-11-22] [paper21]
  • When Recurrent Models Don't Need To Be Recurrent (a.k.a. Stable Recurrent Models) [pdf] [pdf with comments] [summary]
  • John Miller, Moritz Hardt
  • 2018-05-29, ICLR 2019
[18-11-16] [paper20]
  • Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow [pdf] [pdf with comments] [summary]
  • Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox
  • 2018-08-06, ECCV 2018
[18-11-15] [paper19]
  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) [pdf] [pdf with comments] [summary]
  • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
  • 2018-06-07, ICML 2018
[18-11-12] [paper18]
  • Large-Scale Visual Active Learning with Deep Probabilistic Ensembles [pdf] [pdf with comments] [summary]
  • Kashyap Chitta, Jose M. Alvarez, Adam Lesnikowski
  • 2018-11-08
[18-11-08] [paper17]
[18-10-26] [paper16]
  • Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Klaus Dietmayer
  • 2018-09-08, ITSC 2018
[18-10-25] [paper15]
  • Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes [pdf] [pdf with comments] [summary]
  • Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
  • 2018-10-11, ICLR 2019
[18-10-19] [paper14]
  • Uncertainty in Neural Networks: Bayesian Ensembling [pdf] [pdf with comments] [summary]
  • Tim Pearce, Mohamed Zaki, Alexandra Brintrup, Andy Neel
  • 2018-10-12, AISTATS 2019 submission
[18-10-18] [paper13]
  • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles [pdf] [pdf with comments] [summary]
  • Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
  • 2017-11-17, NeurIPS 2017
[18-10-18] [paper12]
  • Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors [pdf] [pdf with comments] [summary]
  • Danijar Hafner, Dustin Tran, Alex Irpan, Timothy Lillicrap, James Davidson
  • 2018-07-24, ICML 2018 Workshop
[18-10-05] [paper11]
[18-10-04] [paper10]
[18-10-04] [paper9]
  • On gradient regularizers for MMD GANs [pdf] [pdf with comments] [summary]
  • Michael Arbel, Dougal J. Sutherland, Mikołaj Bińkowski, Arthur Gretton
  • 2018-05-29, NeurIPS 2018
[18-09-30] [paper8]
  • Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S.M. Ali Eslami, Yee Whye Teh
  • 2018-07-04, ICML 2018 Workshop
[18-09-27] [paper7]
  • Conditional Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami
  • 2018-07-04, ICML 2018
[18-09-27] [paper6]
[18-09-25] [paper5]
  • Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Isidro Cortes-Ciriano, Andreas Bender
  • 2018-09-24
[18-09-25] [paper4]
  • Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer
  • 2018-09-14
[18-09-24] [paper3]
[18-09-24] [paper2]
[18-09-20] [paper1]
  • Gaussian Process Behaviour in Wide Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani
  • 2018-08-16, ICLR 2018



Uncertainty Estimation:

[22-03-02] [paper196]
Somewhat interesting paper. They use a softmax model with MC-dropout to compute uncertainty estimates. The evaluation is not very extensive; they mostly just check that the classification accuracy improves as they reject more and more samples based on an uncertainty threshold.
[22-02-24] [paper191]
Interesting and well-written paper. I wasn't very familiar with CT image reconstruction, but they do a good job explaining everything. Interesting that MC-dropout seems important for getting well-calibrated predictions.
[22-02-21] [paper190]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?
[22-02-21] [paper189]
Somewhat interesting paper. I didn't quite understand everything, so it could be more interesting than I think. The fact that their pseudo-input generation process "relies on the availability of a differentiable density estimate of the data" seems like a big limitation? For regression, they only applied their method to very low-dimensional input data (1D toy regression and UCI benchmarks), but would this work for image-based tasks?
[22-02-19] [paper186]
Quite interesting and well-written paper. Only experiments on a toy 1D regression problem, and flight delay prediction in which the input is 8D. The approach of just adding noise to the input x to get OOD samples would probably not work very well e.g. for image-based problems?
[22-02-16] [paper184]
Interesting and well-written paper. The proposed method makes intuitive sense, trying to incorporate the "OOD training" method (i.e., to use some kind of OOD data during training, similar to e.g. the "Deep Anomaly Detection with Outlier Exposure" paper) into the Bayesian deep learning approach. The experimental results do seem quite promising.
[22-02-15] [paper183]
Well-written and interesting paper. Short paper of just 3 pages, but with an extensive appendix which I definitely recommend going through. The method, training an ensemble and then applying the Laplace approximation to each network, is very simple and intuitively makes a lot of sense. I didn't realize that this would have basically the same test-time speed as ensembling (since they utilize the probit approximation); that's very neat. It also seems to consistently outperform ensembling a bit across almost all tasks and metrics.
[22-02-15] [paper182]
Well-written and somewhat interesting paper. I'm not overly familiar with RL, which makes it a bit difficult for me to properly evaluate the paper's contributions. They use standard ensembles for uncertainty estimation combined with an OOD sampling regularization. I thought that the OOD sampling could be interesting, but it seems very specific to RL. I'm sure this paper is quite interesting for people doing RL, but I don't think it's overly useful for me.
[22-02-15] [paper181]
Quite interesting and very well-written paper, I enjoyed reading it. Their analysis of fitting Gaussian regression models via the NLL is quite interesting, I didn't really expect to learn something new about this. I've seen Gaussian models outperform standard regression (L2 loss) w.r.t. accuracy in some applications/datasets, and the other way around in others. In the first case, I've then attributed the success of the Gaussian model to the "learned loss attenuation". The analysis in this paper could perhaps explain why you get this performance boost only in certain applications. Their beta-NLL loss could probably be quite useful; it seems like a convenient tool to have.
[22-02-15] [paper180]
Well-written and somewhat interesting paper. I'm not overly familiar with reinforcement learning, which makes it a bit difficult for me to properly evaluate the paper's contributions, but to me it seems like fairly straightforward method modifications? Using ensembles of Gaussian models (instead of ensembles of models trained using the L2 loss) makes sense. I didn't quite get the BIV method; it seems rather ad hoc? I also don't quite get exactly how it's used in equation (10): is the ensemble of Gaussian models trained _jointly_ using this loss? I don't really know if this could be useful outside of RL.
[22-02-14] [paper179]
Interesting and very well-written paper, I enjoyed reading it. I still think that ensembling probably is quite difficult to beat purely in terms of uncertainty estimation quality, but this definitely seems like a useful tool in many situations. It's not clear to me if the analytical expression for regression in "4. Approximate Predictive Distribution" is also applicable when the variance is input-dependent?
[22-02-12] [paper178]
Well-written and interesting paper. They synthetically create dataset shifts (e.g. by adding Gaussian noise to the data) of increasing intensity and study whether or not the uncertainty increases as the accuracy degrades. They compare regular softmax, temperature scaling, MC-dropout, ensembling and a simple variational inference method. Their conclusion is basically that ensembling slightly outperforms the other methods, but that no method performs overly well. I think these types of studies are really useful.
[22-02-12] [paper177]
Well-written and interesting paper. This is a good paper to read before "Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions". Their proposed method seems to have similar / slightly worse performance than a small ensemble, so the only real advantage is that it's faster at test-time? This is of course very important in many applications, but not in all. The performance also seems quite sensitive to the choice of lambda in the combined loss function (Equation (10)), according to Figure S2 in the appendix?
[22-02-10] [paper175]
Interesting and well-written paper. I didn't quite understand all the details, I'll have to read a couple of related/background papers to be able to properly appreciate and evaluate the proposed method. I definitely feel like I would like to read up on this family of methods. Extensive experimental evaluation, and the results seem promising overall.
[21-12-09] [paper171]
Quite interesting and well-written paper. Quite a heavy read; you probably need to be rather familiar with GPs to properly understand/appreciate everything. Definitely check Appendix D, it gives a better understanding of how the proposed method is applied in practice. I'm not quite sure how strong/impressive the experimental results actually are. It also seems like the method could be a bit inconvenient to implement/use?
[21-10-28] [paper165]
  • Deep Classifiers with Label Noise Modeling and Distance Awareness [pdf] [annotated pdf]
  • Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
  • 2021-10-06
  • [Uncertainty Estimation]
Quite interesting and well-written paper. I find the distance-awareness property more interesting than modelling of input/class-dependent label noise, so the proposed method (HetSNGP) is perhaps not overly interesting compared to the SNGP baseline.
[21-10-06] [paper157]
Well-written and interesting paper. Quite easy to read and follow, the method is clearly explained and makes intuitive sense.
[21-09-21] [paper148]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.
[21-09-08] [paper145]
  • Revisiting the Calibration of Modern Neural Networks [pdf] [code] [annotated pdf]
  • Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
  • 2021-06-15, NeurIPS 2021
  • [Uncertainty Estimation]
Well-written paper. Everything is quite clearly explained and easy to understand. Quite enjoyable to read overall. 

Thorough experimental evaluation. Quite interesting findings.
[21-04-01] [paper126]
[21-03-04] [paper122]
[21-01-22] [paper116]
[20-09-24] [paper106]
[20-09-21] [paper105]
[20-06-05] [paper100]
[20-05-27] [paper99]
[20-04-17] [paper97]
[20-04-09] [paper96]
[20-03-27] [paper94]
[20-03-26] [paper93]
[20-02-28] [paper91]
[20-02-13] [paper86]
[20-02-08] [paper85]
[20-01-31] [paper84]
[20-01-08] [paper76]
[19-06-12] [paper56]
[19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [pdf with comments] [comments]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • 2019-03-20, CVPR 2019
[19-04-05] [paper53]
  • Stochastic Gradient Descent as Approximate Bayesian Inference [pdf] [pdf with comments] [comments]
  • Stephan Mandt, Matthew D. Hoffman, David M. Blei
  • 2017-04-13, Journal of Machine Learning Research 18 (2017)
[19-02-27] [paper47]
[19-02-25] [paper46]
  • Evaluating model calibration in classification [pdf] [code] [pdf with comments] [comments]
  • Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön
  • 2019-02-19, AISTATS 2019
[19-02-14] [paper43]
[19-02-13] [paper42]
[19-02-12] [paper41]
[19-02-07] [paper40]
[19-02-06] [paper39]
  • Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks [pdf] [pdf with comments] [comments]
  • José Miguel Hernández-Lobato, Ryan P. Adams
  • 2015-07-15, ICML 2015
[19-02-05] [paper38]
[19-01-28] [paper37]
[19-01-27] [paper36]
[19-01-26] [paper35]
  • Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification [pdf] [poster] [pdf with comments] [comments]
  • Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
  • CVPR 2016
[19-01-25] [paper34]
[19-01-25] [paper33]
[19-01-24] [paper32]
  • Tutorial: Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods [pdf] [pdf with comments]
  • Changyou Chen
  • 2016-08-10
[19-01-23] [paper30]
[19-01-23] [paper29]
[19-01-09] [paper27]
  • Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection [pdf] [poster] [pdf with comments] [summary]
  • Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
  • 2018-11-29, NeurIPS 2018 Workshop
[18-12-06] [paper25]
[18-12-05] [paper24]
[18-11-29] [paper23]
[18-11-22] [paper22]
  • A Probabilistic U-Net for Segmentation of Ambiguous Images [pdf] [code] [pdf with comments] [summary]
  • Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
  • 2018-10-29, NeurIPS 2018
[18-11-16] [paper20]
  • Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow [pdf] [pdf with comments] [summary]
  • Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox
  • 2018-08-06, ECCV 2018
[18-11-12] [paper18]
  • Large-Scale Visual Active Learning with Deep Probabilistic Ensembles [pdf] [pdf with comments] [summary]
  • Kashyap Chitta, Jose M. Alvarez, Adam Lesnikowski
  • 2018-11-08
[18-10-26] [paper16]
  • Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Klaus Dietmayer
  • 2018-09-08, ITSC 2018
[18-10-19] [paper14]
  • Uncertainty in Neural Networks: Bayesian Ensembling [pdf] [pdf with comments] [summary]
  • Tim Pearce, Mohamed Zaki, Alexandra Brintrup, Andy Neel
  • 2018-10-12, AISTATS 2019 submission
[18-10-18] [paper13]
  • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles [pdf] [pdf with comments] [summary]
  • Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
  • 2017-11-17, NeurIPS 2017
[18-10-18] [paper12]
  • Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors [pdf] [pdf with comments] [summary]
  • Danijar Hafner, Dustin Tran, Alex Irpan, Timothy Lillicrap, James Davidson
  • 2018-07-24, ICML 2018 Workshop
[18-09-25] [paper5]
  • Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Isidro Cortes-Ciriano, Andreas Bender
  • 2018-09-24
[18-09-25] [paper4]
  • Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer
  • 2018-09-14
[18-09-24] [paper3]
[18-09-24] [paper2]



Out-of-Distribution Detection:

[22-02-26] [paper195]
Quite interesting and well-written paper. It seemed quite niche at first, but I think their analysis could potentially be useful.
[22-02-26] [paper194]
Quite interesting and well-written paper. Two simple modifications of the "maximum softmax score" baseline, and the performance is consistently improved. The input perturbation method is quite interesting. Intuitively, it's not entirely clear to me why it actually works.
[22-02-25] [paper193]
Interesting and well-written paper. Interesting that Mahalanobis works very well on CIFAR10 vs SVHN but not on the medical imaging dataset. I don't quite get how/why the ODIN method works; I'll probably have to read that paper.
[22-02-21] [paper190]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?
[22-02-19] [paper188]
  • Contrastive Training for Improved Out-of-Distribution Detection [pdf] [annotated pdf]
  • Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger
  • 2020-07-10
  • [Out-of-Distribution Detection]
Quite interesting and very well-written paper. They take the method from the Mahalanobis paper ("A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks"), although they fit Gaussians only to the features at the second-to-last network layer and don't use the input pre-processing, and they consistently improve OOD detection performance by incorporating contrastive training. Specifically, they first train the network using just the SimCLR loss for a large number of epochs, and then also add the standard classification loss. I didn't quite get why the label smoothing is necessary, but according to Table 2 it's responsible for a large portion of the performance gain.
[22-02-19] [paper187]
Well-written and interesting paper. The proposed method is simple and really neat: fit class-conditional Gaussians in the feature space of a pre-trained classifier (basically just LDA on the feature vectors), and then use the Mahalanobis distance to these Gaussians as the confidence score for input x. They then also do this for the features at multiple levels of the network and combine these confidence scores into one. I don't quite get why the "input pre-processing" in Section 2.2 (adding noise to test samples) works; in Table 1 it significantly improves the performance.
[22-02-19] [paper186]
Quite interesting and well-written paper. Only experiments on a toy 1D regression problem, and flight delay prediction in which the input is 8D. The approach of just adding noise to the input x to get OOD samples would probably not work very well e.g. for image-based problems?
[22-02-18] [paper185]
  • Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions [pdf] [annotated pdf]
  • Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens
  • 2021-04-08, Medical Image Analysis (January 2022)
  • [Out-of-Distribution Detection] [Medical ML]
Well-written and interesting paper. Quite long, so it took a bit longer than usual to read. Sections 1 and 2 give a great overview of OOD detection in general, and of how it can be used specifically in this dermatology setting. I can definitely recommend reading Section 2 (Related work). They assume access to some outlier data during training, so their approach is similar to the "Outlier exposure" method (they argue that this is a fair assumption specifically in this dermatology setting). Their method improves on the "reject bucket" (adding an extra class which is assigned to all outlier training data points); in their proposed method, they also use fine-grained classification of the outlier skin conditions. They then also use an ensemble of 5 models, as well as a more diverse ensemble (combining models trained with different representation learning techniques). This diverse ensemble obtains the best performance.
[22-02-16] [paper184]
Interesting and well-written paper. The proposed method makes intuitive sense, trying to incorporate the "OOD training" method (i.e., to use some kind of OOD data during training, similar to e.g. the "Deep Anomaly Detection with Outlier Exposure" paper) into the Bayesian deep learning approach. The experimental results do seem quite promising.
[22-02-15] [paper183]
Well-written and interesting paper. Short paper of just 3 pages, but with an extensive appendix which I definitely recommend going through. The method, training an ensemble and then applying the Laplace approximation to each network, is very simple and intuitively makes a lot of sense. I didn't realize that this would have basically the same test-time speed as ensembling (since they utilize the probit approximation); that's very neat. It also seems to consistently outperform ensembling a bit across almost all tasks and metrics.
[22-02-12] [paper178]
Well-written and interesting paper. They synthetically create dataset shifts (e.g. by adding Gaussian noise to the data) of increasing intensity and study whether or not the uncertainty increases as the accuracy degrades. They compare regular softmax, temperature scaling, MC-dropout, ensembling and a simple variational inference method. Their conclusion is basically that ensembling slightly outperforms the other methods, but that no method performs overly well. I think these types of studies are really useful.
[22-02-12] [paper177]
Well-written and interesting paper. This is a good paper to read before "Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions". Their proposed method seems to have similar / slightly worse performance than a small ensemble, so the only real advantage is that it's faster at test-time? This is of course very important in many applications, but not in all. The performance also seems quite sensitive to the choice of lambda in the combined loss function (Equation (10)), according to Figure S2 in the appendix?
[22-02-11] [paper176]
Well-written and quite interesting paper. A short paper, just 4 pages. They don't study the method from the "Energy-based Out-of-distribution Detection" paper as I had expected, but it was still a quite interesting read. The results in Section 4.2 seem interesting, especially for experiment 3, but I'm not sure that I properly understand everything.
[22-02-10] [paper175]
Interesting and well-written paper. I didn't quite understand all the details, I'll have to read a couple of related/background papers to be able to properly appreciate and evaluate the proposed method. I definitely feel like I would like to read up on this family of methods. Extensive experimental evaluation, and the results seem promising overall.
[22-02-09] [paper174]
Interesting and well-written paper. The proposed method is quite clearly explained and makes intuitive sense (at least if you're familiar with EBMs). Compared to using the softmax score, the performance does seem to improve consistently. Seems like fine-tuning on an "auxiliary outlier dataset" is required to get really good performance though, which you can't really assume to have access to in real-world problems, I suppose?
[22-02-09] [paper173]
Interesting and quite well-written paper. I did find it somewhat difficult to understand certain parts though, they could perhaps be explained more clearly. The results seem quite impressive (they do consistently outperform all baselines), but I find it interesting that the "Gaussian noise" baseline in Table 2 performs that well? I should probably have read "Energy-based Out-of-distribution Detection" before reading this paper.
[21-12-09] [paper171]
Quite interesting and well-written paper. Quite a heavy read, probably need to be rather familiar with GPs to properly understand/appreciate everything. Definitely check Appendix D, it gives a better understanding of how the proposed method is applied in practice. I'm not quite sure how strong/impressive the experimental results actually are. Also seems like the method could be a bit inconvenient to implement/use?
[21-12-03] [paper170]
Interesting and very well-written paper. Gives a good overview of the field and contains a lot of seemingly useful references. The evaluation is very comprehensive. The user study is quite neat.
[21-11-25] [paper168]
Quite interesting and well-written paper. The experimental results do seem promising. However, I don't quite get why the proposed method intuitively makes sense: why is it better to only use the parameters of the final network layer?
[21-09-21] [paper148]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.



Theoretical Properties of Deep Learning:

[22-02-25] [paper192]
Quite interesting and well-written paper. The definition of "prediction depth" in Section 2.1 makes sense, and it definitely seems reasonable that this could correlate with example difficulty / prediction confidence in some way. Section 3 and 4, and all the figures, contain a lot of info it seems, I'd probably need to read the paper again to properly understand/appreciate everything.
[21-12-02] [paper169]
Quite well-written paper overall that seemed interesting, but I found it very difficult to properly understand everything. Thus, I can't really tell how interesting/significant their analysis actually is.
[21-11-11] [paper166]
Quite well-written and somewhat interesting paper. I'm not very familiar with this area. I didn't spend too much time trying to properly evaluate the significance of the findings.
[21-10-21] [paper164]
Somewhat interesting paper. The phenomenon observed in Figure 1, that validation accuracy suddenly increases long after almost perfect fitting of the training data has been achieved, is quite interesting. I didn't quite understand the datasets they use (binary operation tables).
[21-03-19] [paper124]
[21-02-12] [paper119]
[20-11-06] [paper109]
[20-10-16] [paper108]
[20-03-09] [paper92]
[20-01-24] [paper83]
[20-01-17] [paper81]
[19-02-22] [paper45]
  • Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [pdf] [pdf with comments] [comments]
  • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
  • 2019-01-24
[19-02-17] [paper44]
[19-01-17] [paper28]
[18-11-08] [paper17]
[18-10-25] [paper15]
  • Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes [pdf] [pdf with comments] [summary]
  • Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
  • 2018-10-11, ICLR2019
[18-09-20] [paper1]
  • Gaussian Process Behaviour in Wide Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani
  • 2018-08-16, ICLR2018



VAEs:

[21-09-21] [paper148]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.
[20-11-23] [paper111]
[20-11-13] [paper110]
[20-06-18] [paper102]
[20-02-14] [paper87]
[20-01-10] [paper77]
[19-11-26] [paper66]
[19-03-11] [paper50]
[19-03-04] [paper49]



Normalizing Flows:

[20-04-03] [paper95]
[19-12-20] [paper72]
[19-11-26] [paper66]
[19-10-18] [paper63]
  • Improving Variational Inference with Inverse Autoregressive Flow [pdf] [code] [pdf with comments] [comments]
  • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
  • 2016-06-15, NeurIPS2016
[19-10-11] [paper62]
[18-09-27] [paper6]



Autonomous Driving:

[19-07-11] [paper60]
  • Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud [pdf] [pdf with comments] [comments]
  • Shaoshuai Shi, Zhe Wang, Xiaogang Wang, Hongsheng Li
  • 2019-07-08
[19-07-10] [paper59]
[19-07-03] [paper58]
[19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [pdf with comments] [comments]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • 2019-03-20, CVPR2019
[19-01-09] [paper27]
  • Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection [pdf] [poster] [pdf with comments] [summary]
  • Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
  • 2018-11-29, NeurIPS2018 Workshop
[18-12-06] [paper25]
[18-11-22] [paper22]
  • A Probabilistic U-Net for Segmentation of Ambiguous Images [pdf] [code] [pdf with comments] [summary]
  • Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
  • 2018-10-29, NeurIPS2018
[18-11-16] [paper20]
  • Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow [pdf] [pdf with comments] [summary]
  • Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox
  • 2018-08-06, ECCV2018
[18-10-26] [paper16]
  • Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Klaus Dietmayer
  • 2018-09-08, ITSC2018
[18-10-05] [paper11]
[18-10-04] [paper10]
[18-09-25] [paper4]
  • Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer
  • 2018-09-14
[18-09-24] [paper2]



Medical ML:

[22-03-02] [paper196]
Somewhat interesting paper. They use a softmax model with MC-dropout to compute uncertainty estimates. The evaluation is not very extensive, they mostly just check that the classification accuracy improves as they reject more and more samples based on an uncertainty threshold.
[22-02-25] [paper193]
Interesting and well-written paper. Interesting that Mahalanobis works very well on the CIFAR10 vs SVHN but not on the medical imaging dataset. I don't quite get how/why the ODIN method works, I'll probably have to read that paper.
[22-02-24] [paper191]
Interesting and well-written paper. I wasn't very familiar with CT image reconstruction, but they do a good job explaining everything. Interesting that MC-dropout seems important for getting well-calibrated predictions.
[22-02-21] [paper190]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (and with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?
[22-02-18] [paper185]
  • Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions [pdf] [annotated pdf]
  • Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens
  • 2021-04-08, Medical Image Analysis (January 2022)
  • [Out-of-Distribution Detection] [Medical ML]
Well-written and interesting paper. Quite long, so it took a bit longer than usual to read it. Sections 1 and 2 give a great overview of OOD detection in general, and of how it can be used specifically in this dermatology setting. I can definitely recommend reading Section 2 (Related work). They assume access to some outlier data during training, so their approach is similar to the "Outlier exposure" method (they argue that this is a fair assumption specifically in this dermatology setting). Their method is an improvement of the "reject bucket" (adding an extra class which is assigned to all outlier training data points): in their proposed method, they also use fine-grained classification of the outlier skin conditions. They then use an ensemble of 5 models, as well as a more diverse ensemble (combining models trained with different representation learning techniques). This diverse ensemble obtains the best performance.
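I'm not certain this matches their exact scoring rule, but one natural way to turn the fine-grained outlier classes into an OOD score, consistent with how I read the paper, is to sum the probability mass on the outlier classes:

```python
import numpy as np

def ood_score_fine_grained(probs, num_inlier_classes):
    """Hypothetical scoring for the fine-grained outlier setup: the
    classifier has K inlier classes plus extra classes for the outlier
    training conditions (a fine-grained version of the single "reject
    bucket"), and the OOD score is the total outlier-class probability.
    """
    return probs[..., num_inlier_classes:].sum(axis=-1)

probs = np.array([0.6, 0.2, 0.05, 0.1, 0.05])  # K=3 inlier + 2 outlier
print(ood_score_fine_grained(probs, num_inlier_classes=3))  # 0.15
```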
[22-02-12] [paper178]
Well-written and interesting paper. They synthetically create dataset shifts (e.g. by adding Gaussian noise to the data) of increasing intensity and study whether or not the uncertainty increases as the accuracy degrades. They compare regular softmax, temperature scaling, MC-dropout, ensembling and a simple variational inference method. Their conclusion is basically that ensembling slightly outperforms the other methods, but that no method performs overly well. I think these types of studies are really useful.
[21-12-03] [paper170]
Interesting and very well-written paper. Gives a good overview of the field and contains a lot of seemingly useful references. The evaluation is very comprehensive. The user study is quite neat.
[18-11-22] [paper22]
  • A Probabilistic U-Net for Segmentation of Ambiguous Images [pdf] [code] [pdf with comments] [summary]
  • Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
  • 2018-10-29, NeurIPS2018



Object Detection:

[20-06-12] [paper101]
[19-07-03] [paper58]
[19-06-12] [paper56]



3D Object Detection:

[19-07-11] [paper60]
  • Part-A^2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud [pdf] [pdf with comments] [comments]
  • Shaoshuai Shi, Zhe Wang, Xiaogang Wang, Hongsheng Li
  • 2019-07-08
[19-07-10] [paper59]
[19-07-03] [paper58]
[19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [pdf with comments] [comments]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • 2019-03-20, CVPR2019
[18-10-26] [paper16]
  • Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Klaus Dietmayer
  • 2018-09-08, ITSC2018
[18-10-05] [paper11]
[18-10-04] [paper10]
[18-09-25] [paper4]
  • Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection [pdf] [pdf with comments] [summary]
  • Di Feng, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer
  • 2018-09-14



3D Multi-Object Tracking:

[20-02-18] [paper89]
[20-02-15] [paper88]



3D Human Pose Estimation:

[21-10-08] [paper161]
Well-written and quite interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they predict a single Gaussian distribution for the pose (instead of hierarchical matrix-Fisher distributions). Also, they mainly focus on the body shape. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-08] [paper160]
Well-written and fairly interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they just use direct regression. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-07] [paper159]
Well-written and quite interesting paper. I didn't fully understand everything though, and it feels like I probably don't know this specific setting/problem well enough to fully appreciate the paper. 
[21-10-07] [paper158]
Well-written and very interesting paper, I enjoyed reading it. The hierarchical distribution prediction approach makes sense and consistently outperforms the independent baseline. Using matrix-Fisher distributions makes sense. The synthetic training framework and the input representation of edge-filters + 2D keypoint heatmaps are both interesting.
[21-10-04] [paper156]
Well-written and fairly interesting paper. The marker-based representation, instead of using skeleton joints, makes sense. The recursive projection scheme also makes sense, but seems very slow (2.27 sec/frame)? I didn't quite get all the details for their DCT representation of the latent space.
[21-10-03] [paper155]
Interesting and very well-written paper, I really enjoyed reading it. Interesting combination of implicit representations and 3D human modelling. The "inclusive human modelling" application is neat and important.
[21-10-02] [paper153]
Well-written and quite interesting paper. The main idea, using a learned conditional prior p(z|c) instead of just p(z), makes sense and was shown beneficial also in "HuMoR: 3D Human Motion Model for Robust Pose Estimation". I'm however somewhat confused by their specific implementation in Section 4, doesn't seem like a standard cVAE implementation?
[21-09-24] [paper150]
Well-written and fairly interesting paper. Quite a lot of details on the attention architecture, which I personally don't find overly interesting. The experimental results are quite impressive, but I would like to see a comparison in terms of computational cost at test-time. It sounds like their method is rather slow.
[21-09-23] [paper149]
Well-written and quite interesting paper. The general idea, refining frame-by-frame pose estimates via physical constraints, intuitively makes a lot of sense. I did however find it quite difficult to understand all the details in Section 3.
[21-09-17] [paper147]
Quite interesting paper, but also quite strange/confusing. I don't think the proposed method is explained particularly well, at least I found it quite difficult to properly understand what they actually are doing.

In the end it seems like they are learning a global loss function that is very similar to doing probabilistic regression with a Gauss/Laplace model of p(y|x) (with learned mean and variance)? See Figure 4 in the Appendix.

And while it's true that their performance is much better than for direct regression with an L2/L1 loss (see e.g. Table 1), they only compare with Gauss/Laplace probabilistic regression once (Table 7) and in that case the Laplace model is actually quite competitive?
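For reference, by probabilistic regression with a Gauss/Laplace model of p(y|x) I mean the standard setup where the network outputs both a mean and a (log) scale and is trained with the corresponding NLL; a minimal PyTorch sketch (my own code, constants dropped):

```python
import torch

def gaussian_nll(mean, log_var, y):
    """Gaussian NLL with learned mean and variance ("loss attenuation"):
    points with large predicted variance contribute less to the
    squared-error term, at the cost of the log-variance penalty."""
    return 0.5 * (log_var + (y - mean) ** 2 / log_var.exp()).mean()

def laplace_nll(mean, log_b, y):
    """Laplace counterpart: an L1 error scaled by a learned scale b."""
    return (log_b + (y - mean).abs() / log_b.exp()).mean()
```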
[21-09-02] [paper143]
[21-06-19] [paper141]
[21-06-19] [paper140]
  • Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [pdf] [code] [annotated pdf]
  • Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black
  • 2019-04-11, CVPR 2019
  • [3D Human Pose Estimation]
Very well-written and quite interesting paper. Gives a good understanding of the SMPL model and the SMPLify method.
[21-06-18] [paper139]
  • Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image [pdf] [annotated pdf]
  • Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, Michael J. Black
  • 2016-07-27, ECCV 2016
  • [3D Human Pose Estimation]
[21-06-18] [paper138]
[21-06-17] [paper137]
[21-06-17] [paper136]
[21-06-16] [paper135]
[21-06-16] [paper134]
[21-06-15] [paper133]
[21-06-14] [paper132]
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [pdf] [annotated pdf]
  • Benjamin Biggs, Sébastien Ehrhadt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny
  • 2020-11-02, NeurIPS 2020
  • [3D Human Pose Estimation]
[21-06-04] [paper131]



Visual Tracking:

[19-06-12] [paper57]



Sequence Modeling:

[21-12-16] [paper172]
Very interesting and quite well-written paper. Kind of neat/fun to see state-space models being used. The experimental results seem very impressive!? I didn't fully understand everything in Section 3. I had to read Section 3.4 a couple of times to understand how the parameterization actually works in practice (you have H state-space models, one for each feature dimension, so that you can map a sequence of feature vectors to another sequence of feature vectors) (and you can then also have multiple such layers of state-space models, mapping sequence --> sequence --> sequence --> ....).
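A toy numpy sketch of the shape of this computation as I understand it; the actual parameterization in the paper is much more involved (structured complex-valued state matrices, and a convolutional view for efficient training), so this is only meant to illustrate the per-feature-dimension recursion:

```python
import numpy as np

def ssm_layer(u, A, B, C):
    """Toy sequence-to-sequence layer built from H independent SSMs.

    u: input sequence of feature vectors, shape (L, H).
    A, B, C: per-feature scalar state-space parameters, shape (H,).
    Each feature dimension h runs its own recursion
        x_t = A[h] * x_{t-1} + B[h] * u_t,    y_t = C[h] * x_t,
    so an (L, H) sequence maps to another (L, H) sequence, and such
    layers can be stacked: sequence -> sequence -> sequence -> ...
    """
    L, H = u.shape
    x = np.zeros(H)
    y = np.empty((L, H))
    for t in range(L):
        x = A * x + B * u[t]
        y[t] = C * x
    return y

u = np.random.randn(16, 4)                          # L=16 steps, H=4 features
out = ssm_layer(u, 0.9 * np.ones(4), np.ones(4), np.ones(4))  # shape (16, 4)
```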
[20-01-17] [paper81]
[20-01-10] [paper77]
[19-11-26] [paper66]
[19-10-04] [paper61]
[19-05-29] [paper54]
  • Attention Is All You Need [pdf] [pdf with comments] [comments]
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • 2017-06-12, NeurIPS2017
[19-03-15] [paper51]
[19-01-24] [paper31]
[18-11-22] [paper21]
  • When Recurrent Models Don't Need To Be Recurrent (a.k.a. Stable Recurrent Models) [pdf] [pdf with comments] [summary]
  • John Miller, Moritz Hardt
  • 2018-05-29, ICLR2019



Reinforcement Learning:

[22-02-15] [paper182]
Well-written and somewhat interesting paper. I'm not overly familiar with RL, which makes it a bit difficult for me to properly evaluate the paper's contributions. They use standard ensembles for uncertainty estimation combined with an OOD sampling regularization. I thought that the OOD sampling could be interesting, but it seems very specific to RL. I'm sure this paper is quite interesting for people doing RL, but I don't think it's overly useful for me.
[22-02-15] [paper180]
Well-written and somewhat interesting paper. I'm not overly familiar with reinforcement learning, which makes it a bit difficult for me to properly evaluate the paper's contributions, but to me it seems like fairly straightforward method modifications? To use ensembles of Gaussian models (instead of ensembles of models trained using the L2 loss) makes sense. The BIV method I didn't quite get, it seems rather ad hoc? I also don't quite get exactly how it's used in equation (10), is the ensemble of Gaussian models trained _jointly_ using this loss? I don't really know if this could be useful outside of RL.
[21-04-09] [paper127]
[20-02-13] [paper86]
[20-02-08] [paper85]
[19-11-29] [paper67]
[19-11-22] [paper65]
[19-02-05] [paper38]



System Identification:

[19-11-26] [paper66]
[19-10-28] [paper64]



Energy-Based Models:

[22-02-11] [paper176]
Well-written and quite interesting paper. A short paper, just 4 pages. They don't study the method from the "Energy-based Out-of-distribution Detection" paper as I had expected, but it was still a quite interesting read. The results in Section 4.2 seem interesting, especially for experiment 3, but I'm not sure that I properly understand everything.
[22-02-09] [paper174]
Interesting and well-written paper. The proposed method is quite clearly explained and makes intuitive sense (at least if you're familiar with EBMs). Compared to using the softmax score, the performance does seem to improve consistently. Seems like fine-tuning on an "auxiliary outlier dataset" is required to get really good performance though, which you can't really assume to have access to in real-world problems, I suppose?
[21-03-26] [paper125]
  • Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling [pdf] [pdf with comments]
  • Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio
  • 2020-03-12, NeurIPS 2020
  • [Energy-Based Models]
[21-01-29] [paper117]
  • No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models [pdf] [code] [pdf with comments]
  • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
  • 2020-10-08, ICLR 2021
  • [Energy-Based Models]
[20-11-13] [paper110]
[20-09-04] [paper103]
[20-06-18] [paper102]
[20-01-20] [paper82]
[20-01-16] [paper80]
[20-01-15] [paper79]
[20-01-14] [paper78]
[20-01-06] [paper75]
[19-12-22] [paper74]
[19-12-20] [paper73]
[19-12-20] [paper72]
[19-12-19] [paper71]
[19-12-15] [paper70]
[19-12-14] [paper69]
[19-12-13] [paper68]



Ensembling:

[21-04-01] [paper126]
[20-05-27] [paper99]
[20-03-27] [paper94]
[20-02-28] [paper91]
[19-02-05] [paper38]
[18-11-16] [paper20]
  • Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow [pdf] [pdf with comments] [summary]
  • Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox
  • 2018-08-06, ECCV2018
[18-11-12] [paper18]
  • Large-Scale Visual Active Learning with Deep Probabilistic Ensembles [pdf] [pdf with comments] [summary]
  • Kashyap Chitta, Jose M. Alvarez, Adam Lesnikowski
  • 2018-11-08
[18-10-19] [paper14]
  • Uncertainty in Neural Networks: Bayesian Ensembling [pdf] [pdf with comments] [summary]
  • Tim Pearce, Mohamed Zaki, Alexandra Brintrup, Andy Neel
  • 2018-10-12, AISTATS2019 submission
[18-10-18] [paper13]
  • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles [pdf] [pdf with comments] [summary]
  • Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
  • 2017-11-17, NeurIPS2017
[18-09-25] [paper5]
  • Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Isidro Cortes-Ciriano, Andreas Bender
  • 2018-09-24



Stochastic Gradient MCMC:

[20-04-17] [paper97]
[20-03-27] [paper94]
[19-04-05] [paper53]
  • Stochastic Gradient Descent as Approximate Bayesian Inference [pdf] [pdf with comments] [comments]
  • Stephan Mandt, Matthew D. Hoffman, David M. Blei
  • 2017-04-13, Journal of Machine Learning Research 18 (2017)
[19-02-13] [paper42]
[19-01-26] [paper35]
  • Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification [pdf] [poster] [pdf with comments] [comments]
  • Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
  • CVPR2016
[19-01-25] [paper34]
[19-01-25] [paper33]
[19-01-24] [paper32]
  • Tutorial: Introduction to Stochastic Gradient Markov Chain Monte Carlo Methods [pdf] [pdf with comments]
  • Changyou Chen
  • 2016-08-10
[19-01-23] [paper30]
[19-01-23] [paper29]



Variational Inference:

[20-06-05] [paper100]
[20-01-08] [paper76]
[19-02-07] [paper40]
[19-01-28] [paper37]
[19-01-27] [paper36]



Neural Processes:

[21-05-07] [paper130]
[20-02-21] [paper90]
[18-09-30] [paper8]
  • Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S.M. Ali Eslami, Yee Whye Teh
  • 2018-07-04, ICML2018 Workshop
[18-09-27] [paper7]
  • Conditional Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami
  • 2018-07-04, ICML2018



Neural ODEs:

[21-04-29] [paper129]
[21-03-04] [paper122]
[20-12-18] [paper114]
  • Score-Based Generative Modeling through Stochastic Differential Equations [pdf] [code] [pdf with comments]
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
  • 2020-11-26, ICLR 2021
  • [Neural ODEs]
[20-12-14] [paper113]
[18-12-12] [paper26]



Transformers:

[22-03-03] [paper197]
Quite interesting and well-written paper. I did however find it difficult to properly understand everything, it feels like a lot of details are omitted (I wouldn't really know how to actually implement this in practice). It's difficult for me to judge how impressive the results are or how practically useful this approach actually might be, what limitations are there? Overall though, it does indeed seem quite interesting.
[21-05-07] [paper130]
[21-01-15] [paper115]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [pdf] [pdf with comments]
  • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • 2020-06-29, ICML 2020
  • [Transformers]
[20-11-27] [paper112]
  • Rethinking Attention with Performers [pdf] [pdf with comments]
  • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller
  • 2020-10-30, ICLR 2021
  • [Transformers]



Implicit Neural Representations:

[22-02-24] [paper191]
Interesting and well-written paper. I wasn't very familiar with CT image reconstruction, but they do a good job explaining everything. Interesting that MC-dropout seems important for getting well-calibrated predictions.
[21-10-12] [paper162]
Interesting and very well-written paper, I really enjoyed reading it! The paper also gives a good understanding of neural implicit representations in general.
[21-10-03] [paper155]
Interesting and very well-written paper, I really enjoyed reading it. Interesting combination of implicit representations and 3D human modelling. The "inclusive human modelling" application is neat and important.
[21-10-03] [paper154]
Well-written and interesting paper, I enjoyed reading it. Neat application of implicit representations. The paper also gives a quite good overview of online 3D reconstruction in general.
[21-10-01] [paper152]
Well-written and quite interesting paper. Interesting application, being able to reconstruct full 3D scenes from sparse point clouds. I didn't fully understand everything, as I don't have a particularly strong graphics background.
[21-09-15] [paper146]
Extremely well-written and interesting paper. I really enjoyed reading it, and I would recommend anyone interested in computer vision to read it as well.

All parts of the proposed method are clearly explained and relatively easy to understand, including the volume rendering techniques which I was unfamiliar with.
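Since the volume rendering part was new to me, here is the quadrature rule in a few lines of numpy (my own toy version of the compositing along one ray):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Alpha-composite densities and colors sampled along a ray.

    sigmas: (N,) densities, colors: (N, 3) RGB values, deltas: (N,)
    distances between consecutive samples along the ray.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)        # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # T_i
    weights = trans * alphas                       # contribution of sample i
    return (weights[:, None] * colors).sum(axis=0)
```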
[21-08-27] [paper142]



SysCon Deep Learning Reading Group:

Reading Group Papers in 2020:

[2020 w.42] [20-10-16] [paper108]
[2020 w.41] [20-10-09] [paper107]
[2020 w.39] [20-09-24] [paper106]
[2020 w.38] [20-09-21] [paper105]
[2020 w.37] [20-09-11] [paper104]
  • Gated Linear Networks [pdf] [pdf with comments] [comments]
  • Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter
  • 2020-06-11
[2020 w.36] [20-09-04] [paper103]
[2020 w.25] [20-06-18] [paper102]
[2020 w.24] [20-06-12] [paper101]
[2020 w.23] [20-06-05] [paper100]
[2020 w.22] [20-05-27] [paper99]
[2020 w.20] [19-12-22] [paper74]
[2020 w.19] [20-05-10] [paper98]
[2020 w.16] [20-04-17] [paper97]
[2020 w.15] [20-04-09] [paper96]
[2020 w.14] [20-04-03] [paper95]
[2020 w.13] [20-03-27] [paper94]
[2020 w.12] [20-03-26] [paper93]
[2020 w.10] [20-03-09] [paper92]
[2020 w.9] [20-02-28] [paper91]
[2020 w.8] [20-02-21] [paper90]
[2020 w.7] [20-02-13] [paper86]
[2020 w.6] [20-02-08] [paper85]
[2020 w.5] [20-01-31] [paper84]
[2020 w.4] [20-01-24] [paper83]
[2020 w.3] [20-01-17] [paper81]
[2020 w.2] [20-01-10] [paper77]

Reading Group Papers in 2019:

[2019 w.48] [19-11-29] [paper67]
[2019 w.47] [19-11-22] [paper65]
[2019 w.46] [19-10-28] [paper64]
[2019 w.45] [19-01-27] [paper36]
[2019 w.43] [18-09-27] [paper6]
[2019 w.42] [19-10-18] [paper63]
  • Improving Variational Inference with Inverse Autoregressive Flow [pdf] [code] [pdf with comments] [comments]
  • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
  • 2016-06-15, NeurIPS2016
[2019 w.41] [19-10-11] [paper62]
[2019 w.40] [19-10-04] [paper61]
[2019 w.23] [19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [pdf with comments] [comments]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • 2019-03-20, CVPR2019
[2019 w.22] [19-05-29] [paper54]
  • Attention Is All You Need [pdf] [pdf with comments] [comments]
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • 2017-06-12, NeurIPS2017
[2019 w.18] [19-02-17] [paper44]
[2019 w.14] [19-04-05] [paper53]
  • Stochastic Gradient Descent as Approximate Bayesian Inference [pdf] [pdf with comments] [comments]
  • Stephan Mandt, Matthew D. Hoffman, David M. Blei
  • 2017-04-13, Journal of Machine Learning Research 18 (2017)
[2019 w.13] [19-03-29] [paper52]
  • Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling [pdf] [pdf with comments] [comments]
  • Jacob Menick, Nal Kalchbrenner
  • 2018-12-04, ICLR2019
[2019 w.12] [19-02-25] [paper46]
  • Evaluating model calibration in classification [pdf] [code] [pdf with comments] [comments]
  • Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön
  • 2019-02-19, AISTATS2019
[2019 w.11] [19-03-15] [paper51]
[2019 w.10] [19-03-04] [paper49]
[2019 w.9] [19-03-01] [paper48]
[2019 w.8] [19-02-22] [paper45]
  • Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [pdf] [pdf with comments] [comments]
  • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang
  • 2019-01-24
[2019 w.7] [19-02-14] [paper43]
[2019 w.6] [19-02-05] [paper38]
[2019 w.5] [19-01-25] [paper33]
[2019 w.4] [19-01-24] [paper31]
[2019 w.3] [19-01-17] [paper28]
[2019 w.2] [18-09-30] [paper8]
  • Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S.M. Ali Eslami, Yee Whye Teh
  • 2018-07-04, ICML2018 Workshop

Reading Group Papers in 2018:

[2018 w.50] [18-12-12] [paper26]
[2018 w.49] [18-11-29] [paper23]
[2018 w.48] [18-11-22] [paper22]
  • A Probabilistic U-Net for Segmentation of Ambiguous Images [pdf] [code] [pdf with comments] [summary]
  • Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
  • 2018-10-29, NeurIPS2018
[2018 w.47] [18-11-22] [paper21]
  • When Recurrent Models Don't Need To Be Recurrent (a.k.a. Stable Recurrent Models) [pdf] [pdf with comments] [summary]
  • John Miller, Moritz Hardt
  • 2018-05-29, ICLR2019
[2018 w.46] [18-11-15] [paper19]
  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) [pdf] [pdf with comments] [summary]
  • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
  • 2018-06-07, ICML2018
[2018 w.45] [18-11-08] [paper17]
[2018 w.44] [18-09-27] [paper7]
  • Conditional Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami
  • 2018-07-04, ICML2018
[2018 w.43] [18-10-25] [paper15]
  • Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes [pdf] [pdf with comments] [summary]
  • Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
  • 2018-10-11, ICLR2019
[2018 w.41] [18-10-04] [paper9]
  • On gradient regularizers for MMD GANs [pdf] [pdf with comments] [summary]
  • Michael Arbel, Dougal J. Sutherland, Mikołaj Bińkowski, Arthur Gretton
  • 2018-05-29, NeurIPS2018
[2018 w.39] [18-09-27] [paper6]
[2018 w.38] [18-09-20] [paper1]
  • Gaussian Process Behaviour in Wide Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani
  • 2018-08-16, ICLR2018



SysCon Monte Carlo Reading Group:

[2019 w.6 II]
  • The Continuous-Discrete Time Feedback Particle Filter [pdf]
  • Tao Yang, Henk A. P. Blom, Prashant G. Mehta
  • 2014, American Control Conference
[2019 w.6 I]
  • Feedback Particle Filter [pdf]
  • Tao Yang, Prashant G. Mehta, Sean P. Meyn
  • 2013, IEEE Transactions on Automatic Control
[2019 w.3]
  • Markov Chains for Exploring Posterior Distributions [pdf] [pdf with comments]
  • Luke Tierney
  • 1994-12, The Annals of Statistics
[2018 w.50 II]
  • Particle Gibbs with Ancestor Sampling [pdf]
  • Fredrik Lindsten, Michael I. Jordan, Thomas B. Schön
  • 2014-06-14, Journal of Machine Learning Research
[2018 w.50 I]
  • Particle Markov chain Monte Carlo methods [pdf]
  • Christophe Andrieu, Arnaud Doucet, Roman Holenstein
  • 2010, Journal of the Royal Statistical Society
[2018 w.48]
  • State Space LSTM Models with Particle MCMC Inference [pdf]
  • Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P Xing, Alexander J Smola
  • 2017-11-30
[2018 w.46]
  • Rethinking the Effective Sample Size [pdf]
  • Víctor Elvira, Luca Martino, Christian P. Robert
  • `2018-09-11,



NeurIPS:

NeurIPS 2021:

[22-02-25] [paper192]
Quite interesting and well-written paper. The definition of "prediction depth" in Section 2.1 makes sense, and it definitely seems reasonable that this could correlate with example difficulty / prediction confidence in some way. Section 3 and 4, and all the figures, contain a lot of info it seems, I'd probably need to read the paper again to properly understand/appreciate everything.
[22-02-14] [paper179]
Interesting and very well-written paper, I enjoyed reading it. I still think that ensembling probably is quite difficult to beat purely in terms of uncertainty estimation quality, but this definitely seems like a useful tool in many situations. It's not clear to me if the analytical expression for regression in "4. Approximate Predictive Distribution" is applicable also if the variance is input-dependent?
[21-12-09] [paper171]
Quite interesting and well-written paper. Quite a heavy read, probably need to be rather familiar with GPs to properly understand/appreciate everything. Definitely check Appendix D, it gives a better understanding of how the proposed method is applied in practice. I'm not quite sure how strong/impressive the experimental results actually are. Also seems like the method could be a bit inconvenient to implement/use?
[21-12-03] [paper170]
Interesting and very well-written paper. Gives a good overview of the field and contains a lot of seemingly useful references. The evaluation is very comprehensive. The user study is quite neat.
[21-12-02] [paper169]
Quite well-written paper overall that seemed interesting, but I found it very difficult to properly understand everything. Thus, I can't really tell how interesting/significant their analysis actually is.
[21-11-25] [paper168]
Quite interesting and well-written paper. The experimental results do seem promising. However, I don't quite get why the proposed method intuitively makes sense: why is it better to only use the parameters of the final network layer?
[21-09-08] [paper145]
  • Revisiting the Calibration of Modern Neural Networks [pdf] [code] [annotated pdf]
  • Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
  • 2021-06-15, NeurIPS 2021
  • [Uncertainty Estimation]
Well-written paper. Everything is quite clearly explained and easy to understand. Quite enjoyable to read overall. 

Thorough experimental evaluation. Quite interesting findings.

NeurIPS 2020:

[22-02-12] [paper177]
Well-written and interesting paper. This is a good paper to read before "Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions". Their proposed method seems to have similar / slightly worse performance than a small ensemble, so the only real advantage is that it's faster at test-time? This is of course very important in many applications, but not in all. The performance also seems quite sensitive to the choice of lambda in the combined loss function (Equation (10)), according to Figure S2 in the appendix?
[22-02-09] [paper174]
Interesting and well-written paper. The proposed method is quite clearly explained and makes intuitive sense (at least if you're familiar with EBMs). Compared to using the softmax score, the performance does seem to improve consistently. Seems like fine-tuning on an "auxiliary outlier dataset" is required to get really good performance though, which you can't really assume to have access to in real-world problems, I suppose?
[21-10-12] [paper162]
Interesting and very well-written paper, I really enjoyed reading it! The paper also gives a good understanding of neural implicit representations in general.
[21-06-14] [paper132]
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [pdf] [annotated pdf]
  • Benjamin Biggs, Sébastien Ehrhadt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny
  • 2020-11-02, NeurIPS 2020
  • [3D Human Pose Estimation]
[21-03-26] [paper125]
  • Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling [pdf] [pdf with comments]
  • Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio
  • 2020-03-12, NeurIPS 2020
  • [Energy-Based Models]
[21-03-12] [paper123]
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [pdf] [code] [pdf with comments]
  • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
  • 2020-06-17, NeurIPS 2020
[20-12-14] [paper113]
  • Dissecting Neural ODEs [pdf] [pdf with comments]
  • Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
  • 2020-02-19, NeurIPS 2020
[20-09-24] [paper106]

NeurIPS 2019:

[20-11-06] [paper109]
[20-04-09] [paper96]
[20-01-31] [paper84]
[20-01-24] [paper83]
[20-01-15] [paper79]
[20-01-08] [paper76]
[19-12-15] [paper70]
[19-12-14] [paper69]

NeurIPS 2018:

[22-02-19] [paper187]
Well-written and interesting paper. The proposed method is simple and really neat: fit class-conditional Gaussians in the feature space of a pre-trained classifier (basically just LDA on the feature vectors), and then use the Mahalanobis distance to these Gaussians as the confidence score for input x. They then also do this for the features at multiple levels of the network and combine these confidence scores into one. I don't quite get why the "input pre-processing" in Section 2.2 (adding noise to test samples) works, in Table 1 it significantly improves the performance.
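The core of the method in a few lines, as I understand it (tied covariance across classes; the paper additionally combines scores from multiple feature levels and applies the input pre-processing):

```python
import numpy as np

def fit_gaussians(feats, labels, num_classes):
    """Class-conditional Gaussians with a shared (tied) covariance,
    fit on feature vectors from a pre-trained classifier."""
    mus = np.stack([feats[labels == k].mean(axis=0)
                    for k in range(num_classes)])
    centered = feats - mus[labels]
    cov = centered.T @ centered / len(feats)
    return mus, np.linalg.inv(cov + 1e-6 * np.eye(feats.shape[1]))

def mahalanobis_confidence(f, mus, prec):
    """Confidence score for a feature vector f: negative squared
    Mahalanobis distance to the closest class mean."""
    d = f - mus                                     # (K, D)
    return -np.einsum('kd,de,ke->k', d, prec, d).min()

# feats: (N, D) training features, labels: (N,) class indices (given).
```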
[19-03-04] [paper49]
[19-02-27] [paper47]
[19-02-17] [paper44]
[19-02-05] [paper38]
[19-01-17] [paper28]
[19-01-09] [paper27]
  • Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection [pdf] [poster] [pdf with comments] [summary]
  • Lukas Neumann, Andrew Zisserman, Andrea Vedaldi
  • 2018-11-29, NeurIPS2018 Workshop
[18-12-12] [paper26]
[18-11-29] [paper23]
[18-11-22] [paper22]
  • A Probabilistic U-Net for Segmentation of Ambiguous Images [pdf] [code] [pdf with comments] [summary]
  • Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger
  • 2018-10-29, NeurIPS2018
[18-10-04] [paper9]
  • On gradient regularizers for MMD GANs [pdf] [pdf with comments] [summary]
  • Michael Arbel, Dougal J. Sutherland, Mikołaj Bińkowski, Arthur Gretton
  • 2018-05-29, NeurIPS2018

NeurIPS 2017:

[20-01-10] [paper77]
[19-05-29] [paper54]
  • Attention Is All You Need [pdf] [pdf with comments] [comments]
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
  • 2017-06-12, NeurIPS2017
[18-10-18] [paper13]
  • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles [pdf] [pdf with comments] [summary]
  • Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
  • 2017-11-17, NeurIPS2017
[18-09-24] [paper2]

NeurIPS 2016:

[19-10-18] [paper63]
  • Improving Variational Inference with Inverse Autoregressive Flow [pdf] [code] [pdf with comments] [comments]
  • Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
  • 2016-06-15, NeurIPS2016

NeurIPS 2015:

[19-02-12] [paper41]
[19-01-25] [paper33]

NeurIPS 2011:

[19-01-28] [paper37]



ICML:

ICML 2021:

[21-09-21] [paper148]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.
[21-09-02] [paper144]
  • Differentiable Particle Filtering via Entropy-Regularized Optimal Transport [pdf] [code] [annotated pdf]
  • Adrien Corenflos, James Thornton, George Deligiannidis, Arnaud Doucet
  • 2021-02-15, ICML 2021
[21-05-07] [paper130]
[21-04-01] [paper126]
[21-02-19] [paper120]
  • Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [pdf] [pdf with comments]
  • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
  • 2021-02-11, ICML 2021

ICML 2020:

[21-10-14] [paper163]
  • Learning to Simulate Complex Physics with Graph Networks [pdf] [code] [annotated pdf]
  • Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, Peter W. Battaglia
  • 2020-02-21, ICML 2020
Quite well-written and somewhat interesting paper. Cool application and a bunch of neat videos. This is not really my area, so I didn't spend too much time/energy trying to fully understand everything.
[21-01-15] [paper115]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [pdf] [pdf with comments]
  • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • 2020-06-29, ICML 2020
  • [Transformers]
[20-09-21] [paper105]
[20-06-05] [paper100]

ICML 2019:

[20-02-14] [paper87]
[19-11-22] [paper65]

ICML 2018:

[21-02-26] [paper121]
  • Neural Relational Inference for Interacting Systems [pdf] [code] [pdf with comments]
  • Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, Richard Zemel
  • 2018-02-13, ICML 2018
[20-02-13] [paper86]
[19-02-07] [paper40]
[18-11-15] [paper19]
  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) [pdf] [pdf with comments] [summary]
  • Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
  • 2018-06-07, ICML2018
[18-10-18] [paper12]
  • Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors [pdf] [pdf with comments] [summary]
  • Danijar Hafner, Dustin Tran, Alex Irpan, Timothy Lillicrap, James Davidson
  • 2018-07-24, ICML2018 Workshop
[18-09-30] [paper8]
  • Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S.M. Ali Eslami, Yee Whye Teh
  • 2018-07-04, ICML2018 Workshop
[18-09-27] [paper7]
  • Conditional Neural Processes [pdf] [pdf with comments] [summary]
  • Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami
  • 2018-07-04, ICML2018
[18-09-27] [paper6]

ICML 2017:

[18-12-05] [paper24]

ICML 2015:

[19-10-11] [paper62]
[19-02-06] [paper39]
  • Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks [pdf] [pdf with comments] [comments]
  • José Miguel Hernández-Lobato, Ryan P. Adams
  • 2015-07-15, ICML2015
[19-01-27] [paper36]

ICML 2014:

[19-01-23] [paper30]

ICML 2011:

[19-01-23] [paper29]



ICLR:

ICLR 2022:

[22-03-03] [paper197]
Quite interesting and well-written paper. I did however find it difficult to properly understand everything, it feels like a lot of details are omitted (I wouldn't really know how to actually implement this in practice). It's difficult for me to judge how impressive the results are or how practically useful this approach actually might be, what limitations are there? Overall though, it does indeed seem quite interesting.
[22-02-15] [paper182]
Well-written and somewhat interesting paper. I'm not overly familiar with RL, which makes it a bit difficult for me to properly evaluate the paper's contributions. They use standard ensembles for uncertainty estimation combined with an OOD sampling regularization. I thought that the OOD sampling could be interesting, but it seems very specific to RL. I'm sure this paper is quite interesting for people doing RL, but I don't think it's overly useful for me.
[22-02-15] [paper181]
Quite interesting and very well-written paper, I enjoyed reading it. Their analysis of fitting Gaussian regression models via the NLL is quite interesting, I didn't really expect to learn something new about this. I've seen Gaussian models outperform standard regression (L2 loss) w.r.t. accuracy in some applications/datasets, and the other way around in others. In the first case, I've then attributed the success of the Gaussian model to the "learned loss attenuation". The analysis in this paper could perhaps explain why you get this performance boost only in certain applications. Their beta-NLL loss could probably be quite useful, seems like a convenient tool to have.
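The beta-NLL loss itself is essentially a one-liner: the per-point Gaussian NLL is re-weighted by the (gradient-stopped) predicted variance raised to beta, interpolating between the standard NLL (beta = 0) and a more MSE-like weighting (beta = 1). A PyTorch sketch of my understanding:

```python
import torch

def beta_nll(mean, var, y, beta=0.5):
    """beta-NLL: re-weight the Gaussian NLL per point by var**beta
    (detached), counteracting the NLL's tendency to down-weight the
    gradients of hard, high-variance points."""
    nll = 0.5 * (torch.log(var) + (y - mean) ** 2 / var)
    return (var.detach() ** beta * nll).mean()
```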
[22-02-15] [paper180]
Well-written and somewhat interesting paper. I'm not overly familiar with reinforcement learning, which makes it a bit difficult for me to properly evaluate the paper's contributions, but to me it seems like fairly straightforward method modifications? To use ensembles of Gaussian models (instead of ensembles of models trained using the L2 loss) makes sense. The BIV method I didn't quite get, it seems rather ad hoc? I also don't quite get exactly how it's used in equation (10), is the ensemble of Gaussian models trained _jointly_ using this loss? I don't really know if this could be useful outside of RL.
[22-02-10] [paper175]
Interesting and well-written paper. I didn't quite understand all the details, I'll have to read a couple of related/background papers to be able to properly appreciate and evaluate the proposed method. I definitely feel like I would like to read up on this family of methods. Extensive experimental evaluation, and the results seem promising overall.
[22-02-09] [paper173]
Interesting and quite well-written paper. I did find it somewhat difficult to understand certain parts though, they could perhaps be explained more clearly. The results seem quite impressive (they do consistently outperform all baselines), but I find it interesting that the "Gaussian noise" baseline in Table 2 performs that well? I should probably have read "Energy-based Out-of-distribution Detection" before reading this paper.
[21-12-16] [paper172]
Very interesting and quite well-written paper. Kind of neat/fun to see state-space models being used. The experimental results seem very impressive!? I didn't fully understand everything in Section 3. I had to read Section 3.4 a couple of times to understand how the parameterization actually works in practice (you have H state-space models, one for each feature dimension, so that you can map a sequence of feature vectors to another sequence of feature vectors) (and you can then also have multiple such layers of state-space models, mapping sequence --> sequence --> sequence --> ....).

ICLR 2021:

[21-04-16] [paper128]
  • Learning Mesh-Based Simulation with Graph Networks [pdf] [code] [annotated pdf]
  • Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, Peter W. Battaglia
  • 2020-10-07, ICLR 2021
[21-03-19] [paper124]
[21-02-12] [paper119]
[21-01-29] [paper117]
  • No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models [pdf] [code] [pdf with comments]
  • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
  • 2020-10-08, ICLR 2021
  • [Energy-Based Models]
[21-01-22] [paper116]
[20-12-18] [paper114]
  • Score-Based Generative Modeling through Stochastic Differential Equations [pdf] [code] [pdf with comments]
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
  • 2020-11-26, ICLR 2021
  • [Neural ODEs]
[20-11-27] [paper112]
  • Rethinking Attention with Performers [pdf] [pdf with comments]
  • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller
  • 2020-10-30, ICLR 2021
  • [Transformers]
[20-11-23] [paper111]
[20-11-13] [paper110]

ICLR 2020:

[20-05-27] [paper99]
[20-03-27] [paper94]
[20-03-26] [paper93]
[20-02-21] [paper90]
[20-01-17] [paper81]
[19-12-22] [paper74]

ICLR 2019:

[19-10-04] [paper61]
[19-03-29] [paper52]
  • Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling [pdf] [pdf with comments] [comments]
  • Jacob Menick, Nal Kalchbrenner
  • 2018-12-04, ICLR2019
[19-01-25] [paper34]
[18-11-22] [paper21]
  • When Recurrent Models Don't Need To Be Recurrent (a.k.a. Stable Recurrent Models) [pdf] [pdf with comments] [summary]
  • John Miller, Moritz Hardt
  • 2018-05-29, ICLR2019
[18-11-08] [paper17]
[18-10-25] [paper15]
  • Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes [pdf] [pdf with comments] [summary]
  • Roman Novak, Lechao Xiao, Jaehoon Lee, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
  • 2018-10-11, ICLR2019

ICLR 2018:

[22-02-26] [paper194]
Quite interesting and well-written paper. Two simple modifications of the "maximum softmax score" baseline, and the performance is consistently improved. The input perturbation method is quite interesting. Intuitively, it's not entirely clear to me why it actually works.
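For my own reference, the two modifications look roughly like this in PyTorch; `model`, `T` and `eps` are placeholders (the paper tunes them per dataset):

```python
import torch
import torch.nn.functional as F

def odin_score(model, x, T=1000.0, eps=0.0014):
    """Sketch of the two modifications to the max-softmax baseline:
    (1) temperature-scale the logits, (2) perturb the input a small
    step in the direction that increases the scaled max softmax
    probability, which tends to help in-distribution inputs more
    than OOD ones. Higher score = more in-distribution.
    """
    x = x.clone().requires_grad_(True)
    logp = F.log_softmax(model(x) / T, dim=-1)
    loss = -logp.max(dim=-1).values.sum()      # negative max log-prob
    loss.backward()
    x_pert = x - eps * x.grad.sign()           # input perturbation
    with torch.no_grad():
        probs = F.softmax(model(x_pert) / T, dim=-1)
    return probs.max(dim=-1).values
```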
[18-09-20] [paper1]
  • Gaussian Process Behaviour in Wide Deep Neural Networks [pdf] [pdf with comments] [summary]
  • Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani
  • 2018-08-16, ICLR2018

ICLR 2017:

[19-03-15] [paper51]

ICLR 2014:

[19-03-11] [paper50]



CVPR:

CVPR 2021:

[21-10-08] [paper161]
Well-written and quite interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they predict a single Gaussian distribution for the pose (instead of hierarchical matrix-Fisher distributions). Also, they mainly focus on the body shape. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-06] [paper157]
Well-written and interesting paper. Quite easy to read and follow, the method is clearly explained and makes intuitive sense.
[21-10-04] [paper156]
Well-written and fairly interesting paper. The marker-based representation, instead of using skeleton joints, makes sense. The recursive projection scheme also makes sense, but seems very slow (2.27 sec/frame)? I didn't quite get all the details for their DCT representation of the latent space.
[21-10-03] [paper154]
Well-written and interesting paper, I enjoyed reading it. Neat application of implicit representations. The paper also gives a quite good overview of online 3D reconstruction in general.
[21-06-18] [paper138]
[21-02-05] [paper118]

CVPR 2020:

[21-10-01] [paper152]
Well-written and quite interesting paper. Interesting application, being able to reconstruct full 3D scenes from sparse point clouds. I didn't fully understand everything, as I don't have a particularly strong graphics background.
[20-06-18] [paper102]
[19-12-20] [paper72]

CVPR 2019:

[21-08-27] [paper142]
[21-06-19] [paper141]
[21-06-19] [paper140]
  • Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [pdf] [code] [annotated pdf]
  • Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black
  • 2019-04-11, CVPR 2019
  • [3D Human Pose Estimation]
Very well-written and quite interesting paper. Gives a good understanding of the SMPL model and the SMPLify method.
[19-07-10] [paper59]
[19-06-12] [paper57]
[19-06-05] [paper55]
  • LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving [pdf] [pdf with comments] [comments]
  • Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington
  • 2019-03-20, CVPR2019

CVPR 2018:

[21-06-15] [paper133]
[18-10-05] [paper11]
[18-10-04] [paper10]
[18-09-24] [paper3]

CVPR 2016:

[19-01-26] [paper35]
  • Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification [pdf] [poster] [pdf with comments] [comments]
  • Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
  • CVPR2016



ECCV:

ECCV 2020:

[21-09-15] [paper146]
Extremely well-written and interesting paper. I really enjoyed reading it, and I would recommend anyone interested in computer vision to read it as well.

All parts of the proposed method are clearly explained and relatively easy to understand, including the volume rendering techniques which I was unfamiliar with.
[20-06-12] [paper101]

ECCV 2018:

[19-06-12] [paper56]
[18-11-16] [paper20]
  • Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow [pdf] [pdf with comments] [summary]
  • Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, Thomas Brox
  • 2018-08-06, ECCV2018

ECCV 2016:

[21-06-18] [paper139]
  • Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image [pdf] [annotated pdf]
  • Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, Michael J. Black
  • 2016-07-27, ECCV 2016
  • [3D Human Pose Estimation]



ICCV:

ICCV 2021:

[21-10-07] [paper159]
Well-written and quite interesting paper. I didn't fully understand everything though, and it feels like I probably don't know this specific setting/problem well enough to fully appreciate the paper. 
[21-10-07] [paper158]
Well-written and very interesting paper, I enjoyed reading it. The hierarchical distribution prediction approach makes sense and consistently outperforms the independent baseline. Using matrix-Fisher distributions makes sense. The synthetic training framework and the input representation of edge-filters + 2D keypoint heatmaps are both interesting.
[21-10-03] [paper155]
Interesting and very well-written paper, I really enjoyed reading it. Interesting combination of implicit representations and 3D human modelling. The "inclusive human modelling" application is neat and important.
[21-10-02] [paper153]
Well-written and quite interesting paper. The main idea, using a learned conditional prior p(z|c) instead of just p(z), makes sense and was shown beneficial also in "HuMoR: 3D Human Motion Model for Robust Pose Estimation". I'm however somewhat confused by their specific implementation in Section 4, doesn't seem like a standard cVAE implementation?
[21-09-24] [paper150]
Well-written and fairly interesting paper. Quite a lot of details on the attention architecture, which I personally don't find overly interesting. The experimental results are quite impressive, but I would like to see a comparison in terms of computational cost at test-time. It sounds like their method is rather slow.
[21-09-23] [paper149]
Well-written and quite interesting paper. The general idea, refining frame-by-frame pose estimates via physical constraints, intuitively makes a lot of sense. I did however find it quite difficult to understand all the details in Section 3.
[21-09-17] [paper147]
Quite interesting paper, but also quite strange/confusing. I don't think the proposed method is explained particularly well, at least I found it quite difficult to properly understand what they actually are doing.

In the end it seems like they are learning a global loss function that is very similar to doing probabilistic regression with a Gauss/Laplace model of p(y|x) (with learned mean and variance)? See Figure 4 in the Appendix.

And while it's true that their performance is much better than for direct regression with an L2/L1 loss (see e.g. Table 1), they only compare with Gauss/Laplace probabilistic regression once (Table 7) and in that case the Laplace model is actually quite competitive?
[21-06-16] [paper134]
[21-06-04] [paper131]

ICCV 2019:

[21-06-17] [paper136]

ICCV 2017:

[21-06-16] [paper135]



BMVC:

BMVC 2020:

[21-10-08] [paper160]
Well-written and fairly interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild", which is written by exactly the same authors. In this paper, they just use direct regression. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).



AISTATS:

AISTATS 2022:

[22-02-16] [paper184]
Interesting and well-written paper. The proposed method makes intuitive sense, trying to incorporate the "OOD training" method (i.e., to use some kind of OOD data during training, similar to e.g. the "Deep Anomaly Detection with Outlier Exposure" paper) into the Bayesian deep learning approach. The experimental results do seem quite promising.
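The basic "OOD training" loss referenced here (the Outlier Exposure formulation, not this paper's Bayesian variant) can be sketched as standard cross-entropy plus a term pushing predictions towards uniform on the outlier batch:

```python
import torch.nn.functional as F

def outlier_exposure_loss(logits_in, labels_in, logits_out, lam=0.5):
    # logits_in/labels_in: an in-distribution batch; logits_out: an outlier batch.
    ce = F.cross_entropy(logits_in, labels_in)
    # Cross-entropy between the predictive distribution and the uniform distribution:
    uniform_ce = -F.log_softmax(logits_out, dim=1).mean()
    return ce + lam * uniform_ce  # lam is a tunable weight (my choice of default)
```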

AISTATS 2019:

[19-02-25] [paper46]
  • Evaluating model calibration in classification [pdf] [code] [pdf with comments] [comments]
  • Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön
  • 2019-02-19, AISTATS 2019

AISTATS 2010:

[20-01-14] [paper78]



AAAI:

AAAI 2022:

[22-02-26] [paper195]
Quite interesting and well-written paper. It seemed quite niche at first, but I think their analysis could potentially be useful.

AAAI 2020:

[19-12-19] [paper71]



MICCAI:

MICCAI 2020:

[22-02-21] [paper190]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?



CDC:

CDC 2018:

[19-10-28] [paper64]



JMLR:

[20-01-16] [paper80]



Papers by Year:

2022:

[22-03-02] [paper196]
Somewhat interesting paper. They use a softmax model with MC-dropout to compute uncertainty estimates. The evaluation is not very extensive; they mostly just check that the classification accuracy improves as they reject more and more samples based on an uncertainty threshold.
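The recipe is simple: keep dropout active at test time, average the softmax outputs, and reject samples whose predictive entropy exceeds a threshold. A minimal sketch (assumes the model has dropout layers and no batchnorm):

```python
import torch

def mc_dropout_predict(model, x, num_samples=20):
    model.train()  # keeps dropout active at test time (careful if the model has batchnorm)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(num_samples)])
    mean_probs = probs.mean(dim=0)  # (batch, num_classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
    return mean_probs, entropy  # reject inputs with entropy above some threshold
```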
[22-02-26] [paper195]
Quite interesting and well-written paper. It seemed quite niche at first, but I think their analysis could potentially be useful.
[22-02-24] [paper191]
Interesting and well-written paper. I wasn't very familiar with CT image reconstruction, but they do a good job explaining everything. Interesting that MC-dropout seems important for getting well-calibrated predictions.
[22-02-21] [paper189]
Somewhat interesting paper. I didn't quite understand everything, so it could be more interesting than I think. The fact that their pseudo-input generation process "relies on the availability of a differentiable density estimate of the data" seems like a big limitation? For regression, they only applied their method to very low-dimensional input data (1D toy regression and UCI benchmarks), but would this work for image-based tasks?
[22-02-09] [paper173]
Interesting and quite well-written paper. I did find it somewhat difficult to understand certain parts though, they could perhaps be explained more clearly. The results seem quite impressive (they do consistently outperform all baselines), but I find it interesting that the "Gaussian noise" baseline in Table 2 performs that well? I should probably have read "Energy-based Out-of-distribution Detection" before reading this paper.

2021:

[22-03-03] [paper197]
Quite interesting and well-written paper. I did however find it difficult to properly understand everything, it feels like a lot of details are omitted (I wouldn't really know how to actually implement this in practice). It's difficult for me to judge how impressive the results are or how practically useful this approach actually might be, what limitations are there? Overall though, it does indeed seem quite interesting.
[22-02-25] [paper193]
Interesting and well-written paper. Interesting that Mahalanobis works very well on the CIFAR10 vs SVHN but not on the medical imaging dataset. I don't quite get how/why the ODIN method works, I'll probably have to read that paper.
[22-02-25] [paper192]
Quite interesting and well-written paper. The definition of "prediction depth" in Section 2.1 makes sense, and it definitely seems reasonable that this could correlate with example difficulty / prediction confidence in some way. Sections 3 and 4, and all the figures, seem to contain a lot of info; I'd probably need to read the paper again to properly understand/appreciate everything.
[22-02-18] [paper185]
  • Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions [pdf] [annotated pdf]
  • Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens
  • 2021-04-08, Medical Image Analysis (January 2022)
  • [Out-of-Distribution Detection] [Medical ML]
Well-written and interesting paper. Quite long, so it took a bit longer than usual to read it. Sections 1 and 2 give a great overview of OOD detection in general, and of how it can be used specifically in this dermatology setting. I can definitely recommend reading Section 2 (Related work). They assume access to some outlier data during training, so their approach is similar to the "Outlier exposure" method (they argue that this is a fair assumption in this dermatology setting). Their method is an improvement of the "reject bucket" (adding an extra class which is assigned to all outlier training data points); in their proposed method they also use fine-grained classification of the outlier skin conditions. They then use an ensemble of 5 models, as well as a more diverse ensemble (in which they combine models trained with different representation learning techniques). This diverse ensemble obtains the best performance.
[22-02-16] [paper184]
Interesting and well-written paper. The proposed method makes intuitive sense, trying to incorporate the "OOD training" method (i.e., to use some kind of OOD data during training, similar to e.g. the "Deep Anomaly Detection with Outlier Exposure" paper) into the Bayesian deep learning approach. The experimental results do seem quite promising.
[22-02-15] [paper183]
Well-written and interesting paper. Short paper of just 3 pages, but with an extensive appendix which I definitely recommend going through. The method, training an ensemble and then applying the Laplace approximation to each network, is very simple and intuitively makes a lot of sense. I didn't realize that this would have basically the same test-time speed as ensembling (since they utilize that probit approximation), that's very neat. It also seems to consistently outperform ensembling a bit across almost all tasks and metrics.
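As I understand it, the probit approximation lets you approximate the expected softmax under a Gaussian over the logits without MC sampling, by scaling the mean logits with their variance. A mean-field sketch (my own code, not necessarily their exact variant):

```python
import math
import torch

def probit_softmax(mu, var):
    # mu, var: mean and variance of a diagonal Gaussian over the logits, shape (batch, classes).
    # E[softmax(f)] under f ~ N(mu, var) is approximated by softmax of scaled means:
    kappa = 1.0 / torch.sqrt(1.0 + (math.pi / 8.0) * var)
    return torch.softmax(kappa * mu, dim=1)
```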
[22-02-15] [paper182]
Well-written and somewhat interesting paper. I'm not overly familiar with RL, which makes it a bit difficult for me to properly evaluate the paper's contributions. They use standard ensembles for uncertainty estimation combined with an OOD sampling regularization. I thought that the OOD sampling could be interesting, but it seems very specific to RL. I'm sure this paper is quite interesting for people doing RL, but I don't think it's overly useful for me.
[22-02-15] [paper181]
Quite interesting and very well-written paper, I enjoyed reading it. Their analysis of fitting Gaussian regression models via the NLL is quite interesting, I didn't really expect to learn something new about this. I've seen Gaussian models outperform standard regression (L2 loss) w.r.t. accuracy in some applications/datasets, and the other way around in others. In the first case, I've then attributed the success of the Gaussian model to the "learned loss attenuation". The analysis in this paper could perhaps explain why you get this performance boost only in certain applications. Their beta-NLL loss could probably be quite useful, it seems like a convenient tool to have.
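My reading of the beta-NLL loss: the per-example Gaussian NLL is reweighted by a stop-gradient factor var^beta, interpolating between the standard NLL (beta = 0) and MSE-like example weighting (beta = 1). A sketch:

```python
import torch

def beta_nll(mu, var, y, beta=0.5):
    nll = 0.5 * (var.log() + (y - mu) ** 2 / var)  # Gaussian NLL up to a constant
    return (var.detach() ** beta * nll).mean()     # stop-gradient on the weighting factor
```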
[22-02-15] [paper180]
Well-written and somewhat interesting paper. I'm not overly familiar with reinforcement learning, which makes it a bit difficult for me to properly evaluate the paper's contributions, but to me it seems like fairly straightforward method modifications? To use ensembles of Gaussian models (instead of ensembles of models trained using the L2 loss) makes sense. The BIV method I didn't quite get, it seems rather ad hoc? I also don't quite get exactly how it's used in equation (10), is the ensemble of Gaussian models trained _jointly_ using this loss? I don't really know if this could be useful outside of RL.
[22-02-14] [paper179]
Interesting and very well-written paper, I enjoyed reading it. I still think that ensembling probably is quite difficult to beat purely in terms of uncertainty estimation quality, but this definitely seems like a useful tool in many situations. It's not clear to me if the analytical expression for regression in "4. Approximate Predictive Distribution" is applicable also if the variance is input-dependent?
[22-02-12] [paper178]
Well-written and interesting paper. They synthetically create dataset shifts (e.g. by adding Gaussian noise to the data) of increasing intensity and study whether or not the uncertainty increases as the accuracy degrades. They compare regular softmax, temperature scaling, MC-dropout, ensembling and a simple variational inference method. Their conclusion is basically that ensembling slightly outperforms the other methods, but that no method performs overly well. I think these type of studies are really useful.
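The protocol is easy to replicate; a hypothetical sketch of the accuracy-vs-uncertainty-under-shift loop (model_predict and the noise levels are my own placeholders):

```python
import numpy as np

def shift_study(model_predict, x, y, sigmas=(0.0, 0.05, 0.1, 0.2, 0.4)):
    # model_predict: returns (predicted labels, per-sample uncertainties) as numpy arrays.
    results = []
    for s in sigmas:  # increasing noise std = increasing shift intensity
        x_shifted = x + np.random.randn(*x.shape) * s
        preds, unc = model_predict(x_shifted)
        results.append((s, (preds == y).mean(), unc.mean()))
    return results  # accuracy should degrade; mean uncertainty should ideally increase
```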
[22-02-11] [paper176]
Well-written and quite interesting paper. A short paper, just 4 pages. They don't study the method from the "Energy-based Out-of-distribution Detection" paper as I had expected, but it was still a quite interesting read. The results in Section 4.2 seem interesting, especially for experiment 3, but I'm not sure that I properly understand everything.
[22-02-10] [paper175]
Interesting and well-written paper. I didn't quite understand all the details, I'll have to read a couple of related/background papers to be able to properly appreciate and evaluate the proposed method. I definitely feel like I would like to read up on this family of methods. Extensive experimental evaluation, and the results seem promising overall.
[21-12-16] [paper172]
Very interesting and quite well-written paper. Kind of neat/fun to see state-space models being used. The experimental results seem very impressive!? I didn't fully understand everything in Section 3. I had to read Section 3.4 a couple of times to understand how the parameterization actually works in practice (you have H state-space models, one for each feature dimension, so that you can map a sequence of feature vectors to another sequence of feature vectors; you can then also stack multiple such layers of state-space models, mapping sequence --> sequence --> sequence --> ...).
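To make the parameterization concrete for myself, a naive sequential-scan sketch (the paper computes this far more efficiently; shapes and names are my own):

```python
import numpy as np

def ssm_layer(u, A, B, C):
    # u: (L, H) input sequence; one independent state-space model per feature dimension.
    # A: (H, N, N), B: (H, N), C: (H, N) are discrete-time SSM parameters.
    L, H = u.shape
    x = np.zeros((H, A.shape[-1]))
    y = np.zeros((L, H))
    for k in range(L):
        for h in range(H):                       # H state-space models in parallel
            x[h] = A[h] @ x[h] + B[h] * u[k, h]  # x_k = A x_{k-1} + B u_k
            y[k, h] = C[h] @ x[h]                # y_k = C x_k
    return y  # stack several such layers: sequence -> sequence -> sequence -> ...
```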
[21-12-09] [paper171]
Quite interesting and well-written paper. Quite a heavy read, probably need to be rather familiar with GPs to properly understand/appreciate everything. Definitely check Appendix D, it gives a better understanding of how the proposed method is applied in practice. I'm not quite sure how strong/impressive the experimental results actually are. Also seems like the method could be a bit inconvenient to implement/use?
[21-12-03] [paper170]
Interesting and very well-written paper. Gives a good overview of the field and contains a lot of seemingly useful references. The evaluation is very comprehensive. The user study is quite neat.
[21-12-02] [paper169]
Quite well-written paper overall that seemed interesting, but I found it very difficult to properly understand everything. Thus, I can't really tell how interesting/significant their analysis actually is.
[21-11-25] [paper168]
Quite interesting and well-written paper. The experimental results do seem promising. However, I don't quite get why the proposed method intuitively makes sense, why is it better to only use the parameters of the final network layer?
[21-11-18] [paper167]
  • Masked Autoencoders Are Scalable Vision Learners [pdf] [annotated pdf]
  • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
  • 2021-11-11
Interesting and well-written paper. The proposed method is simple and makes a lot of intuitive sense, which is rather satisfying. After page 4, there's mostly just detailed ablations and results.
[21-10-28] [paper165]
  • Deep Classifiers with Label Noise Modeling and Distance Awareness [pdf] [annotated pdf]
  • Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
  • 2021-10-06
  • [Uncertainty Estimation]
Quite interesting and well-written paper. I find the distance-awareness property more interesting than modelling of input/class-dependent label noise, so the proposed method (HetSNGP) is perhaps not overly interesting compared to the SNGP baseline.
[21-10-21] [paper164]
Somewhat interesting paper. The phenomenon observed in Figure 1, that validation accuracy suddenly increases long after almost perfect fitting of the training data has been achieved, is quite interesting. I didn't quite understand the datasets they use (binary operation tables).
[21-10-08] [paper161]
Well-written and quite interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild" which is written by exactly the same authors. In this paper, they predict a single Gaussian distribution for the pose (instead of hierarchical matrix-Fisher distributions). Also, they mainly focus on the body shape. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-07] [paper159]
Well-written and quite interesting paper. I didn't fully understand everything though, and it feels like I probably don't know this specific setting/problem well enough to fully appreciate the paper. 
[21-10-07] [paper158]
Well-written and very interesting paper, I enjoyed reading it. The hierarchical distribution prediction approach makes sense and consistently outperforms the independent baseline. Using matrix-Fisher distributions makes sense. The synthetic training framework and the input representation of edge-filters + 2D keypoint heatmaps are both interesting.
[21-10-06] [paper157]
Well-written and interesting paper. Quite easy to read and follow, the method is clearly explained and makes intuitive sense.
[21-10-03] [paper155]
Interesting and very well-written paper, I really enjoyed reading it. Interesting combination of implicit representations and 3D human modelling. The "inclusive human modelling" application is neat and important.
[21-09-23] [paper149]
Well-written and quite interesting paper. The general idea, refining frame-by-frame pose estimates via physical constraints, intuitively makes a lot of sense. I did however find it quite difficult to understand all the details in Section 3.
[21-09-21] [paper148]
Very well-written and quite interesting paper, I enjoyed reading it. Everything is quite well-explained, it's relatively easy to follow. The paper provides a good overview of the out-of-distribution detection problem and current methods.
[21-09-17] [paper147]
Quite interesting paper, but also quite strange/confusing. I don't think the proposed method is explained particularly well, at least I found it quite difficult to properly understand what they actually are doing.

In the end it seems like they are learning a global loss function that is very similar to doing probabilistic regression with a Gauss/Laplace model of p(y|x) (with learned mean and variance)? See Figure 4 in the Appendix.

And while it's true that their performance is much better than for direct regression with an L2/L1 loss (see e.g. Table 1), they only compare with Gauss/Laplace probabilistic regression once (Table 7) and in that case the Laplace model is actually quite competitive?
[21-09-08] [paper145]
  • Revisiting the Calibration of Modern Neural Networks [pdf] [code] [annotated pdf]
  • Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
  • 2021-06-15, NeurIPS 2021
  • [Uncertainty Estimation]
Well-written paper. Everything is quite clearly explained and easy to understand. Quite enjoyable to read overall. 

Thorough experimental evaluation. Quite interesting findings.
[21-09-02] [paper144]
  • Differentiable Particle Filtering via Entropy-Regularized Optimal Transport [pdf] [code] [annotated pdf]
  • Adrien Corenflos, James Thornton, George Deligiannidis, Arnaud Doucet
  • 2021-02-15, ICML 2021
[21-09-02] [paper143]
[21-06-16] [paper134]
[21-06-04] [paper131]
[21-05-07] [paper130]
[21-04-29] [paper129]
[21-04-01] [paper126]
[21-03-19] [paper124]
[21-03-04] [paper122]
[21-02-19] [paper120]
  • Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [pdf] [pdf with comments]
  • Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
  • 2021-02-11, ICML 2021
[21-02-12] [paper119]

2020:

[22-02-21] [paper190]
Quite interesting and well-written paper. They compare MC-dropout, ensembling and mixup (with a standard softmax classifier as the baseline). Nothing groundbreaking, but the studied application (classification of pathology slides for cancer) is very interesting. The FPR95 metrics for OOD detection in Table 4 are terrible for ensembling, but the classification accuracy (89.7) is also pretty much the same as for D_test_int in Table 3 (90.1)? So, it doesn't really matter that the model isn't capable of distinguishing this "OOD" data from in-distribution?
[22-02-19] [paper188]
  • Contrastive Training for Improved Out-of-Distribution Detection [pdf] [annotated pdf]
  • Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger
  • 2020-07-10
  • [Out-of-Distribution Detection]
Quite interesting and very well-written paper. They take the method from the Mahalanobis paper ("A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks") (however, they fit Gaussians only to the features at the second-to-last network layer, and they don't use the input pre-processing either) and consistently improve OOD detection performance by incorporating contrastive training. Specifically, they first train the network using just the SimCLR loss for a large number of epochs, and then also add the standard classification loss. I didn't quite get why the label smoothing is necessary, but according to Table 2 it's responsible for a large portion of the performance gain.
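The Mahalanobis part (without the contrastive training) is simple to sketch: fit class-conditional Gaussians with a shared covariance to penultimate-layer features, then score a test feature by its distance to the closest class mean. A minimal NumPy version (my own naming):

```python
import numpy as np

def fit_gaussians(feats, labels, num_classes):
    # feats: (num_train, D) penultimate-layer features; labels: (num_train,) ints.
    means = np.stack([feats[labels == c].mean(0) for c in range(num_classes)])
    centered = feats - means[labels]
    cov = centered.T @ centered / len(feats)  # shared (tied) covariance
    return means, np.linalg.inv(cov)

def mahalanobis_ood_score(f, means, prec):
    # Higher score = further from every class mean = more likely OOD.
    d = f - means  # (num_classes, D)
    return np.min(np.einsum("cd,de,ce->c", d, prec, d))
```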
[22-02-09] [paper174]
Interesting and well-written paper. The proposed method is quite clearly explained and makes intuitive sense (at least if you're familiar with EBMs). Compared to using the softmax score, the performance does seem to improve consistently. Seems like fine-tuning on an "auxiliary outlier dataset" is required to get really good performance though, which you can't really assume to have access to in real-world problems, I suppose?
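For reference, the energy score from that paper is a one-liner over the logits; lower energy means more in-distribution, so the negative energy can be thresholded just like the softmax score:

```python
import torch

def energy_score(logits, T=1.0):
    # E(x) = -T * logsumexp(f(x) / T), computed per sample.
    return -T * torch.logsumexp(logits / T, dim=1)
```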
[21-11-11] [paper166]
Quite well-written and somewhat interesting paper. I'm not very familiar with this area. I didn't spend too much time trying to properly evaluate the significance of the findings.
[21-10-14] [paper163]
  • Learning to Simulate Complex Physics with Graph Networks [pdf] [code] [annotated pdf]
  • Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, Peter W. Battaglia
  • 2020-02-21, ICML 2020
Quite well-written and somewhat interesting paper. Cool application and a bunch of neat videos. This is not really my area, so I didn't spend too much time/energy trying to fully understand everything.
[21-10-12] [paper162]
Interesting and very well-written paper, I really enjoyed reading it! The paper also gives a good understanding of neural implicit representations in general.
[21-10-08] [paper160]
Well-written and fairly interesting paper. I read it mainly as background for "Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild", which is written by exactly the same authors. In this paper, they just use direct regression. They also use silhouettes + 2D keypoint heatmaps as input (instead of edge-filters + 2D keypoint heatmaps).
[21-10-04] [paper156]
Well-written and fairly interesting paper. The marker-based representation, instead of using skeleton joints, makes sense. The recursive projection scheme also makes sense, but seems very slow (2.27 sec/frame)? I didn't quite get all the details for their DCT representation of the latent space.
[21-10-03] [paper154]
Well-written and interesting paper, I enjoyed reading it. Neat application of implicit representations. The paper also gives a quite good overview of online 3D reconstruction in general.
[21-10-01] [paper152]
Well-written and quite interesting paper. Interesting application, being able to reconstruct full 3D scenes from sparse point clouds. I didn't fully understand everything, as I don't have a particularly strong graphics background.
[21-09-15] [paper146]
Extremely well-written and interesting paper. I really enjoyed reading it, and I would recommend it to anyone interested in computer vision.

All parts of the proposed method are clearly explained and relatively easy to understand, including the volume rendering techniques which I was unfamiliar with.
[21-06-18] [paper138]
[21-06-17] [paper137]
[21-06-14] [paper132]
  • 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [pdf] [annotated pdf]
  • Benjamin Biggs, Sébastien Ehrhadt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny
  • 2020-11-02, NeurIPS 2020
  • [3D Human Pose Estimation]
[21-04-16] [paper128]
  • Learning Mesh-Based Simulation with Graph Networks [pdf] [code] [annotated pdf]
  • Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, Peter W. Battaglia
  • 2020-10-07, ICLR 2021
[21-04-09] [paper127]
[21-03-26] [paper125]
  • Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling [pdf] [pdf with comments]
  • Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio
  • 2020-03-12, NeurIPS 2020
  • [Energy-Based Models]
[21-03-12] [paper123]
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [pdf] [code] [pdf with comments]
  • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
  • 2020-06-17, NeurIPS 2020
[21-02-05] [paper118]
[21-01-29] [paper117]
  • No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models [pdf] [code] [pdf with comments]
  • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
  • 2020-10-08, ICLR 2021
  • [Energy-Based Models]
[21-01-22] [paper116]
[21-01-15] [paper115]
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [pdf] [pdf with comments]
  • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • 2020-06-29, ICML 2020
  • [Transformers]
[20-12-18] [paper114]
  • Score-Based Generative Modeling through Stochastic Differential Equations [pdf] [code] [pdf with comments]
  • Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
  • 2020-11-26, ICLR 2021
  • [Neural ODEs]
[20-12-14] [paper113]
  • Dissecting Neural ODEs [pdf] [pdf with comments]
  • Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
  • 2020-02-19, NeurIPS 2020
[20-11-27] [paper112]
  • Rethinking Attention with Performers [pdf] [pdf with comments]
  • Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller
  • 2020-10-30, ICLR 2021
  • [Transformers]
[20-11-23] [paper111]
[20-11-13] [paper110]
[20-10-16] [paper108]
[20-09-24] [paper106]
  • Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness [pdf] [pdf with comments]