ICML-2019

A summary of research work presented at the thirty-sixth International Conference on Machine Learning (ICML) @ Long Beach - 2019

Tutorials

  1. Never Ending Learning [Web]
  2. Active learning from theory to practice [Web]
  3. Active Hypothesis Testing: An Information Theoretic (re)View [Web]
  4. Recent Advances in Population-Based Search for Deep Neural Networks: Quality Diversity, Indirect Encodings, and Open-Ended Algorithms [Web]
  5. Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning [Web]
  6. A Tutorial on Attention in Deep Learning [Web]
  7. Causal Inference and Stable Learning [Web]
  8. A Primer on PAC-Bayesian Learning [Web]
  9. Neural Approaches to Conversational AI [Web]

Multi Track Research

Day 1

  1. Opening Remarks [Web]
  2. Machine learning for robots to think fast [Web]
  3. Best Paper [Web]
  4. Adversarial Attacks on Node Embeddings via Graph Poisoning [Web] [Slides]
  5. SelectiveNet: A Deep Neural Network with an Integrated Reject Option [Web]
  6. ELF OpenGo: an analysis and open reimplementation of AlphaZero [Web]
  7. A Contrastive Divergence for Combining Variational Inference and MCMC [Web] [Slides]
  8. Regret Circuits: Composability of Regret Minimizers [Web] [Slides]
  9. Refined Complexity of PCA with Outliers [Web]
  10. PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization [Web] [Slides]
  11. Validating Causal Inference Models via Influence Functions [Web]
  12. Data Shapley: Equitable Valuation of Data for Machine Learning [Web]
  13. First-Order Adversarial Vulnerability of Neural Networks and Input Dimension [Web] [Slides]
  14. Manifold Mixup: Better Representations by Interpolating Hidden States [Web] [Slides]
  15. Making Deep Q-learning methods robust to time discretization [Web] [Slides]
  16. Calibrated Approximate Bayesian Inference [Web] [Slides]
  17. Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function [Web] [Slides]
  18. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms [Web] [Slides]
  19. Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization [Web] [Slides]
  20. Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks [Web] [Slides]
  21. Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data [Web] [Slides]
  22. On Certifying Non-Uniform Bounds against Adversarial Attacks [Web] [Slides]
  23. Processing Megapixel Images with Deep Attention-Sampling Models [Web] [Slides]
  24. Nonlinear Distributional Gradient Temporal-Difference Learning [Web] [Slides]
  25. Moment-Based Variational Inference for Markov Jump Processes [Web] [Slides]
  26. Stable-Predictive Optimistic Counterfactual Regret Minimization [Web] [Slides]
  27. Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models [Web] [Slides]
  28. Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization [Web] [Slides]
  29. Learning to Groove with Inverse Sequence Transformations [Web] [Slides]
  30. Metric-Optimized Example Weights [Web] [Slides]
  31. Improving Adversarial Robustness via Promoting Ensemble Diversity [Web] [Slides]
  32. TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning [Web] [Slides]
  33. Composing Entropic Policies using Divergence Correction [Web] [Slides]
  34. Understanding MCMC Dynamics as Flows on the Wasserstein Space [Web] [Slides]
  35. When Samples Are Strategically Selected [Web] [Slides]
  36. Teaching a black-box learner [Web] [Slides]
  37. Lower Bounds for Smooth Nonconvex Finite-Sum Optimization [Web] [Slides]
  38. Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI [Web] [Slides]
  39. Improving Model Selection by Employing the Test Data [Web] [Slides]
  40. Adversarial camera stickers: A physical camera-based attack on deep learning systems [Web] [Slides]
  41. Online Meta-Learning [Web] [Slides]
  42. TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning [Web] [Slides]
  43. LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations [Web] [Slides]
  44. Statistical Foundations of Virtual Democracy [Web] [Slides]
  45. PAC Learnability of Node Functions in Networked Dynamical Systems [Web] [Slides]
  46. Nonconvex Variance Reduced Optimization with Arbitrary Sampling [Web] [Slides]
  47. HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving [Web] [Slides]
  48. Topological Data Analysis of Decision Boundaries with Application to Model Selection [Web] [Slides]
  49. Adversarial examples from computational constraints [Web]
  50. Training Neural Networks with Local Error Signals [Web] [Slides]
  51. Multi-Agent Adversarial Inverse Reinforcement Learning [Web] [Slides]
  52. Amortized Monte Carlo Integration [Web] [Slides]
  53. Optimal Auctions through Deep Learning [Web] [Slides]
  54. Online learning with kernel losses [Web] [Slides]
  55. Error Feedback Fixes SignSGD and other Gradient Compression Schemes [Web]
  56. Molecular Hypergraph Grammar with Its Application to Molecular Optimization [Web] [Slides]
  57. Contextual Memory Trees [Web]
  58. POPQORN: Quantifying Robustness of Recurrent Neural Networks [Web] [Slides]
  59. GMNN: Graph Markov Neural Networks [Web] [Slides]
  60. Policy Consolidation for Continual Reinforcement Learning [Web] [Slides]
  61. Stein Point Markov Chain Monte Carlo [Web] [Slides]
  62. Learning to Clear the Market [Web] [Slides]
  63. Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates [Web] [Slides]
  64. A Composite Randomized Incremental Gradient Method [Web] [Slides]
  65. Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance [Web] [Slides]
  66. Sparse Extreme Multi-label Learning with Oracle Property [Web] [Slides]
  67. Using Pre-Training Can Improve Model Robustness and Uncertainty [Web] [Slides]
  68. Self-Attention Graph Pooling [Web] [Slides]
  69. Off-Policy Deep Reinforcement Learning without Exploration [Web] [Slides]
  70. Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations [Web] [Slides]
  71. Learning to bid in revenue-maximizing auctions [Web] [Slides]
  72. Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise [Web] [Slides]
  73. Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference [Web] [Slides]
  74. Learning to Prove Theorems via Interacting with Proof Assistants [Web] [Slides]
  75. Shape Constraints for Set Functions [Web] [Slides]
  76. [Web]
  77. Combating Label Noise in Deep Learning using Abstention [Web] [Slides]
  78. Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation [Web] [Slides]
  79. Particle Flow Bayes' Rule [Web] [Slides]
  80. Open-ended learning in symmetric zero-sum games [Web] [Slides]
  81. Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension [Web] [Slides]
  82. Multiplicative Weights Updates as a distributed constrained optimization algorithm: Convergence to second-order stationary points almost always [Web] [Slides]
  83. Circuit-GNN: Graph Neural Networks for Distributed Circuit Design [Web] [Slides]
  84. On The Power of Curriculum Learning in Training Deep Networks [Web] [Slides]
  85. PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach [Web] [Slides]
  86. LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning [Web] [Slides]
  87. Revisiting the Softmax Bellman Operator: New Benefits and New Perspective [Web] [Slides]
  88. Correlated Variational Auto-Encoders [Web] [Slides]
  89. Deep Counterfactual Regret Minimization [Web] [Slides]
  90. Maximum Likelihood Estimation for Learning Populations of Parameters [Web] [Slides]
  91. Katalyst: Boosting Convex Katyusha for Non-Convex Problems with a Large Condition Number [Web] [Slides]
  92. Learning to Optimize Multigrid PDE Solvers [Web] [Slides]
  93. Voronoi Boundary Classification: A High-Dimensional Geometric Approach via Weighted Monte Carlo Integration [Web] [Slides]
  94. On Learning Invariant Representations for Domain Adaptation [Web]
  95. Self-Attention Generative Adversarial Networks [Web]
  96. An Investigation of Model-Free Planning [Web]
  97. Towards a Unified Analysis of Random Fourier Features [Web]
  98. Generalized Approximate Survey Propagation for High-Dimensional Estimation [Web] [Slides]
  99. Projection onto Minkowski Sums with Application to Constrained Learning [Web] [Slides]
  100. Safe Policy Improvement with Baseline Bootstrapping [Web] [Slides]
  101. A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation [Web]
  102. Robust Decision Trees Against Adversarial Examples [Web] [Slides]
  103. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models [Web] [Slides]
  104. Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching [Web] [Slides]
  105. CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning [Web] [Slides]
  106. Learning deep kernels for exponential family densities [Web] [Slides]
  107. Boosted Density Estimation Remastered [Web] [Slides]
  108. Blended Conditional Gradients [Web] [Slides]
  109. Distributional Reinforcement Learning for Efficient Exploration [Web] [Slides]
  110. Learning Hawkes Processes Under Synchronization Noise [Web] [Slides]
  111. Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth [Web] [Slides]
  112. Adversarial Generation of Time-Frequency Features with application in audio synthesis [Web] [Slides]
  113. High-Fidelity Image Generation With Fewer Labels [Web] [Slides]
  114. Task-Agnostic Dynamics Priors for Deep Reinforcement Learning [Web] [Slides]
  115. Bayesian Deconditional Kernel Mean Embeddings [Web] [Slides]
  116. Inference and Sampling of $K_{33}$-free Ising Models [Web] [Slides]
  117. Acceleration of SVRG and Katyusha X by Inexact Preconditioning [Web] [Slides]
  118. Optimistic Policy Optimization via Multiple Importance Sampling [Web] [Slides]
  119. Generative Adversarial User Model for Reinforcement Learning Based Recommendation System [Web] [Slides]
  120. Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation [Web] [Slides]
  121. On the Universality of Invariant Networks [Web] [Slides]
  122. Revisiting precision recall definition for generative modeling [Web] [Slides]
  123. Diagnosing Bottlenecks in Deep Q-learning Algorithms [Web] [Slides]
  124. A Kernel Perspective for Regularizing Deep Neural Networks [Web] [Slides]
  125. Random Matrix Improved Covariance Estimation for a Large Class of Metrics [Web] [Slides]
  126. Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD [Web] [Slides]
  127. Neural Logic Reinforcement Learning [Web] [Slides]
  128. A Statistical Investigation of Long Memory in Language and Music [Web] [Slides]
  129. Optimal Transport for structured data with application on graphs [Web] [Slides]
  130. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [Web] [Slides]
  131. Wasserstein of Wasserstein Loss for Learning Generative Models [Web] [Slides]
  132. Collaborative Evolutionary Reinforcement Learning [Web] [Slides]
  133. A Persistent Weisfeiler-Lehman Procedure for Graph Classification [Web] [Slides]
  134. Dual Entangled Polynomial Code: Three-Dimensional Coding for Distributed Matrix Multiplication [Web] [Slides]
  135. A Conditional-Gradient-Based Augmented Lagrangian Framework [Web] [Slides]
  136. Learning to Collaborate in Markov Decision Processes [Web] [Slides]
  137. Deep Factors for Forecasting [Web] [Slides]
  138. Learning Optimal Linear Regularizers [Web] [Slides]
  139. Gauge Equivariant Convolutional Networks and the Icosahedral CNN [Web]
  140. Flat Metric Minimization with Applications in Generative Modeling [Web] [Slides]
  141. EMI: Exploration with Mutual Information [Web]
  142. Rehashing Kernel Evaluation in High Dimensions [Web] [Slides]
  143. Neural Joint Source-Channel Coding [Web] [Slides]
  144. SGD: General Analysis and Improved Rates [Web]
  145. Predictor-Corrector Policy Optimization [Web] [Slides]
  146. Weakly-Supervised Temporal Localization via Occurrence Count Learning [Web] [Slides]
  147. On Symmetric Losses for Learning from Corrupted Labels [Web] [Slides]
  148. Feature-Critic Networks for Heterogeneous Domain Generalization [Web] [Slides]
  149. Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs [Web] [Slides]
  150. Imitation Learning from Imperfect Demonstration [Web] [Slides]
  151. Large-Scale Sparse Kernel Canonical Correlation Analysis [Web] [Slides][Slides]
  152. Doubly-Competitive Distribution Estimation [Web] [Slides]
  153. Curvature-Exploiting Acceleration of Elastic Net Computations [Web] [Slides]
  154. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning [Web] [Slides]
  155. Switching Linear Dynamics for Variational Bayes Filtering [Web] [Slides]
  156. AUCµ: A Performance Metric for Multi-Class Machine Learning Models [Web] [Slides]
  157. Learning to Convolve: A Generalized Weight-Tying Approach [Web] [Slides]
  158. Non-Parametric Priors For Generative Adversarial Networks [Web] [Slides]
  159. Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty [Web] [Slides]
  160. A Kernel Theory of Modern Data Augmentation [Web] [Slides]
  161. Homomorphic Sensing [Web] [Slides]
  162. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication [Web] [Slides]
  163. DeepMDP: Learning Continuous Latent Space Models for Representation Learning [Web] [Slides]
  164. Imputing Missing Events in Continuous-Time Event Streams [Web] [Slides]
  165. Regularization in directable environments with application to Tetris [Web] [Slides]
  166. On Dropout and Nuclear Norm Regularization [Web] [Slides]
  167. Lipschitz Generative Adversarial Nets [Web] [Slides]
  168. Dynamic Weights in Multi-Objective Deep Reinforcement Learning [Web] [Slides]
  169. kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection [Web] [Slides]
  170. Phaseless PCA: Low-Rank Matrix Recovery from Column-wise Phaseless Measurements [Web] [Slides]
  171. Safe Grid Search with Optimal Complexity [Web] [Slides]
  172. Importance Sampling Policy Evaluation with an Estimated Behavior Policy [Web] [Slides]
  173. Understanding and Controlling Memory in Recurrent Neural Networks [Web] [Slides]
  174. Improved Dynamic Graph Learning through Fault-Tolerant Sparsification [Web] [Slides]
  175. Gradient Descent Finds Global Minima of Deep Neural Networks [Web] [Slides]
  176. HexaGAN: Generative Adversarial Nets for Real World Classification [Web] [Slides]
  177. Fingerprint Policy Optimisation for Robust Reinforcement Learning [Web] [Slides]
  178. Scalable Learning in Reproducing Kernel Krein Spaces [Web] [Slides]
  179. Rate Distortion For Model Compression: From Theory To Practice [Web] [Slides]
  180. SAGA with Arbitrary Sampling [Web] [Slides]
  181. Learning from a Learner [Web] [Slides]
  182. Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces [Web] [Slides]
  183. Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin [Web] [Slides]
  184. Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm [Web] [Slides]
  185. Graph Matching Networks for Learning the Similarity of Graph Structured Objects [Web] [Slides]
  186. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density [Web] [Slides]
  187. Dirichlet Simplex Nest and Geometric Inference [Web] [Slides]
  188. Formal Privacy for Functional Data with Gaussian Perturbations [Web] [Slides]
  189. Natural Analysts in Adaptive Data Analysis [Web] [Slides]
  190. Separable value functions across time-scales [Web]
  191. Subspace Robust Wasserstein Distances [Web]
  192. Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff [Web]
  193. Sublinear Time Nearest Neighbor Search over Generalized Weighted Space [Web] [Slides]
  194. BayesNAS: A Bayesian Approach for Neural Architecture Search [Web] [Slides]
  195. Differentiable Linearized ADMM [Web] [Slides]
  196. Bayesian leave-one-out cross-validation for large data [Web] [Slides]
  197. Graphical-model based estimation and inference for differential privacy [Web] [Slides]
  198. CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration [Web] [Slides]
  199. Learning Action Representations for Reinforcement Learning [Web] [Slides]
  200. Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models [Web] [Slides]
  201. Collaborative Channel Pruning for Deep Networks [Web] [Slides]
  202. Compressing Gradient Optimizers via Count-Sketches [Web] [Slides]
  203. Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks [Web] [Slides]
  204. Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search [Web] [Slides]
  205. Rao-Blackwellized Stochastic Gradients for Discrete Distributions [Web] [Slides]
  206. White-box vs Black-box: Bayes Optimal Strategies for Membership Inference [Web] [Slides]
  207. Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction [Web] [Slides]
  208. Bayesian Counterfactual Risk Minimization [Web] [Slides]
  209. Active Manifolds: A non-linear analogue to Active Subspaces [Web] [Slides]
  210. Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization [Web] [Slides]
  211. Scalable Fair Clustering [Web] [Slides]
  212. Shallow-Deep Networks: Understanding and Mitigating Network Overthinking [Web] [Slides]
  213. A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent [Web] [Slides]
  214. Neurally-Guided Structure Inference [Web] [Slides]
  215. An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule [Web] [Slides]
  216. Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints [Web] [Slides]
  217. Per-Decision Option Discounting [Web] [Slides]
  218. Optimal Minimal Margin Maximization with Boosting [Web] [Slides]
  219. GDPP: Learning Diverse Generations using Determinantal Point Processes [Web] [Slides]
  220. Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator [Web] [Slides]
  221. Graph U-Nets [Web] [Slides]
  222. The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study [Web] [Slides]
  223. Bayesian Joint Spike-and-Slab Graphical Lasso [Web] [Slides]
  224. Sublinear Space Private Algorithms Under the Sliding Window Model [Web] [Slides]
  225. Optimality Implies Kernel Sum Classifiers are Statistically Efficient [Web] [Slides]
  226. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds [Web] [Slides]
  227. Generalized Linear Rule Models [Web] [Slides]
  228. Co-Representation Network for Generalized Zero-Shot Learning [Web] [Slides]
  229. Fault Tolerance in Iterative-Convergent Machine Learning [Web]
  230. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver [Web] [Slides]
  231. AdaGrad stepsizes: sharp convergence over nonconvex landscapes [Web] [Slides]
  232. Rotation Invariant Householder Parameterization for Bayesian PCA [Web] [Slides]
  233. Locally Private Bayesian Inference for Count Models [Web]
  234. The Implicit Fairness Criterion of Unconstrained Learning [Web]
  235. A Theory of Regularized Markov Decision Processes [Web] [Slides]
  236. Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications [Web] [Slides]
  237. GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects [Web] [Slides]
  238. Static Automatic Batching In TensorFlow [Web] [Slides]
  239. Area Attention [Web] [Slides]
  240. Beyond Backprop: Online Alternating Minimization with Auxiliary Variables [Web] [Slides]
  241. A Framework for Bayesian Optimization in Embedded Subspaces [Web] [Slides]
  242. Low Latency Privacy Preserving Inference [Web] [Slides]
  243. Weak Detection of Signal in the Spiked Wigner Model [Web] [Slides]
  244. Discovering Options for Exploration by Minimizing Cover Time [Web] [Slides]
  245. Variational Inference for sparse network reconstruction from count data [Web] [Slides]
  246. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Web] [Slides]
  247. Improving Neural Network Quantization without Retraining using Outlier Channel Splitting [Web] [Slides]
  248. The Evolved Transformer [Web] [Slides]
  249. SWALP: Stochastic Weight Averaging in Low Precision Training [Web] [Slides]
  250. Convolutional Poisson Gamma Belief Network [Web] [Slides]
  251. Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters [Web] [Slides]
  252. Rademacher Complexity for Adversarially Robust Generalization [Web] [Slides]
  253. Policy Certificates: Towards Accountable Reinforcement Learning [Web] [Slides]
  254. Simplifying Graph Convolutional Networks [Web] [Slides]
  255. Geometry Aware Convolutional Filters for Omnidirectional Images Representation [Web] [Slides]
  256. Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications [Web] [Slides]
  257. Jumpout: Improved Dropout for Deep Neural Networks with ReLUs [Web] [Slides]
  258. Efficient optimization of loops and limits with randomized telescoping sums [Web] [Slides]
  259. Automatic Posterior Transformation for Likelihood-Free Inference [Web] [Slides]
  260. Poisson Subsampled Rényi Differential Privacy [Web] [Slides]
  261. Provably efficient RL with Rich Observations via Latent State Decoding [Web] [Slides]
  262. Action Robust Reinforcement Learning and Applications in Continuous Control [Web] [Slides]
  263. Robust Influence Maximization for Hyperparametric Models [Web] [Slides]
  264. A Personalized Affective Memory Model for Improving Emotion Recognition [Web] [Slides]
  265. DL2: Training and Querying Neural Networks with Logic [Web] [Slides]
  266. Stochastic Deep Networks [Web] [Slides]
  267. Self-similar Epochs: Value in arrangement [Web] [Slides]
  268. Active Learning for Decision-Making from Imbalanced Observational Data [Web] [Slides]
  269. Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA [Web] [Slides]
  270. Information-Theoretic Considerations in Batch Reinforcement Learning [Web] [Slides]
  271. The Value Function Polytope in Reinforcement Learning [Web] [Slides]
  272. HyperGAN: A Generative Model for Diverse, Performant Neural Networks [Web] [Slides]
  273. Temporal Gaussian Mixture Layer for Videos [Web] [Slides]
  274. Posters Tue [Web]

Day 2

  1. The U.S. Census Bureau Tries to be a Good Data Steward in the 21st Century [Web]
  2. Test of Time Award [Web]
  3. Theoretically Principled Trade-off between Robustness and Accuracy [Web]
  4. Sum-of-Squares Polynomial Flow [Web]
  5. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning [Web]
  6. Distribution calibration for regression [Web]
  7. On the Convergence and Robustness of Adversarial Training [Web] [Slides]
  8. Distributed Learning with Sublinear Communication [Web]
  9. Complexity of Linear Regions in Deep Networks [Web]
  10. Exploiting Worker Correlation for Label Aggregation in Crowdsourcing [Web]
  11. Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards [Web]
  12. The Odds are Odd: A Statistical Test for Detecting Adversarial Examples [Web] [Slides]
  13. FloWaveNet: A Generative Flow for Raw Audio [Web] [Slides]
  14. Maximum Entropy-Regularized Multi-Goal Reinforcement Learning [Web] [Slides]
  15. Graph Convolutional Gaussian Processes [Web] [Slides]
  16. Learning with Bad Training Data via Iterative Trimmed Loss Minimization [Web] [Slides]
  17. On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization [Web] [Slides]
  18. On Connected Sublevel Sets in Deep Learning [Web] [Slides]
  19. Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems [Web] [Slides]
  20. Target Tracking for Contextual Bandits: Application to Demand Side Management [Web] [Slides][Slides]
  21. ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation [Web] [Slides]
  22. Are Generative Classifiers More Robust to Adversarial Attacks? [Web] [Slides]
  23. Imitating Latent Policies from Observation [Web] [Slides]
  24. Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation [Web] [Slides]
  25. On discriminative learning of prediction uncertainty [Web] [Slides]
  26. Stochastic Gradient Push for Distributed Deep Learning [Web] [Slides]
  27. Adversarial Examples Are a Natural Consequence of Test Error in Noise [Web] [Slides]
  28. A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology [Web] [Slides]
  29. Correlated bandits or: How to minimize mean-squared error online [Web] [Slides]
  30. Certified Adversarial Robustness via Randomized Smoothing [Web] [Slides]
  31. A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization [Web] [Slides]
  32. SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning [Web] [Slides]
  33. GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver [Web] [Slides]
  34. Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels [Web] [Slides]
  35. Collective Model Fusion for Multiple Black-Box Experts [Web] [Slides]
  36. Greedy Layerwise Learning Can Scale To ImageNet [Web] [Slides]
  37. Fast and Flexible Inference of Joint Distributions from their Marginals [Web] [Slides]
  38. Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging [Web] [Slides]
  39. Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition [Web] [Slides]
  40. Disentangling Disentanglement in Variational Autoencoders [Web] [Slides]
  41. Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning [Web] [Slides]
  42. Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models [Web] [Slides]
  43. Does Data Augmentation Lead to Positive Margin? [Web] [Slides]
  44. Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization [Web] [Slides]
  45. On the Impact of the Activation function on Deep Neural Networks Training [Web] [Slides]
  46. Cognitive model priors for predicting human decisions [Web] [Slides]
  47. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits [Web] [Slides]
  48. Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization [Web]
  49. EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE [Web]
  50. Structured agents for physical construction [Web]
  51. AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs [Web]
  52. Robust Learning from Untrusted Sources [Web] [Slides]
  53. Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning [Web] [Slides]
  54. Estimating Information Flow in Deep Neural Networks [Web] [Slides]
  55. Conditioning by adaptive sampling for robust design [Web] [Slides]
  56. Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously [Web] [Slides]
  57. Wasserstein Adversarial Examples via Projected Sinkhorn Iterations [Web] [Slides]
  58. A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning [Web] [Slides]
  59. Learning Novel Policies For Tasks [Web] [Slides]
  60. End-to-End Probabilistic Inference for Nonstationary Audio Analysis [Web] [Slides]
  61. SELFIE: Refurbishing Unclean Samples for Robust Deep Learning [Web] [Slides]
  62. Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data [Web] [Slides]
  63. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects [Web] [Slides]
  64. Direct Uncertainty Prediction for Medical Second Opinions [Web] [Slides]
  65. Bilinear Bandits with Low-rank Structure [Web] [Slides]
  66. Transferable Clean-Label Poisoning Attacks on Deep Neural Nets [Web] [Slides]
  67. Emerging Convolutions for Generative Normalizing Flows [Web] [Slides]
  68. Taming MAML: Efficient unbiased meta-reinforcement learning [Web] [Slides]
  69. Deep Gaussian Processes with Importance-Weighted Variational Inference [Web] [Slides]
  70. Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance [Web] [Slides]
  71. Noisy Dual Principal Component Pursuit [Web] [Slides]
  72. Characterizing Well-Behaved vs. Pathological Deep Neural Networks [Web] [Slides]
  73. Dynamic Measurement Scheduling for Event Forecasting using Deep RL [Web] [Slides]
  74. Online Learning to Rank with Features [Web] [Slides]
  75. NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks [Web] [Slides]
  76. A Large-Scale Study on Regularization and Normalization in GANs [Web] [Slides]
  77. Self-Supervised Exploration via Disagreement [Web] [Slides]
  78. Automated Model Selection with Bayesian Quadrature [Web] [Slides]
  79. Concentration Inequalities for Conditional Value at Risk [Web] [Slides]
  80. Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling [Web] [Slides]
  81. Understanding Geometry of Encoder-Decoder CNNs [Web] [Slides]
  82. Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization [Web] [Slides]
  83. On the Design of Estimators for Bandit Off-Policy Evaluation [Web] [Slides]
  84. Simple Black-box Adversarial Attacks [Web] [Slides]
  85. Variational Annealing of GANs: A Langevin Perspective [Web] [Slides]
  86. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [Web] [Slides]
  87. [Web]
  88. Data Poisoning Attacks in Multi-Party Learning [Web] [Slides]
  89. Screening rules for Lasso with non-convex Sparse Regularizers [Web] [Slides]
  90. Traditional and Heavy Tailed Self Regularization in Neural Network Models [Web] [Slides]
  91. DeepNose: Using artificial neural networks to represent the space of odorants [Web] [Slides]
  92. Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem [Web] [Slides]
  93. Causal Identification under Markov Equivalence: Completeness Results [Web]
  94. Invertible Residual Networks [Web]
  95. The Natural Language of Actions [Web] [Slides]
  96. Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior [Web]
  97. Distributed Weighted Matching via Randomized Composable Coresets [Web] [Slides]
  98. Monge blunts Bayes: Hardness Results for Adversarial Training [Web] [Slides]
  99. Almost surely constrained convex optimization [Web] [Slides]
  100. Domain Agnostic Learning with Disentangled Representations [Web]
  101. Context-Aware Zero-Shot Learning for Object Recognition [Web]
  102. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models [Web] [Slides]
  103. NAS-Bench-101: Towards Reproducible Neural Architecture Search [Web] [Slides]
  104. Control Regularization for Reduced Variance Reinforcement Learning [Web] [Slides]
  105. DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures [Web] [Slides]
  106. Multivariate Submodular Optimization [Web] [Slides]
  107. Better generalization with less data using robust gradient descent [Web] [Slides]
  108. Generalized Majorization-Minimization [Web] [Slides]
  109. Composing Value Functions in Reinforcement Learning [Web] [Slides]
  110. Band-limited Training and Inference for Convolutional Neural Networks [Web] [Slides]
  111. Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models [Web] [Slides]
  112. Approximated Oracle Filter Pruning for Destructive CNN Width Optimization [Web] [Slides]
  113. On the Generalization Gap in Reparameterizable Reinforcement Learning [Web] [Slides]
  114. Random Function Priors for Correlation Modeling [Web] [Slides]
  115. Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio [Web] [Slides]
  116. Near optimal finite time identification of arbitrary linear dynamical systems [Web] [Slides]
  117. On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization [Web] [Slides]
  118. Fast Context Adaptation via Meta-Learning [Web] [Slides]
  119. Learning Classifiers for Target Domain with Limited or No Labels [Web] [Slides]
  120. Classifying Treatment Responders Under Causal Effect Monotonicity [Web] [Slides]
  121. LegoNet: Efficient Convolutional Neural Networks with Lego Filters [Web] [Slides]
  122. Trajectory-Based Off-Policy Deep Reinforcement Learning [Web] [Slides]
  123. Variational Russian Roulette for Deep Bayesian Nonparametrics [Web] [Slides]
  124. Approximating Orthogonal Matrices with Effective Givens Factorization [Web] [Slides]
  125. Lossless or Quantized Boosting with Integer Arithmetic [Web] [Slides]
  126. Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization [Web] [Slides]
  127. Provable Guarantees for Gradient-Based Meta-Learning [Web] [Slides]
  128. Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules [Web] [Slides]
  129. Learning Models from Data with Measurement Error: Tackling Underreporting [Web] [Slides]
  130. Sorting Out Lipschitz Function Approximation [Web] [Slides]
  131. A Deep Reinforcement Learning Perspective on Internet Congestion Control [Web] [Slides]
  132. Incorporating Grouping Information into Bayesian Decision Tree Ensembles [Web] [Slides]
  133. New results on information theoretic clustering [Web] [Slides]
  134. Orthogonal Random Forest for Causal Inference [Web] [Slides]
  135. Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization [Web] [Slides]
  136. Towards Understanding Knowledge Distillation [Web] [Slides]
  137. Anomaly Detection With Multiple-Hypotheses Predictions [Web] [Slides]
  138. Adjustment Criteria for Generalizing Experimental Findings [Web] [Slides]
  139. Graph Element Networks: adaptive, structured computation and memory [Web] [Slides]
  140. Model-Based Active Exploration [Web]
  141. Variational Implicit Processes [Web]
  142. Improved Parallel Algorithms for Density-Based Network Clustering [Web] [Slides]
  143. MONK -- Outlier-Robust Mean Embedding Estimation by Median-of-Means [Web]
  144. Efficient Dictionary Learning with Gradient Descent [Web]
  145. Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers [Web] [Slides]
  146. Kernel Mean Matching for Content Addressability of GANs [Web]
  147. Conditional Independence in Testing Bayesian Networks [Web] [Slides]
  148. Training CNNs with Selective Allocation of Channels [Web] [Slides]
  149. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations [Web] [Slides]
  150. Discovering Latent Covariance Structures for Multiple Time Series [Web] [Slides]
  151. Submodular Observation Selection and Information Gathering for Quadratic Models [Web] [Slides]
  152. The advantages of multiple classes for reducing overfitting from test set reuse [Web] [Slides]
  153. Plug-and-Play Methods Provably Converge with Properly Trained Denoisers [Web] [Slides]
  154. Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation [Web] [Slides]
  155. Neural Inverse Knitting: From Images to Manufacturing Instructions [Web] [Slides]
  156. Sensitivity Analysis of Linear Structural Causal Models [Web] [Slides]
  157. Equivariant Transformer Networks [Web] [Slides]
  158. Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN [Web] [Slides]
  159. Scalable Training of Inference Networks for Gaussian-Process Models [Web] [Slides]
  160. Submodular Cost Submodular Cover with an Approximate Oracle [Web] [Slides]
  161. On the statistical rate of nonlinear recovery in generative models with heavy-tailed data [Web] [Slides]
  162. Riemannian adaptive stochastic gradient algorithms on matrix manifolds [Web] [Slides]
  163. Learning-to-Learn Stochastic Gradient Descent with Biased Regularization [Web] [Slides]
  164. Making Convolutional Networks Shift-Invariant Again [Web] [Slides]
  165. More Efficient Off-Policy Evaluation through Regularized Targeted Learning [Web] [Slides]
  166. Overcoming Multi-model Forgetting [Web] [Slides]
  167. A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs [Web] [Slides]
  168. Bayesian Optimization Meets Bayesian Optimal Stopping [Web] [Slides]
  169. Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity [Web] [Slides]
  170. Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size! [Web] [Slides]
  171. Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence [Web] [Slides]
  172. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning [Web] [Slides]
  173. Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation [Web] [Slides]
  174. Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding [Web] [Slides]
  175. Bayesian Nonparametric Federated Learning of Neural Networks [Web] [Slides]
  176. Remember and Forget for Experience Replay [Web] [Slides]
  177. Learning interpretable continuous-time models of latent stochastic dynamical systems [Web] [Slides]
  178. Hiring Under Uncertainty [Web] [Slides]
  179. On Medians of (Randomized) Pairwise Means [Web] [Slides]
  180. Alternating Minimizations Converge to Second-Order Optimal Solutions [Web] [Slides]
  181. Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation [Web] [Slides]
  182. IMEXnet - A Forward Stable Deep Neural Network [Web] [Slides]
  183. Adversarially Learned Representations for Information Obfuscation and Inference [Web] [Slides]
  184. How does Disagreement Help Generalization against Label Corruption? [Web] [Slides]
  185. Tensor Variable Elimination for Plated Factor Graphs [Web] [Slides]
  186. A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes [Web]
  187. Position-aware Graph Neural Networks [Web]
  188. Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances [Web]
  189. Provably Efficient Imitation Learning from Observation Alone [Web]
  190. Active Embedding Search via Noisy Paired Comparisons [Web] [Slides]
  191. Do ImageNet Classifiers Generalize to ImageNet? [Web]
  192. Adaptive Neural Trees [Web] [Slides]
  193. EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis [Web] [Slides]
  194. Predicate Exchange: Inference with Declarative Knowledge [Web] [Slides]
  195. Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models [Web] [Slides]
  196. Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm [Web] [Slides]
  197. SGD without Replacement: Sharper Rates for General Smooth Convex Functions [Web] [Slides]
  198. Dead-ends and Secure Exploration in Reinforcement Learning [Web] [Slides]
  199. Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning [Web] [Slides]
  200. Exploring the Landscape of Spatial Robustness [Web] [Slides]
  201. Connectivity-Optimized Representation Learning via Persistent Homology [Web] [Slides]
  202. Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment [Web] [Slides]
  203. Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography [Web] [Slides]
  204. Understanding and Accelerating Particle-Based Variational Inference [Web] [Slides]
  205. Learning Generative Models across Incomparable Spaces [Web] [Slides]
  206. On the Complexity of Approximating Wasserstein Barycenters [Web] [Slides]
  207. Statistics and Samples in Distributional Reinforcement Learning [Web] [Slides]
  208. Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments [Web] [Slides]
  209. Sever: A Robust Meta-Algorithm for Stochastic Optimization [Web] [Slides]
  210. Minimal Achievable Sufficient Statistic Learning [Web] [Slides]
  211. Deep Compressed Sensing [Web] [Slides]
  212. Hierarchical Decompositional Mixtures of Variational Autoencoders [Web] [Slides]
  213. Efficient learning of smooth probability functions from Bernoulli tests with guarantees [Web] [Slides]
  214. Relational Pooling for Graph Representations [Web] [Slides]
  215. Estimate Sequences for Variance-Reduced Stochastic Composite Optimization [Web] [Slides]
  216. Hessian Aided Policy Gradient [Web] [Slides]
  217. Bayesian Generative Active Deep Learning [Web] [Slides]
  218. Analyzing Federated Learning through an Adversarial Lens [Web] [Slides]
  219. Learning to Route in Similarity Graphs [Web] [Slides]
  220. Differentiable Dynamic Normalization for Learning Deep Representation [Web] [Slides]
  221. Finding Mixed Nash Equilibria of Generative Adversarial Networks [Web] [Slides]
  222. The Variational Predictive Natural Gradient [Web] [Slides]
  223. Disentangled Graph Convolutional Networks [Web] [Slides]
  224. A Dynamical Systems Perspective on Nesterov Acceleration [Web] [Slides]
  225. Provably Efficient Maximum Entropy Exploration [Web] [Slides]
  226. Active Learning for Probabilistic Structured Prediction of Cuts and Matchings [Web] [Slides]
  227. Fairwashing: the risk of rationalization [Web] [Slides]
  228. Invariant-Equivariant Representation Learning for Multi-Class Data [Web] [Slides]
  229. Toward Understanding the Importance of Noise in Training Neural Networks [Web] [Slides]
  230. CompILE: Compositional Imitation Learning and Execution [Web]
  231. Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap [Web] [Slides]
  232. Open Vocabulary Learning on Source Code with a Graph-Structured Cache [Web] [Slides]
  233. Random Shuffling Beats SGD after Finite Epochs [Web] [Slides]
  234. Combining parametric and nonparametric models for off-policy evaluation [Web] [Slides]
  235. Active Learning with Disagreement Graphs [Web] [Slides]
  236. Understanding the Origins of Bias in Word Embeddings [Web]
  237. Infinite Mixture Prototypes for Few-shot Learning [Web] [Slides]
  238. Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group [Web] [Slides]
  239. Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data [Web] [Slides]
  240. An Instability in Variational Inference for Topic Models [Web] [Slides]
  241. Learning Discrete Structures for Graph Neural Networks [Web] [Slides]
  242. First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems [Web] [Slides]
  243. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features [Web] [Slides]
  244. Multi-Frequency Vector Diffusion Maps [Web] [Slides]
  245. Bias Also Matters: Bias Attribution for Deep Neural Network Explanation [Web] [Slides]
  246. MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing [Web] [Slides]
  247. Breaking Inter-Layer Co-Adaptation by Classifier Anonymization [Web] [Slides]
  248. Deep Generative Learning via Variational Gradient Flow [Web] [Slides]
  249. Bayesian Optimization of Composite Functions [Web] [Slides]
  250. Compositional Fairness Constraints for Graph Embeddings [Web] [Slides]
  251. Improved Convergence for $\ell_1$ and $\ell_\infty$ Regression via Iteratively Reweighted Least Squares [Web] [Slides]
  252. Transfer of Samples in Policy Search via Multiple Importance Sampling [Web] [Slides]
  253. Co-manifold learning with missing data [Web] [Slides]
  254. Interpreting Adversarially Trained Convolutional Neural Networks [Web] [Slides]
  255. Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting [Web] [Slides]
  256. Understanding the Impact of Entropy on Policy Optimization [Web] [Slides]
  257. Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design [Web] [Slides]
  258. The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions [Web] [Slides]
  259. A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion [Web] [Slides]
  260. Optimal Mini-Batch and Step Sizes for SAGA [Web] [Slides]
  261. Exploration Conscious Reinforcement Learning Revisited [Web] [Slides]
  262. [Web]
  263. Counterfactual Visual Explanations [Web] [Slides]
  264. [Web]
  265. Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning [Web] [Slides]
  266. Learning Neurosymbolic Generative Models via Program Synthesis [Web] [Slides]
  267. Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization [Web] [Slides]
  268. Stochastic Blockmodels meet Graph Neural Networks [Web] [Slides]
  269. Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory [Web] [Slides]
  270. Kernel-Based Reinforcement Learning in Robust Markov Decision Processes [Web] [Slides]
  271. [Web]
  272. Data Poisoning Attacks on Stochastic Bandits [Web] [Slides]
  273. Posters Wed [Web]

Day 3

  1. Neural Network Attributions: A Causal Perspective [Web]
  2. State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations [Web] [Slides]
  3. Batch Policy Learning under Constraints [Web] [Slides]
  4. Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions [Web]
  5. Matrix-Free Preconditioning in Online Learning [Web] [Slides]
  6. Geometric Losses for Distributional Learning [Web] [Slides]
  7. Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning [Web]
  8. Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random [Web] [Slides]
  9. On Sparse Linear Regression in the Local Differential Privacy Model [Web]
  10. Towards a Deep and Unified Understanding of Deep Neural Models in NLP [Web] [Slides]
  11. Variational Laplace Autoencoders [Web] [Slides]
  12. Quantifying Generalization in Reinforcement Learning [Web] [Slides]
  13. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization [Web] [Slides]
  14. Online Convex Optimization in Adversarial Markov Decision Processes [Web] [Slides]
  15. Classification from Positive, Unlabeled and Biased Negative Data [Web] [Slides]
  16. Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization [Web] [Slides]
  17. Linear-Complexity Data-Parallel Earth Mover's Distance Approximations [Web] [Slides]
  18. Differentially Private Empirical Risk Minimization with Non-convex Loss Functions [Web] [Slides]
  19. Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation [Web] [Slides]
  20. Latent Normalizing Flows for Discrete Sequences [Web] [Slides]
  21. Learning Latent Dynamics for Planning from Pixels [Web] [Slides]
  22. Unifying Orthogonal Monte Carlo Methods [Web] [Slides]
  23. Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games [Web] [Slides]
  24. Complementary-Label Learning for Arbitrary Losses and Models [Web] [Slides]
  25. Neuron birth-death dynamics accelerates gradient descent and converges asymptotically [Web] [Slides]
  26. Model Comparison for Semantic Grouping [Web] [Slides]
  27. Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy [Web] [Slides]
  28. Functional Transparency for Structured Data: a Game-Theoretic Approach [Web] [Slides]
  29. Multi-objective training of Generative Adversarial Networks with multiple discriminators [Web] [Slides]
  30. Projections for Approximate Policy Iteration Algorithms [Web] [Slides]
  31. Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits [Web] [Slides]
  32. Online Learning with Sleeping Experts and Feedback Graphs [Web] [Slides]
  33. Learning to Infer Program Sketches [Web] [Slides]
  34. Width Provably Matters in Optimization for Deep Linear Neural Networks [Web] [Slides]
  35. RaFM: Rank-Aware Factorization Machines [Web] [Slides]
  36. Differentially Private Learning of Geometric Concepts [Web] [Slides]
  37. Exploring interpretable LSTM neural networks over multi-variable data [Web] [Slides]
  38. Learning Discrete and Continuous Factors of Data via Alternating Disentanglement [Web] [Slides]
  39. Learning Structured Decision Problems with Unawareness [Web] [Slides]
  40. Metropolis-Hastings Generative Adversarial Networks [Web] [Slides]
  41. Incremental Randomized Sketching for Online Kernel Learning [Web] [Slides]
  42. Hierarchically Structured Meta-learning [Web] [Slides]
  43. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? [Web] [Slides]
  44. CAB: Continuous Adaptive Blending for Policy Evaluation and Learning [Web] [Slides]
  45. Toward Controlling Discrimination in Online Ad Auctions [Web] [Slides]
  46. TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing [Web]
  47. Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables [Web]
  48. Calibrated Model-Based Deep Reinforcement Learning [Web] [Slides]
  49. Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets [Web] [Slides]
  50. Adaptive Scale-Invariant Online Algorithms for Learning Linear Models [Web]
  51. Bridging Theory and Algorithm for Domain Adaptation [Web] [Slides]
  52. Power k-Means Clustering [Web] [Slides]
  53. MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement [Web]
  54. Learning Optimal Fair Policies [Web]
  55. Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute [Web] [Slides]
  56. Graphite: Iterative Generative Modeling of Graphs [Web] [Slides]
  57. Reinforcement Learning in Configurable Continuous Environments [Web] [Slides]
  58. Replica Conditional Sequential Monte Carlo [Web] [Slides]
  59. Online Control with Adversarial Disturbances [Web] [Slides]
  60. Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation [Web] [Slides]
  61. Distributed Learning over Unreliable Networks [Web] [Slides]
  62. Neural Separation of Observed and Unobserved Distributions [Web] [Slides]
  63. Fairness-Aware Learning for Continuous Attributes and Treatments [Web] [Slides]
  64. State-Regularized Recurrent Neural Networks [Web] [Slides]
  65. Hybrid Models with Deep and Invertible Features [Web] [Slides]
  66. Target-Based Temporal-Difference Learning [Web] [Slides]
  67. A Polynomial Time MCMC Method for Sampling from Continuous Determinantal Point Processes [Web] [Slides]
  68. Adversarial Online Learning with noise [Web] [Slides]
  69. Learning What and Where to Transfer [Web] [Slides]
  70. Escaping Saddle Points with Adaptive Gradient Methods [Web] [Slides]
  71. Almost Unsupervised Text to Speech and Automatic Speech Recognition [Web] [Slides]
  72. Fairness risk measures [Web] [Slides]
  73. Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation [Web] [Slides]
  74. MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets [Web] [Slides]
  75. Iterative Linearized Control: Stable Algorithms and Complexity Guarantees [Web] [Slides]
  76. Adaptive Antithetic Sampling for Variance Reduction [Web] [Slides]
  77. Online Variance Reduction with Mixtures [Web] [Slides]
  78. [Web]
  79. $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression [Web] [Slides]
  80. AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss [Web] [Slides]
  81. [Web]
  82. On the Connection Between Adversarial Robustness and Saliency Map Interpretability [Web] [Slides]
  83. On Scalable and Efficient Computation of Large Scale Optimal Transport [Web] [Slides]
  84. Finding Options that Minimize Planning Time [Web] [Slides]
  85. Accelerated Flow for Probability Distributions [Web] [Slides]
  86. Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case [Web] [Slides]
  87. [Web]
  88. Model Function Based Conditional Gradient Method with Armijo-like Line Search [Web] [Slides]
  89. A fully differentiable beam search decoder [Web] [Slides]
  90. [Web]
  91. Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem [Web]
  92. Understanding and correcting pathologies in the training of learned optimizers [Web]
  93. Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement [Web] [Slides]
  94. Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering [Web] [Slides]
  95. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret [Web] [Slides]
  96. DBSCAN++: Towards fast and scalable density clustering [Web]
  97. Analogies Explained: Towards Understanding Word Embeddings [Web] [Slides]
  98. Scaling Up Ordinal Embedding: A Landmark Approach [Web] [Slides]
  99. Proportionally Fair Clustering [Web] [Slides]
  100. On the Spectral Bias of Neural Networks [Web] [Slides]
  101. Demystifying Dropout [Web] [Slides]
  102. Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs [Web] [Slides]
  103. Dimensionality Reduction for Tukey Regression [Web] [Slides]
  104. Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems [Web] [Slides]
  105. Concrete Autoencoders: Differentiable Feature Selection and Reconstruction [Web] [Slides]
  106. Parameter-Efficient Transfer Learning for NLP [Web] [Slides]
  107. Learning to select for a predefined ranking [Web] [Slides]
  108. Stable and Fair Classification [Web] [Slides]
  109. Recursive Sketches for Modular Deep Learning [Web] [Slides]
  110. Ladder Capsule Network [Web] [Slides]
  111. Meta-Learning Neural Bloom Filters [Web] [Slides]
  112. Efficient Full-Matrix Adaptive Regularization [Web] [Slides]
  113. Adaptive Regret of Convex and Smooth Functions [Web] [Slides]
  114. Gromov-Wasserstein Learning for Graph Matching and Node Embedding [Web] [Slides]
  115. Efficient On-Device Models using Neural Projections [Web] [Slides]
  116. Mallows ranking models: maximum likelihood estimate and regeneration [Web] [Slides]
  117. Flexibly Fair Representation Learning by Disentanglement [Web] [Slides]
  118. Zero-Shot Knowledge Distillation in Deep Networks [Web] [Slides]
  119. Unreproducible Research is Reproducible [Web] [Slides]
  120. CoT: Cooperative Training for Generative Modeling of Discrete Data [Web] [Slides]
  121. Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms [Web] [Slides]
  122. Online Adaptive Principal Component Analysis and Its extensions [Web] [Slides]
  123. Spectral Clustering of Signed Graphs via Matrix Power Means [Web] [Slides]
  124. Deep Residual Output Layers for Neural Language Generation [Web] [Slides]
  125. Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models [Web] [Slides]
  126. Fair Regression: Quantitative Definitions and Reduction-Based Algorithms [Web] [Slides]
  127. A Convergence Theory for Deep Learning via Over-Parameterization [Web] [Slides]
  128. Geometric Scattering for Graph Data Analysis [Web] [Slides]
  129. Non-Monotonic Sequential Text Generation [Web] [Slides]
  130. Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations [Web] [Slides]
  131. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction [Web] [Slides]
  132. Coresets for Ordered Weighted Clustering [Web] [Slides]
  133. Improving Neural Language Modeling via Adversarial Training [Web] [Slides]
  134. Fast Algorithm for Generalized Multinomial Models with Ranking Data [Web] [Slides]
  135. Fairness without Harm: Decoupled Classifiers with Preference Guarantees [Web] [Slides]
  136. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks [Web]
  137. Robust Inference via Generative Classifiers for Handling Noisy Labels [Web] [Slides]
  138. Insertion Transformer: Flexible Sequence Generation via Insertion Operations [Web]
  139. Robust Estimation of Tree Structured Gaussian Graphical Models [Web] [Slides]
  140. Anytime Online-to-Batch, Optimism and Acceleration [Web] [Slides]
  141. Fair k-Center Clustering for Data Summarization [Web]
  142. Mixture Models for Diverse Machine Translation: Tricks of the Trade [Web]
  143. Graph Resistance and Learning from Pairwise Comparisons [Web] [Slides]
  144. Differentially Private Fair Learning [Web]
  145. Approximation and non-parametric estimation of ResNet-type convolutional neural networks [Web] [Slides]
  146. LIT: Learned Intermediate Representation Training for Model Compression [Web] [Slides]
  147. Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models [Web] [Slides]
  148. Spectral Approximate Inference [Web] [Slides]
  149. Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints [Web] [Slides]
  150. A Better k-means++ Algorithm via Local Search [Web] [Slides]
  151. MASS: Masked Sequence to Sequence Pre-training for Language Generation [Web] [Slides]
  152. Learning Context-dependent Label Permutations for Multi-label Classification [Web] [Slides]
  153. Obtaining Fairness using Optimal Transport Theory [Web] [Slides]
  154. Global Convergence of Block Coordinate Descent in Deep Learning [Web] [Slides]
  155. Analyzing and Improving Representations with the Soft Nearest Neighbor Loss [Web] [Slides]
  156. Trainable Decoding of Sets of Sequences for Neural Sequence Models [Web] [Slides]
  157. Partially Linear Additive Gaussian Graphical Models [Web] [Slides]
  158. Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning [Web] [Slides]
  159. Kernel Normalized Cut: a Theoretical Revisit [Web] [Slides]
  160. Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops [Web] [Slides]
  161. Discovering Context Effects from Raw Choice Data [Web] [Slides]
  162. Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions [Web] [Slides]
  163. Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians [Web] [Slides]
  164. What is the Effect of Importance Weighting in Deep Learning? [Web] [Slides]
  165. Learning to Generalize from Sparse and Underspecified Rewards [Web] [Slides]
  166. DAG-GNN: DAG Structure Learning with Graph Neural Networks [Web] [Slides]
  167. Adaptive Sensor Placement for Continuous Spaces [Web] [Slides]
  168. Guarantees for Spectral Clustering with Fairness Constraints [Web] [Slides]
  169. MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization [Web] [Slides]
  170. On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference [Web] [Slides]
  171. On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning [Web] [Slides]
  172. On the Limitations of Representing Functions on Sets [Web] [Slides]
  173. Similarity of Neural Network Representations Revisited [Web] [Slides]
  174. Efficient Training of BERT by Progressively Stacking [Web] [Slides]
  175. Random Walks on Hypergraphs with Edge-Dependent Vertex Weights [Web] [Slides]
  176. Scale-free adaptive planning for deterministic dynamics & discounted rewards [Web] [Slides]
  177. Supervised Hierarchical Clustering with Exponential Linkage [Web] [Slides]
  178. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network [Web] [Slides]
  179. Learning Distance for Sequences by Learning a Ground Metric [Web] [Slides]
  180. Making Decisions that Reduce Discriminatory Impacts [Web] [Slides]
  181. What 4 year olds can do and AI can’t (yet) [Web]
  182. Best Paper [Web]
  183. Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering [Web]
  184. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations [Web]
  185. Decentralized Exploration in Multi-Armed Bandits [Web] [Slides]
  186. Communication-Constrained Inference and the Role of Shared Randomness [Web] [Slides]
  187. COMIC: Multi-view Clustering Without Parameter Selection [Web]
  188. Submodular Maximization beyond Non-negativity: Guarantees, Fast Algorithms, and Applications [Web]
  189. Nonparametric Bayesian Deep Networks with Local Competition [Web] [Slides]
  190. Distributed, Egocentric Representations of Graphs for Detecting Critical Structures [Web] [Slides]
  191. Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback [Web] [Slides]
  192. Learning and Data Selection in Big Datasets [Web] [Slides]
  193. The Wasserstein Transform [Web] [Slides]
  194. Online Algorithms for Rent-Or-Buy with Expert Advice [Web] [Slides]
  195. Good Initializations of Variational Bayes for Deep Models [Web] [Slides]
  196. Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities [Web] [Slides]
  197. Exploiting structure of uncertainty for efficient matroid semi-bandits [Web] [Slides]
  198. Sublinear quantum algorithms for training linear and kernel-based classifiers [Web] [Slides]
  199. Sequential Facility Location: Approximate Submodularity and Greedy Algorithm [Web] [Slides]
  200. Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity [Web] [Slides]
  201. Dropout as a Structured Shrinkage Prior [Web] [Slides]
  202. Multi-Object Representation Learning with Iterative Variational Inference [Web] [Slides]
  203. PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits [Web] [Slides]
  204. Agnostic Federated Learning [Web] [Slides]
  205. Neural Collaborative Subspace Clustering [Web] [Slides]
  206. Categorical Feature Compression via Submodular Optimization [Web] [Slides]
  207. ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables [Web] [Slides]
  208. Cross-Domain 3D Equivariant Image Embeddings [Web] [Slides]
  209. Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model [Web] [Slides]
  210. Discovering Conditionally Salient Features with Statistical Guarantees [Web] [Slides]
  211. Unsupervised Deep Learning by Neighbourhood Discovery [Web] [Slides]
  212. Multi-Frequency Phase Synchronization [Web] [Slides]
  213. On Variational Bounds of Mutual Information [Web]
  214. Loss Landscapes of Regularized Linear Autoencoders [Web] [Slides]
  215. Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning [Web]
  216. A Theoretical Analysis of Contrastive Unsupervised Representation Learning [Web]
  217. Autoregressive Energy Machines [Web]
  218. Faster Algorithms for Binary Matrix Factorization [Web]
  219. Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation [Web] [Slides]
  220. Hyperbolic Disk Embeddings for Directed Acyclic Graphs [Web] [Slides]
  221. TarMAC: Targeted Multi-Agent Communication [Web] [Slides]
  222. The information-theoretic value of unlabeled data in semi-supervised learning [Web] [Slides]
  223. Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization [Web] [Slides]
  224. Tractable n-Metrics for Multiple Graphs [Web] [Slides]
  225. Hierarchical Importance Weighted Autoencoders [Web] [Slides]
  226. LatentGNN: Learning Efficient Non-local Relations for Visual Recognition [Web] [Slides]
  227. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning [Web] [Slides]
  228. Unsupervised Label Noise Modeling and Loss Correction [Web] [Slides]
  229. Noise2Self: Blind Denoising by Self-Supervision [Web] [Slides]
  230. Guided evolutionary strategies: augmenting random search with surrogate gradients [Web] [Slides]
  231. Faster Attend-Infer-Repeat with Tractable Probabilistic Models [Web] [Slides]
  232. Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness [Web] [Slides]
  233. Actor-Attention-Critic for Multi-Agent Reinforcement Learning [Web] [Slides]
  234. Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment [Web] [Slides]
  235. Learning Dependency Structures for Weak Supervision Models [Web] [Slides]
  236. Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces [Web] [Slides]
  237. Understanding Priors in Bayesian Neural Networks at the Unit Level [Web] [Slides]
  238. Lorentzian Distance Learning for Hyperbolic Representations [Web] [Slides]
  239. Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning [Web] [Slides]
  240. Pareto Optimal Streaming Unsupervised Classification [Web] [Slides]
  241. Geometry and Symmetry in Short-and-Sparse Deconvolution [Web] [Slides]
  242. Semi-Cyclic Stochastic Gradient Descent [Web] [Slides]
  243. Posters Thu [Web]