ICML-2019
A summary of research work presented at the Thirty-sixth International Conference on Machine Learning (ICML), Long Beach, 2019
Tutorials
- Never Ending Learning [Web]
- Active learning from theory to practice [Web]
- Active Hypothesis Testing: An Information Theoretic (re)View [Web]
- Recent Advances in Population-Based Search for Deep Neural Networks: Quality Diversity, Indirect Encodings, and Open-Ended Algorithms [Web]
- Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning [Web]
- A Tutorial on Attention in Deep Learning [Web]
- Causal Inference and Stable Learning [Web]
- A Primer on PAC-Bayesian Learning [Web]
- Neural Approaches to Conversational AI [Web]
Multi Track Research
Day 1
- Opening Remarks [Web]
- Machine learning for robots to think fast [Web]
- Best Paper [Web]
- Adversarial Attacks on Node Embeddings via Graph Poisoning [Web] [Slides]
- SelectiveNet: A Deep Neural Network with an Integrated Reject Option [Web]
- ELF OpenGo: an analysis and open reimplementation of AlphaZero [Web]
- A Contrastive Divergence for Combining Variational Inference and MCMC [Web] [Slides]
- Regret Circuits: Composability of Regret Minimizers [Web] [Slides]
- Refined Complexity of PCA with Outliers [Web]
- PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization [Web] [Slides]
- Validating Causal Inference Models via Influence Functions [Web]
- Data Shapley: Equitable Valuation of Data for Machine Learning [Web]
- First-Order Adversarial Vulnerability of Neural Networks and Input Dimension [Web] [Slides]
- Manifold Mixup: Better Representations by Interpolating Hidden States [Web] [Slides]
- Making Deep Q-learning methods robust to time discretization [Web] [Slides]
- Calibrated Approximate Bayesian Inference [Web] [Slides]
- Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function [Web] [Slides]
- On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms [Web] [Slides]
- Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization [Web] [Slides]
- Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks [Web] [Slides]
- Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data [Web] [Slides]
- On Certifying Non-Uniform Bounds against Adversarial Attacks [Web] [Slides]
- Processing Megapixel Images with Deep Attention-Sampling Models [Web] [Slides]
- Nonlinear Distributional Gradient Temporal-Difference Learning [Web] [Slides]
- Moment-Based Variational Inference for Markov Jump Processes [Web] [Slides]
- Stable-Predictive Optimistic Counterfactual Regret Minimization [Web] [Slides]
- Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models [Web] [Slides]
- Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization [Web] [Slides]
- Learning to Groove with Inverse Sequence Transformations [Web] [Slides]
- Metric-Optimized Example Weights [Web] [Slides]
- Improving Adversarial Robustness via Promoting Ensemble Diversity [Web] [Slides]
- TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning [Web] [Slides]
- Composing Entropic Policies using Divergence Correction [Web] [Slides]
- Understanding MCMC Dynamics as Flows on the Wasserstein Space [Web] [Slides]
- When Samples Are Strategically Selected [Web] [Slides]
- Teaching a black-box learner [Web] [Slides]
- Lower Bounds for Smooth Nonconvex Finite-Sum Optimization [Web] [Slides]
- Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI [Web] [Slides]
- Improving Model Selection by Employing the Test Data [Web] [Slides]
- Adversarial camera stickers: A physical camera-based attack on deep learning systems [Web] [Slides]
- Online Meta-Learning [Web] [Slides]
- TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning [Web] [Slides]
- LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations [Web] [Slides]
- Statistical Foundations of Virtual Democracy [Web] [Slides]
- PAC Learnability of Node Functions in Networked Dynamical Systems [Web] [Slides]
- Nonconvex Variance Reduced Optimization with Arbitrary Sampling [Web] [Slides]
- HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving [Web] [Slides]
- Topological Data Analysis of Decision Boundaries with Application to Model Selection [Web] [Slides]
- Adversarial examples from computational constraints [Web]
- Training Neural Networks with Local Error Signals [Web] [Slides]
- Multi-Agent Adversarial Inverse Reinforcement Learning [Web] [Slides]
- Amortized Monte Carlo Integration [Web] [Slides]
- Optimal Auctions through Deep Learning [Web] [Slides]
- Online learning with kernel losses [Web] [Slides]
- Error Feedback Fixes SignSGD and other Gradient Compression Schemes [Web]
- Molecular Hypergraph Grammar with Its Application to Molecular Optimization [Web] [Slides]
- Contextual Memory Trees [Web]
- POPQORN: Quantifying Robustness of Recurrent Neural Networks [Web] [Slides]
- GMNN: Graph Markov Neural Networks [Web] [Slides]
- Policy Consolidation for Continual Reinforcement Learning [Web] [Slides]
- Stein Point Markov Chain Monte Carlo [Web] [Slides]
- Learning to Clear the Market [Web] [Slides]
- Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates [Web] [Slides]
- A Composite Randomized Incremental Gradient Method [Web] [Slides]
- Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance [Web] [Slides]
- Sparse Extreme Multi-label Learning with Oracle Property [Web] [Slides]
- Using Pre-Training Can Improve Model Robustness and Uncertainty [Web] [Slides]
- Self-Attention Graph Pooling [Web] [Slides]
- Off-Policy Deep Reinforcement Learning without Exploration [Web] [Slides]
- Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations [Web] [Slides]
- Learning to bid in revenue-maximizing auctions [Web] [Slides]
- Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise [Web] [Slides]
- Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference [Web] [Slides]
- Learning to Prove Theorems via Interacting with Proof Assistants [Web] [Slides]
- Shape Constraints for Set Functions [Web] [Slides]
- [Web]
- Combating Label Noise in Deep Learning using Abstention [Web] [Slides]
- Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation [Web] [Slides]
- Particle Flow Bayes' Rule [Web] [Slides]
- Open-ended learning in symmetric zero-sum games [Web] [Slides]
- Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension [Web] [Slides]
- Multiplicative Weights Updates as a distributed constrained optimization algorithm: Convergence to second-order stationary points almost always [Web] [Slides]
- Circuit-GNN: Graph Neural Networks for Distributed Circuit Design [Web] [Slides]
- On The Power of Curriculum Learning in Training Deep Networks [Web] [Slides]
- PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach [Web] [Slides]
- LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning [Web] [Slides]
- Revisiting the Softmax Bellman Operator: New Benefits and New Perspective [Web] [Slides]
- Correlated Variational Auto-Encoders [Web] [Slides]
- Deep Counterfactual Regret Minimization [Web] [Slides]
- Maximum Likelihood Estimation for Learning Populations of Parameters [Web] [Slides]
- Katalyst: Boosting Convex Katyusha for Non-Convex Problems with a Large Condition Number [Web] [Slides]
- Learning to Optimize Multigrid PDE Solvers [Web] [Slides]
- Voronoi Boundary Classification: A High-Dimensional Geometric Approach via Weighted Monte Carlo Integration [Web] [Slides]
- On Learning Invariant Representations for Domain Adaptation [Web]
- Self-Attention Generative Adversarial Networks [Web]
- An Investigation of Model-Free Planning [Web]
- Towards a Unified Analysis of Random Fourier Features [Web]
- Generalized Approximate Survey Propagation for High-Dimensional Estimation [Web] [Slides]
- Projection onto Minkowski Sums with Application to Constrained Learning [Web] [Slides]
- Safe Policy Improvement with Baseline Bootstrapping [Web] [Slides]
- A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation [Web]
- Robust Decision Trees Against Adversarial Examples [Web] [Slides]
- Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models [Web] [Slides]
- Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching [Web] [Slides]
- CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning [Web] [Slides]
- Learning deep kernels for exponential family densities [Web] [Slides]
- Boosted Density Estimation Remastered [Web] [Slides]
- Blended Conditional Gradients [Web] [Slides]
- Distributional Reinforcement Learning for Efficient Exploration [Web] [Slides]
- Learning Hawkes Processes Under Synchronization Noise [Web] [Slides]
- Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth [Web] [Slides]
- Adversarial Generation of Time-Frequency Features with application in audio synthesis [Web] [Slides]
- High-Fidelity Image Generation With Fewer Labels [Web] [Slides]
- Task-Agnostic Dynamics Priors for Deep Reinforcement Learning [Web] [Slides]
- Bayesian Deconditional Kernel Mean Embeddings [Web] [Slides]
- Inference and Sampling of $K_{33}$-free Ising Models [Web] [Slides]
- Acceleration of SVRG and Katyusha X by Inexact Preconditioning [Web] [Slides]
- Optimistic Policy Optimization via Multiple Importance Sampling [Web] [Slides]
- Generative Adversarial User Model for Reinforcement Learning Based Recommendation System [Web] [Slides]
- Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation [Web] [Slides]
- On the Universality of Invariant Networks [Web] [Slides]
- Revisiting precision recall definition for generative modeling [Web] [Slides]
- Diagnosing Bottlenecks in Deep Q-learning Algorithms [Web] [Slides]
- A Kernel Perspective for Regularizing Deep Neural Networks [Web] [Slides]
- Random Matrix Improved Covariance Estimation for a Large Class of Metrics [Web] [Slides]
- Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD [Web] [Slides]
- Neural Logic Reinforcement Learning [Web] [Slides]
- A Statistical Investigation of Long Memory in Language and Music [Web] [Slides]
- Optimal Transport for structured data with application on graphs [Web] [Slides]
- Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [Web] [Slides]
- Wasserstein of Wasserstein Loss for Learning Generative Models [Web] [Slides]
- Collaborative Evolutionary Reinforcement Learning [Web] [Slides]
- A Persistent Weisfeiler-Lehman Procedure for Graph Classification [Web] [Slides]
- Dual Entangled Polynomial Code: Three-Dimensional Coding for Distributed Matrix Multiplication [Web] [Slides]
- A Conditional-Gradient-Based Augmented Lagrangian Framework [Web] [Slides]
- Learning to Collaborate in Markov Decision Processes [Web] [Slides]
- Deep Factors for Forecasting [Web] [Slides]
- Learning Optimal Linear Regularizers [Web] [Slides]
- Gauge Equivariant Convolutional Networks and the Icosahedral CNN [Web]
- Flat Metric Minimization with Applications in Generative Modeling [Web] [Slides]
- EMI: Exploration with Mutual Information [Web]
- Rehashing Kernel Evaluation in High Dimensions [Web] [Slides]
- Neural Joint Source-Channel Coding [Web] [Slides]
- SGD: General Analysis and Improved Rates [Web]
- Predictor-Corrector Policy Optimization [Web] [Slides]
- Weakly-Supervised Temporal Localization via Occurrence Count Learning [Web] [Slides]
- On Symmetric Losses for Learning from Corrupted Labels [Web] [Slides]
- Feature-Critic Networks for Heterogeneous Domain Generalization [Web] [Slides]
- Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs [Web] [Slides]
- Imitation Learning from Imperfect Demonstration [Web] [Slides]
- Large-Scale Sparse Kernel Canonical Correlation Analysis [Web] [Slides]
- Doubly-Competitive Distribution Estimation [Web] [Slides]
- Curvature-Exploiting Acceleration of Elastic Net Computations [Web] [Slides]
- Learning a Prior over Intent via Meta-Inverse Reinforcement Learning [Web] [Slides]
- Switching Linear Dynamics for Variational Bayes Filtering [Web] [Slides]
- AUCµ: A Performance Metric for Multi-Class Machine Learning Models [Web] [Slides]
- Learning to Convolve: A Generalized Weight-Tying Approach [Web] [Slides]
- Non-Parametric Priors For Generative Adversarial Networks [Web] [Slides]
- Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty [Web] [Slides]
- A Kernel Theory of Modern Data Augmentation [Web] [Slides]
- Homomorphic Sensing [Web] [Slides]
- Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication [Web] [Slides]
- DeepMDP: Learning Continuous Latent Space Models for Representation Learning [Web] [Slides]
- Imputing Missing Events in Continuous-Time Event Streams [Web] [Slides]
- Regularization in directable environments with application to Tetris [Web] [Slides]
- On Dropout and Nuclear Norm Regularization [Web] [Slides]
- Lipschitz Generative Adversarial Nets [Web] [Slides]
- Dynamic Weights in Multi-Objective Deep Reinforcement Learning [Web] [Slides]
- kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection [Web] [Slides]
- Phaseless PCA: Low-Rank Matrix Recovery from Column-wise Phaseless Measurements [Web] [Slides]
- Safe Grid Search with Optimal Complexity [Web] [Slides]
- Importance Sampling Policy Evaluation with an Estimated Behavior Policy [Web] [Slides]
- Understanding and Controlling Memory in Recurrent Neural Networks [Web] [Slides]
- Improved Dynamic Graph Learning through Fault-Tolerant Sparsification [Web] [Slides]
- Gradient Descent Finds Global Minima of Deep Neural Networks [Web] [Slides]
- HexaGAN: Generative Adversarial Nets for Real World Classification [Web] [Slides]
- Fingerprint Policy Optimisation for Robust Reinforcement Learning [Web] [Slides]
- Scalable Learning in Reproducing Kernel Krein Spaces [Web] [Slides]
- Rate Distortion For Model Compression: From Theory To Practice [Web] [Slides]
- SAGA with Arbitrary Sampling [Web] [Slides]
- Learning from a Learner [Web] [Slides]
- Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces [Web] [Slides]
- Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin [Web] [Slides]
- Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm [Web] [Slides]
- Graph Matching Networks for Learning the Similarity of Graph Structured Objects [Web] [Slides]
- An Investigation into Neural Net Optimization via Hessian Eigenvalue Density [Web] [Slides]
- Dirichlet Simplex Nest and Geometric Inference [Web] [Slides]
- Formal Privacy for Functional Data with Gaussian Perturbations [Web] [Slides]
- Natural Analysts in Adaptive Data Analysis [Web] [Slides]
- Separable value functions across time-scales [Web]
- Subspace Robust Wasserstein Distances [Web]
- Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff [Web]
- Sublinear Time Nearest Neighbor Search over Generalized Weighted Space [Web] [Slides]
- BayesNAS: A Bayesian Approach for Neural Architecture Search [Web] [Slides]
- Differentiable Linearized ADMM [Web] [Slides]
- Bayesian leave-one-out cross-validation for large data [Web] [Slides]
- Graphical-model based estimation and inference for differential privacy [Web] [Slides]
- CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration [Web] [Slides]
- Learning Action Representations for Reinforcement Learning [Web] [Slides]
- Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models [Web] [Slides]
- Collaborative Channel Pruning for Deep Networks [Web] [Slides]
- Compressing Gradient Optimizers via Count-Sketches [Web] [Slides]
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks [Web] [Slides]
- Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search [Web] [Slides]
- Rao-Blackwellized Stochastic Gradients for Discrete Distributions [Web] [Slides]
- White-box vs Black-box: Bayes Optimal Strategies for Membership Inference [Web] [Slides]
- Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction [Web] [Slides]
- Bayesian Counterfactual Risk Minimization [Web] [Slides]
- Active Manifolds: A non-linear analogue to Active Subspaces [Web] [Slides]
- Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization [Web] [Slides]
- Scalable Fair Clustering [Web] [Slides]
- Shallow-Deep Networks: Understanding and Mitigating Network Overthinking [Web] [Slides]
- A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent [Web] [Slides]
- Neurally-Guided Structure Inference [Web] [Slides]
- An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule [Web] [Slides]
- Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints [Web] [Slides]
- Per-Decision Option Discounting [Web] [Slides]
- Optimal Minimal Margin Maximization with Boosting [Web] [Slides]
- GDPP: Learning Diverse Generations using Determinantal Point Processes [Web] [Slides]
- Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator [Web] [Slides]
- Graph U-Nets [Web] [Slides]
- The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study [Web] [Slides]
- Bayesian Joint Spike-and-Slab Graphical Lasso [Web] [Slides]
- Sublinear Space Private Algorithms Under the Sliding Window Model [Web] [Slides]
- Optimality Implies Kernel Sum Classifiers are Statistically Efficient [Web] [Slides]
- Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds [Web] [Slides]
- Generalized Linear Rule Models [Web] [Slides]
- Co-Representation Network for Generalized Zero-Shot Learning [Web] [Slides]
- Fault Tolerance in Iterative-Convergent Machine Learning [Web]
- SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver [Web] [Slides]
- AdaGrad stepsizes: sharp convergence over nonconvex landscapes [Web] [Slides]
- Rotation Invariant Householder Parameterization for Bayesian PCA [Web] [Slides]
- Locally Private Bayesian Inference for Count Models [Web]
- The Implicit Fairness Criterion of Unconstrained Learning [Web]
- A Theory of Regularized Markov Decision Processes [Web] [Slides]
- Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications [Web] [Slides]
- GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects [Web] [Slides]
- Static Automatic Batching In TensorFlow [Web] [Slides]
- Area Attention [Web] [Slides]
- Beyond Backprop: Online Alternating Minimization with Auxiliary Variables [Web] [Slides]
- A Framework for Bayesian Optimization in Embedded Subspaces [Web] [Slides]
- Low Latency Privacy Preserving Inference [Web] [Slides]
- Weak Detection of Signal in the Spiked Wigner Model [Web] [Slides]
- Discovering Options for Exploration by Minimizing Cover Time [Web] [Slides]
- Variational Inference for sparse network reconstruction from count data [Web] [Slides]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Web] [Slides]
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting [Web] [Slides]
- The Evolved Transformer [Web] [Slides]
- SWALP: Stochastic Weight Averaging in Low Precision Training [Web] [Slides]
- Convolutional Poisson Gamma Belief Network [Web] [Slides]
- Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters [Web] [Slides]
- Rademacher Complexity for Adversarially Robust Generalization [Web] [Slides]
- Policy Certificates: Towards Accountable Reinforcement Learning [Web] [Slides]
- Simplifying Graph Convolutional Networks [Web] [Slides]
- Geometry Aware Convolutional Filters for Omnidirectional Images Representation [Web] [Slides]
- Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications [Web] [Slides]
- Jumpout: Improved Dropout for Deep Neural Networks with ReLUs [Web] [Slides]
- Efficient optimization of loops and limits with randomized telescoping sums [Web] [Slides]
- Automatic Posterior Transformation for Likelihood-Free Inference [Web] [Slides]
- Poisson Subsampled Rényi Differential Privacy [Web] [Slides]
- Provably efficient RL with Rich Observations via Latent State Decoding [Web] [Slides]
- Action Robust Reinforcement Learning and Applications in Continuous Control [Web] [Slides]
- Robust Influence Maximization for Hyperparametric Models [Web] [Slides]
- A Personalized Affective Memory Model for Improving Emotion Recognition [Web] [Slides]
- DL2: Training and Querying Neural Networks with Logic [Web] [Slides]
- Stochastic Deep Networks [Web] [Slides]
- Self-similar Epochs: Value in arrangement [Web] [Slides]
- Active Learning for Decision-Making from Imbalanced Observational Data [Web] [Slides]
- Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA [Web] [Slides]
- Information-Theoretic Considerations in Batch Reinforcement Learning [Web] [Slides]
- The Value Function Polytope in Reinforcement Learning [Web] [Slides]
- HyperGAN: A Generative Model for Diverse, Performant Neural Networks [Web] [Slides]
- Temporal Gaussian Mixture Layer for Videos [Web] [Slides]
- Posters Tue [Web]
Day 2
- The U.S. Census Bureau Tries to be a Good Data Steward in the 21st Century [Web]
- Test of Time Award [Web]
- Theoretically Principled Trade-off between Robustness and Accuracy [Web]
- Sum-of-Squares Polynomial Flow [Web]
- Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning [Web]
- Distribution calibration for regression [Web]
- On the Convergence and Robustness of Adversarial Training [Web] [Slides]
- Distributed Learning with Sublinear Communication [Web]
- Complexity of Linear Regions in Deep Networks [Web]
- Exploiting Worker Correlation for Label Aggregation in Crowdsourcing [Web]
- Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards [Web]
- The Odds are Odd: A Statistical Test for Detecting Adversarial Examples [Web] [Slides]
- FloWaveNet: A Generative Flow for Raw Audio [Web] [Slides]
- Maximum Entropy-Regularized Multi-Goal Reinforcement Learning [Web] [Slides]
- Graph Convolutional Gaussian Processes [Web] [Slides]
- Learning with Bad Training Data via Iterative Trimmed Loss Minimization [Web] [Slides]
- On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization [Web] [Slides]
- On Connected Sublevel Sets in Deep Learning [Web] [Slides]
- Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems [Web] [Slides]
- Target Tracking for Contextual Bandits: Application to Demand Side Management [Web] [Slides]
- ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation [Web] [Slides]
- Are Generative Classifiers More Robust to Adversarial Attacks? [Web] [Slides]
- Imitating Latent Policies from Observation [Web] [Slides]
- Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation [Web] [Slides]
- On discriminative learning of prediction uncertainty [Web] [Slides]
- Stochastic Gradient Push for Distributed Deep Learning [Web] [Slides]
- Adversarial Examples Are a Natural Consequence of Test Error in Noise [Web] [Slides]
- A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology [Web] [Slides]
- Correlated bandits or: How to minimize mean-squared error online [Web] [Slides]
- Certified Adversarial Robustness via Randomized Smoothing [Web] [Slides]
- A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization [Web] [Slides]
- SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning [Web] [Slides]
- GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver [Web] [Slides]
- Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels [Web] [Slides]
- Collective Model Fusion for Multiple Black-Box Experts [Web] [Slides]
- Greedy Layerwise Learning Can Scale To ImageNet [Web] [Slides]
- Fast and Flexible Inference of Joint Distributions from their Marginals [Web] [Slides]
- Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging [Web] [Slides]
- Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition [Web] [Slides]
- Disentangling Disentanglement in Variational Autoencoders [Web] [Slides]
- Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning [Web] [Slides]
- Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models [Web] [Slides]
- Does Data Augmentation Lead to Positive Margin? [Web] [Slides]
- Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization [Web] [Slides]
- On the Impact of the Activation function on Deep Neural Networks Training [Web] [Slides]
- Cognitive model priors for predicting human decisions [Web] [Slides]
- Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits [Web] [Slides]
- Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization [Web]
- EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE [Web]
- Structured agents for physical construction [Web]
- AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs [Web]
- Robust Learning from Untrusted Sources [Web] [Slides]
- Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning [Web] [Slides]
- Estimating Information Flow in Deep Neural Networks [Web] [Slides]
- Conditioning by adaptive sampling for robust design [Web] [Slides]
- Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously [Web] [Slides]
- Wasserstein Adversarial Examples via Projected Sinkhorn Iterations [Web] [Slides]
- A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning [Web] [Slides]
- Learning Novel Policies For Tasks [Web] [Slides]
- End-to-End Probabilistic Inference for Nonstationary Audio Analysis [Web] [Slides]
- SELFIE: Refurbishing Unclean Samples for Robust Deep Learning [Web] [Slides]
- Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data [Web] [Slides]
- The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects [Web] [Slides]
- Direct Uncertainty Prediction for Medical Second Opinions [Web] [Slides]
- Bilinear Bandits with Low-rank Structure [Web] [Slides]
- Transferable Clean-Label Poisoning Attacks on Deep Neural Nets [Web] [Slides]
- Emerging Convolutions for Generative Normalizing Flows [Web] [Slides]
- Taming MAML: Efficient unbiased meta-reinforcement learning [Web] [Slides]
- Deep Gaussian Processes with Importance-Weighted Variational Inference [Web] [Slides]
- Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance [Web] [Slides]
- Noisy Dual Principal Component Pursuit [Web] [Slides]
- Characterizing Well-Behaved vs. Pathological Deep Neural Networks [Web] [Slides]
- Dynamic Measurement Scheduling for Event Forecasting using Deep RL [Web] [Slides]
- Online Learning to Rank with Features [Web] [Slides]
- NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks [Web] [Slides]
- A Large-Scale Study on Regularization and Normalization in GANs [Web] [Slides]
- Self-Supervised Exploration via Disagreement [Web] [Slides]
- Automated Model Selection with Bayesian Quadrature [Web] [Slides]
- Concentration Inequalities for Conditional Value at Risk [Web] [Slides]
- Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling [Web] [Slides]
- Understanding Geometry of Encoder-Decoder CNNs [Web] [Slides]
- Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization [Web] [Slides]
- On the Design of Estimators for Bandit Off-Policy Evaluation [Web] [Slides]
- Simple Black-box Adversarial Attacks [Web] [Slides]
- Variational Annealing of GANs: A Langevin Perspective [Web] [Slides]
- Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables [Web] [Slides]
- [Web]
- Data Poisoning Attacks in Multi-Party Learning [Web] [Slides]
- Screening rules for Lasso with non-convex Sparse Regularizers [Web] [Slides]
- Traditional and Heavy Tailed Self Regularization in Neural Network Models [Web] [Slides]
- DeepNose: Using artificial neural networks to represent the space of odorants [Web] [Slides]
- Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem [Web] [Slides]
- Causal Identification under Markov Equivalence: Completeness Results [Web]
- Invertible Residual Networks [Web]
- The Natural Language of Actions [Web] [Slides]
- Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior [Web]
- Distributed Weighted Matching via Randomized Composable Coresets [Web] [Slides]
- Monge blunts Bayes: Hardness Results for Adversarial Training [Web] [Slides]
- Almost surely constrained convex optimization [Web] [Slides]
- Domain Agnostic Learning with Disentangled Representations [Web]
- Context-Aware Zero-Shot Learning for Object Recognition [Web]
- Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models [Web] [Slides]
- NAS-Bench-101: Towards Reproducible Neural Architecture Search [Web] [Slides]
- Control Regularization for Reduced Variance Reinforcement Learning [Web] [Slides]
- DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures [Web] [Slides]
- Multivariate Submodular Optimization [Web] [Slides]
- Better generalization with less data using robust gradient descent [Web] [Slides]
- Generalized Majorization-Minimization [Web] [Slides]
- Composing Value Functions in Reinforcement Learning [Web] [Slides]
- Band-limited Training and Inference for Convolutional Neural Networks [Web] [Slides]
- Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models [Web] [Slides]
- Approximated Oracle Filter Pruning for Destructive CNN Width Optimization [Web] [Slides]
- On the Generalization Gap in Reparameterizable Reinforcement Learning [Web] [Slides]
- Random Function Priors for Correlation Modeling [Web] [Slides]
- Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio [Web] [Slides]
- Near optimal finite time identification of arbitrary linear dynamical systems [Web] [Slides]
- On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization [Web] [Slides]
- Fast Context Adaptation via Meta-Learning [Web] [Slides]
- Learning Classifiers for Target Domain with Limited or No Labels [Web] [Slides]
- Classifying Treatment Responders Under Causal Effect Monotonicity [Web] [Slides]
- LegoNet: Efficient Convolutional Neural Networks with Lego Filters [Web] [Slides]
- Trajectory-Based Off-Policy Deep Reinforcement Learning [Web] [Slides]
- Variational Russian Roulette for Deep Bayesian Nonparametrics [Web] [Slides]
- Approximating Orthogonal Matrices with Effective Givens Factorization [Web] [Slides]
- Lossless or Quantized Boosting with Integer Arithmetic [Web] [Slides]
- Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization [Web] [Slides]
- Provable Guarantees for Gradient-Based Meta-Learning [Web] [Slides]
- Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules [Web] [Slides]
- Learning Models from Data with Measurement Error: Tackling Underreporting [Web] [Slides]
- Sorting Out Lipschitz Function Approximation [Web] [Slides]
- A Deep Reinforcement Learning Perspective on Internet Congestion Control [Web] [Slides]
- Incorporating Grouping Information into Bayesian Decision Tree Ensembles [Web] [Slides]
- New results on information theoretic clustering [Web] [Slides]
- Orthogonal Random Forest for Causal Inference [Web] [Slides]
- Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization [Web] [Slides]
- Towards Understanding Knowledge Distillation [Web] [Slides]
- Anomaly Detection With Multiple-Hypotheses Predictions [Web] [Slides]
- Adjustment Criteria for Generalizing Experimental Findings [Web] [Slides]
- Graph Element Networks: adaptive, structured computation and memory [Web] [Slides]
- Model-Based Active Exploration [Web]
- Variational Implicit Processes [Web]
- Improved Parallel Algorithms for Density-Based Network Clustering [Web] [Slides]
- MONK -- Outlier-Robust Mean Embedding Estimation by Median-of-Means [Web]
- Efficient Dictionary Learning with Gradient Descent [Web]
- Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers [Web] [Slides]
- Kernel Mean Matching for Content Addressability of GANs [Web]
- Conditional Independence in Testing Bayesian Networks [Web] [Slides]
- Training CNNs with Selective Allocation of Channels [Web] [Slides]
- Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations [Web] [Slides]
- Discovering Latent Covariance Structures for Multiple Time Series [Web] [Slides]
- Submodular Observation Selection and Information Gathering for Quadratic Models [Web] [Slides]
- The advantages of multiple classes for reducing overfitting from test set reuse [Web] [Slides]
- Plug-and-Play Methods Provably Converge with Properly Trained Denoisers [Web] [Slides]
- Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation [Web] [Slides]
- Neural Inverse Knitting: From Images to Manufacturing Instructions [Web] [Slides]
- Sensitivity Analysis of Linear Structural Causal Models [Web] [Slides]
- Equivariant Transformer Networks [Web] [Slides]
- Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN [Web] [Slides]
- Scalable Training of Inference Networks for Gaussian-Process Models [Web] [Slides]
- Submodular Cost Submodular Cover with an Approximate Oracle [Web] [Slides]
- On the statistical rate of nonlinear recovery in generative models with heavy-tailed data [Web] [Slides]
- Riemannian adaptive stochastic gradient algorithms on matrix manifolds [Web] [Slides]
- Learning-to-Learn Stochastic Gradient Descent with Biased Regularization [Web] [Slides]
- Making Convolutional Networks Shift-Invariant Again [Web] [Slides]
- More Efficient Off-Policy Evaluation through Regularized Targeted Learning [Web] [Slides]
- Overcoming Multi-model Forgetting [Web] [Slides]
- A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs [Web] [Slides]
- Bayesian Optimization Meets Bayesian Optimal Stopping [Web] [Slides]
- Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity [Web] [Slides]
- Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size! [Web] [Slides]
- Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence [Web] [Slides]
- BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning [Web] [Slides]
- Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation [Web] [Slides]
- Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding [Web] [Slides]
- Bayesian Nonparametric Federated Learning of Neural Networks [Web] [Slides]
- Remember and Forget for Experience Replay [Web] [Slides]
- Learning interpretable continuous-time models of latent stochastic dynamical systems [Web] [Slides]
- Hiring Under Uncertainty [Web] [Slides]
- On Medians of (Randomized) Pairwise Means [Web] [Slides]
- Alternating Minimizations Converge to Second-Order Optimal Solutions [Web] [Slides]
- Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation [Web] [Slides]
- IMEXnet - A Forward Stable Deep Neural Network [Web] [Slides]
- Adversarially Learned Representations for Information Obfuscation and Inference [Web] [Slides]
- How does Disagreement Help Generalization against Label Corruption? [Web] [Slides]
- Tensor Variable Elimination for Plated Factor Graphs [Web] [Slides]
- A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes [Web]
- Position-aware Graph Neural Networks [Web]
- Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances [Web]
- Provably Efficient Imitation Learning from Observation Alone [Web]
- Active Embedding Search via Noisy Paired Comparisons [Web] [Slides]
- Do ImageNet Classifiers Generalize to ImageNet? [Web]
- Adaptive Neural Trees [Web] [Slides]
- EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis [Web] [Slides]
- Predicate Exchange: Inference with Declarative Knowledge [Web] [Slides]
- Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models [Web] [Slides]
- Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm [Web] [Slides]
- SGD without Replacement: Sharper Rates for General Smooth Convex Functions [Web] [Slides]
- Dead-ends and Secure Exploration in Reinforcement Learning [Web] [Slides]
- Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning [Web] [Slides]
- Exploring the Landscape of Spatial Robustness [Web] [Slides]
- Connectivity-Optimized Representation Learning via Persistent Homology [Web] [Slides]
- Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment [Web] [Slides]
- Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography [Web] [Slides]
- Understanding and Accelerating Particle-Based Variational Inference [Web] [Slides]
- Learning Generative Models across Incomparable Spaces [Web] [Slides]
- On the Complexity of Approximating Wasserstein Barycenters [Web] [Slides]
- Statistics and Samples in Distributional Reinforcement Learning [Web] [Slides]
- Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments [Web] [Slides]
- Sever: A Robust Meta-Algorithm for Stochastic Optimization [Web] [Slides]
- Minimal Achievable Sufficient Statistic Learning [Web] [Slides]
- Deep Compressed Sensing [Web] [Slides]
- Hierarchical Decompositional Mixtures of Variational Autoencoders [Web] [Slides]
- Efficient learning of smooth probability functions from Bernoulli tests with guarantees [Web] [Slides]
- Relational Pooling for Graph Representations [Web] [Slides]
- Estimate Sequences for Variance-Reduced Stochastic Composite Optimization [Web] [Slides]
- Hessian Aided Policy Gradient [Web] [Slides]
- Bayesian Generative Active Deep Learning [Web] [Slides]
- Analyzing Federated Learning through an Adversarial Lens [Web] [Slides]
- Learning to Route in Similarity Graphs [Web] [Slides]
- Differentiable Dynamic Normalization for Learning Deep Representation [Web] [Slides]
- Finding Mixed Nash Equilibria of Generative Adversarial Networks [Web] [Slides]
- The Variational Predictive Natural Gradient [Web] [Slides]
- Disentangled Graph Convolutional Networks [Web] [Slides]
- A Dynamical Systems Perspective on Nesterov Acceleration [Web] [Slides]
- Provably Efficient Maximum Entropy Exploration [Web] [Slides]
- Active Learning for Probabilistic Structured Prediction of Cuts and Matchings [Web] [Slides]
- Fairwashing: the risk of rationalization [Web] [Slides]
- Invariant-Equivariant Representation Learning for Multi-Class Data [Web] [Slides]
- Toward Understanding the Importance of Noise in Training Neural Networks [Web] [Slides]
- CompILE: Compositional Imitation Learning and Execution [Web]
- Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap [Web] [Slides]
- Open Vocabulary Learning on Source Code with a Graph-Structured Cache [Web] [Slides]
- Random Shuffling Beats SGD after Finite Epochs [Web] [Slides]
- Combining parametric and nonparametric models for off-policy evaluation [Web] [Slides]
- Active Learning with Disagreement Graphs [Web] [Slides]
- Understanding the Origins of Bias in Word Embeddings [Web]
- Infinite Mixture Prototypes for Few-shot Learning [Web] [Slides]
- Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group [Web] [Slides]
- Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data [Web] [Slides]
- An Instability in Variational Inference for Topic Models [Web] [Slides]
- Learning Discrete Structures for Graph Neural Networks [Web] [Slides]
- First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems [Web] [Slides]
- Sample-Optimal Parametric Q-Learning Using Linearly Additive Features [Web] [Slides]
- Multi-Frequency Vector Diffusion Maps [Web] [Slides]
- Bias Also Matters: Bias Attribution for Deep Neural Network Explanation [Web] [Slides]
- MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing [Web] [Slides]
- Breaking Inter-Layer Co-Adaptation by Classifier Anonymization [Web] [Slides]
- Deep Generative Learning via Variational Gradient Flow [Web] [Slides]
- Bayesian Optimization of Composite Functions [Web] [Slides]
- Compositional Fairness Constraints for Graph Embeddings [Web] [Slides]
- Improved Convergence for $\ell_1$ and $\ell_\infty$ Regression via Iteratively Reweighted Least Squares [Web] [Slides]
- Transfer of Samples in Policy Search via Multiple Importance Sampling [Web] [Slides]
- Co-manifold learning with missing data [Web] [Slides]
- Interpreting Adversarially Trained Convolutional Neural Networks [Web] [Slides]
- Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting [Web] [Slides]
- Understanding the Impact of Entropy on Policy Optimization [Web] [Slides]
- Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design [Web] [Slides]
- The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions [Web] [Slides]
- A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion [Web] [Slides]
- Optimal Mini-Batch and Step Sizes for SAGA [Web] [Slides]
- Exploration Conscious Reinforcement Learning Revisited [Web] [Slides]
- [Web]
- Counterfactual Visual Explanations [Web] [Slides]
- [Web]
- Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning [Web] [Slides]
- Learning Neurosymbolic Generative Models via Program Synthesis [Web] [Slides]
- Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization [Web] [Slides]
- Stochastic Blockmodels meet Graph Neural Networks [Web] [Slides]
- Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory [Web] [Slides]
- Kernel-Based Reinforcement Learning in Robust Markov Decision Processes [Web] [Slides]
- [Web]
- Data Poisoning Attacks on Stochastic Bandits [Web] [Slides]
- Posters Wed [Web]
Day 3
- Neural Network Attributions: A Causal Perspective [Web]
- State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations [Web] [Slides]
- Batch Policy Learning under Constraints [Web] [Slides]
- Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions [Web]
- Matrix-Free Preconditioning in Online Learning [Web] [Slides]
- Geometric Losses for Distributional Learning [Web] [Slides]
- Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning [Web]
- Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random [Web] [Slides]
- On Sparse Linear Regression in the Local Differential Privacy Model [Web]
- Towards a Deep and Unified Understanding of Deep Neural Models in NLP [Web] [Slides]
- Variational Laplace Autoencoders [Web] [Slides]
- Quantifying Generalization in Reinforcement Learning [Web] [Slides]
- Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization [Web] [Slides]
- Online Convex Optimization in Adversarial Markov Decision Processes [Web] [Slides]
- Classification from Positive, Unlabeled and Biased Negative Data [Web] [Slides]
- Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization [Web] [Slides]
- Linear-Complexity Data-Parallel Earth Mover's Distance Approximations [Web] [Slides]
- Differentially Private Empirical Risk Minimization with Non-convex Loss Functions [Web] [Slides]
- Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation [Web] [Slides]
- Latent Normalizing Flows for Discrete Sequences [Web] [Slides]
- Learning Latent Dynamics for Planning from Pixels [Web] [Slides]
- Unifying Orthogonal Monte Carlo Methods [Web] [Slides]
- Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games [Web] [Slides]
- Complementary-Label Learning for Arbitrary Losses and Models [Web] [Slides]
- Neuron birth-death dynamics accelerates gradient descent and converges asymptotically [Web] [Slides]
- Model Comparison for Semantic Grouping [Web] [Slides]
- Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy [Web] [Slides]
- Functional Transparency for Structured Data: a Game-Theoretic Approach [Web] [Slides]
- Multi-objective training of Generative Adversarial Networks with multiple discriminators [Web] [Slides]
- Projections for Approximate Policy Iteration Algorithms [Web] [Slides]
- Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits [Web] [Slides]
- Online Learning with Sleeping Experts and Feedback Graphs [Web] [Slides]
- Learning to Infer Program Sketches [Web] [Slides]
- Width Provably Matters in Optimization for Deep Linear Neural Networks [Web] [Slides]
- RaFM: Rank-Aware Factorization Machines [Web] [Slides]
- Differentially Private Learning of Geometric Concepts [Web] [Slides]
- Exploring interpretable LSTM neural networks over multi-variable data [Web] [Slides]
- Learning Discrete and Continuous Factors of Data via Alternating Disentanglement [Web] [Slides]
- Learning Structured Decision Problems with Unawareness [Web] [Slides]
- Metropolis-Hastings Generative Adversarial Networks [Web] [Slides]
- Incremental Randomized Sketching for Online Kernel Learning [Web] [Slides]
- Hierarchically Structured Meta-learning [Web] [Slides]
- Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? [Web] [Slides]
- CAB: Continuous Adaptive Blending for Policy Evaluation and Learning [Web] [Slides]
- Toward Controlling Discrimination in Online Ad Auctions [Web] [Slides]
- TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing [Web]
- Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables [Web]
- Calibrated Model-Based Deep Reinforcement Learning [Web] [Slides]
- Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets [Web] [Slides]
- Adaptive Scale-Invariant Online Algorithms for Learning Linear Models [Web]
- Bridging Theory and Algorithm for Domain Adaptation [Web] [Slides]
- Power k-Means Clustering [Web] [Slides]
- MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement [Web]
- Learning Optimal Fair Policies [Web]
- Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute [Web] [Slides]
- Graphite: Iterative Generative Modeling of Graphs [Web] [Slides]
- Reinforcement Learning in Configurable Continuous Environments [Web] [Slides]
- Replica Conditional Sequential Monte Carlo [Web] [Slides]
- Online Control with Adversarial Disturbances [Web] [Slides]
- Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation [Web] [Slides]
- Distributed Learning over Unreliable Networks [Web] [Slides]
- Neural Separation of Observed and Unobserved Distributions [Web] [Slides]
- Fairness-Aware Learning for Continuous Attributes and Treatments [Web] [Slides]
- State-Regularized Recurrent Neural Networks [Web] [Slides]
- Hybrid Models with Deep and Invertible Features [Web] [Slides]
- Target-Based Temporal-Difference Learning [Web] [Slides]
- A Polynomial Time MCMC Method for Sampling from Continuous Determinantal Point Processes [Web] [Slides]
- Adversarial Online Learning with noise [Web] [Slides]
- Learning What and Where to Transfer [Web] [Slides]
- Escaping Saddle Points with Adaptive Gradient Methods [Web] [Slides]
- Almost Unsupervised Text to Speech and Automatic Speech Recognition [Web] [Slides]
- Fairness risk measures [Web] [Slides]
- Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation [Web] [Slides]
- MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets [Web] [Slides]
- Iterative Linearized Control: Stable Algorithms and Complexity Guarantees [Web] [Slides]
- Adaptive Antithetic Sampling for Variance Reduction [Web] [Slides]
- Online Variance Reduction with Mixtures [Web] [Slides]
- [Web]
- $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression [Web] [Slides]
- AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss [Web] [Slides]
- [Web]
- On the Connection Between Adversarial Robustness and Saliency Map Interpretability [Web] [Slides]
- On Scalable and Efficient Computation of Large Scale Optimal Transport [Web] [Slides]
- Finding Options that Minimize Planning Time [Web] [Slides]
- Accelerated Flow for Probability Distributions [Web] [Slides]
- Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case [Web] [Slides]
- [Web]
- Model Function Based Conditional Gradient Method with Armijo-like Line Search [Web] [Slides]
- A fully differentiable beam search decoder [Web] [Slides]
- [Web]
- Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem [Web]
- Understanding and correcting pathologies in the training of learned optimizers [Web]
- Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement [Web] [Slides]
- Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering [Web] [Slides]
- Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret [Web] [Slides]
- DBSCAN++: Towards fast and scalable density clustering [Web]
- Analogies Explained: Towards Understanding Word Embeddings [Web] [Slides]
- Scaling Up Ordinal Embedding: A Landmark Approach [Web] [Slides]
- Proportionally Fair Clustering [Web] [Slides]
- On the Spectral Bias of Neural Networks [Web] [Slides]
- Demystifying Dropout [Web] [Slides]
- Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs [Web] [Slides]
- Dimensionality Reduction for Tukey Regression [Web] [Slides]
- Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems [Web] [Slides]
- Concrete Autoencoders: Differentiable Feature Selection and Reconstruction [Web] [Slides]
- Parameter-Efficient Transfer Learning for NLP [Web] [Slides]
- Learning to select for a predefined ranking [Web] [Slides]
- Stable and Fair Classification [Web] [Slides]
- Recursive Sketches for Modular Deep Learning [Web] [Slides]
- Ladder Capsule Network [Web] [Slides]
- Meta-Learning Neural Bloom Filters [Web] [Slides]
- Efficient Full-Matrix Adaptive Regularization [Web] [Slides]
- Adaptive Regret of Convex and Smooth Functions [Web] [Slides]
- Gromov-Wasserstein Learning for Graph Matching and Node Embedding [Web] [Slides]
- Efficient On-Device Models using Neural Projections [Web] [Slides]
- Mallows ranking models: maximum likelihood estimate and regeneration [Web] [Slides]
- Flexibly Fair Representation Learning by Disentanglement [Web] [Slides]
- Zero-Shot Knowledge Distillation in Deep Networks [Web] [Slides]
- Unreproducible Research is Reproducible [Web] [Slides]
- CoT: Cooperative Training for Generative Modeling of Discrete Data [Web] [Slides]
- Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms [Web] [Slides]
- Online Adaptive Principal Component Analysis and Its extensions [Web] [Slides]
- Spectral Clustering of Signed Graphs via Matrix Power Means [Web] [Slides]
- Deep Residual Output Layers for Neural Language Generation [Web] [Slides]
- Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models [Web] [Slides]
- Fair Regression: Quantitative Definitions and Reduction-Based Algorithms [Web] [Slides]
- A Convergence Theory for Deep Learning via Over-Parameterization [Web] [Slides]
- Geometric Scattering for Graph Data Analysis [Web] [Slides]
- Non-Monotonic Sequential Text Generation [Web] [Slides]
- Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations [Web] [Slides]
- POLITEX: Regret Bounds for Policy Iteration using Expert Prediction [Web] [Slides]
- Coresets for Ordered Weighted Clustering [Web] [Slides]
- Improving Neural Language Modeling via Adversarial Training [Web] [Slides]
- Fast Algorithm for Generalized Multinomial Models with Ranking Data [Web] [Slides]
- Fairness without Harm: Decoupled Classifiers with Preference Guarantees [Web] [Slides]
- A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks [Web]
- Robust Inference via Generative Classifiers for Handling Noisy Labels [Web] [Slides]
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations [Web]
- Robust Estimation of Tree Structured Gaussian Graphical Models [Web] [Slides]
- Anytime Online-to-Batch, Optimism and Acceleration [Web] [Slides]
- Fair k-Center Clustering for Data Summarization [Web]
- Mixture Models for Diverse Machine Translation: Tricks of the Trade [Web]
- Graph Resistance and Learning from Pairwise Comparisons [Web] [Slides]
- Differentially Private Fair Learning [Web]
- Approximation and non-parametric estimation of ResNet-type convolutional neural networks [Web] [Slides]
- LIT: Learned Intermediate Representation Training for Model Compression [Web] [Slides]
- Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models [Web] [Slides]
- Spectral Approximate Inference [Web] [Slides]
- Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints [Web] [Slides]
- A Better k-means++ Algorithm via Local Search [Web] [Slides]
- MASS: Masked Sequence to Sequence Pre-training for Language Generation [Web] [Slides]
- Learning Context-dependent Label Permutations for Multi-label Classification [Web] [Slides]
- Obtaining Fairness using Optimal Transport Theory [Web] [Slides]
- Global Convergence of Block Coordinate Descent in Deep Learning [Web] [Slides]
- Analyzing and Improving Representations with the Soft Nearest Neighbor Loss [Web] [Slides]
- Trainable Decoding of Sets of Sequences for Neural Sequence Models [Web] [Slides]
- Partially Linear Additive Gaussian Graphical Models [Web] [Slides]
- Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning [Web] [Slides]
- Kernel Normalized Cut: a Theoretical Revisit [Web] [Slides]
- Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops [Web] [Slides]
- Discovering Context Effects from Raw Choice Data [Web] [Slides]
- Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions [Web] [Slides]
- Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians [Web] [Slides]
- What is the Effect of Importance Weighting in Deep Learning? [Web] [Slides]
- Learning to Generalize from Sparse and Underspecified Rewards [Web] [Slides]
- DAG-GNN: DAG Structure Learning with Graph Neural Networks [Web] [Slides]
- Adaptive Sensor Placement for Continuous Spaces [Web] [Slides]
- Guarantees for Spectral Clustering with Fairness Constraints [Web] [Slides]
- MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization [Web] [Slides]
- On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference [Web] [Slides]
- On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning [Web] [Slides]
- On the Limitations of Representing Functions on Sets [Web] [Slides]
- Similarity of Neural Network Representations Revisited [Web] [Slides]
- Efficient Training of BERT by Progressively Stacking [Web] [Slides]
- Random Walks on Hypergraphs with Edge-Dependent Vertex Weights [Web] [Slides]
- Scale-free adaptive planning for deterministic dynamics & discounted rewards [Web] [Slides]
- Supervised Hierarchical Clustering with Exponential Linkage [Web] [Slides]
- CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network [Web] [Slides]
- Learning Distance for Sequences by Learning a Ground Metric [Web] [Slides]
- Making Decisions that Reduce Discriminatory Impacts [Web] [Slides]
- What 4 year olds can do and AI can’t (yet) [Web]
- Best Paper [Web]
- Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering [Web]
- Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations [Web]
- Decentralized Exploration in Multi-Armed Bandits [Web] [Slides]
- Communication-Constrained Inference and the Role of Shared Randomness [Web] [Slides]
- COMIC: Multi-view Clustering Without Parameter Selection [Web]
- Submodular Maximization beyond Non-negativity: Guarantees, Fast Algorithms, and Applications [Web]
- Nonparametric Bayesian Deep Networks with Local Competition [Web] [Slides]
- Distributed, Egocentric Representations of Graphs for Detecting Critical Structures [Web] [Slides]
- Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback [Web] [Slides]
- Learning and Data Selection in Big Datasets [Web] [Slides]
- The Wasserstein Transform [Web] [Slides]
- Online Algorithms for Rent-Or-Buy with Expert Advice [Web] [Slides]
- Good Initializations of Variational Bayes for Deep Models [Web] [Slides]
- Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities [Web] [Slides]
- Exploiting structure of uncertainty for efficient matroid semi-bandits [Web] [Slides]
- Sublinear quantum algorithms for training linear and kernel-based classifiers [Web] [Slides]
- Sequential Facility Location: Approximate Submodularity and Greedy Algorithm [Web] [Slides]
- Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity [Web] [Slides]
- Dropout as a Structured Shrinkage Prior [Web] [Slides]
- Multi-Object Representation Learning with Iterative Variational Inference [Web] [Slides]
- PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits [Web] [Slides]
- Agnostic Federated Learning [Web] [Slides]
- Neural Collaborative Subspace Clustering [Web] [Slides]
- Categorical Feature Compression via Submodular Optimization [Web] [Slides]
- ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables [Web] [Slides]
- Cross-Domain 3D Equivariant Image Embeddings [Web] [Slides]
- Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model [Web] [Slides]
- Discovering Conditionally Salient Features with Statistical Guarantees [Web] [Slides]
- Unsupervised Deep Learning by Neighbourhood Discovery [Web] [Slides]
- Multi-Frequency Phase Synchronization [Web] [Slides]
- On Variational Bounds of Mutual Information [Web]
- Loss Landscapes of Regularized Linear Autoencoders [Web] [Slides]
- Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning [Web]
- A Theoretical Analysis of Contrastive Unsupervised Representation Learning [Web]
- Autoregressive Energy Machines [Web]
- Faster Algorithms for Binary Matrix Factorization [Web]
- Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation [Web] [Slides]
- Hyperbolic Disk Embeddings for Directed Acyclic Graphs [Web] [Slides]
- TarMAC: Targeted Multi-Agent Communication [Web] [Slides]
- The information-theoretic value of unlabeled data in semi-supervised learning [Web] [Slides]
- Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization [Web] [Slides]
- Tractable n-Metrics for Multiple Graphs [Web] [Slides]
- Hierarchical Importance Weighted Autoencoders [Web] [Slides]
- LatentGNN: Learning Efficient Non-local Relations for Visual Recognition [Web] [Slides]
- QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning [Web] [Slides]
- Unsupervised Label Noise Modeling and Loss Correction [Web] [Slides]
- Noise2Self: Blind Denoising by Self-Supervision [Web] [Slides]
- Guided evolutionary strategies: augmenting random search with surrogate gradients [Web] [Slides]
- Faster Attend-Infer-Repeat with Tractable Probabilistic Models [Web] [Slides]
- Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness [Web] [Slides]
- Actor-Attention-Critic for Multi-Agent Reinforcement Learning [Web] [Slides]
- Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment [Web] [Slides]
- Learning Dependency Structures for Weak Supervision Models [Web] [Slides]
- Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces [Web] [Slides]
- Understanding Priors in Bayesian Neural Networks at the Unit Level [Web] [Slides]
- Lorentzian Distance Learning for Hyperbolic Representations [Web] [Slides]
- Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning [Web] [Slides]
- Pareto Optimal Streaming Unsupervised Classification [Web] [Slides]
- Geometry and Symmetry in Short-and-Sparse Deconvolution [Web] [Slides]
- Semi-Cyclic Stochastic Gradient Descent [Web] [Slides]
- Posters Thu [Web]