Epsilon-Lee/ml-math-in-and-for-nlp

Evolving notes on machine learning and mathematical techniques in and for Natural Language Processing.

ml-and-math-in-for-nlp

Update (2022.10.2): I am not yet knowledgeable enough to decide what this booklet should cover, so for now I may include specific research papers that require particular aspects of math and ML expertise, as a way to map out how far ML and math extend into NLP.

Collection of papers by topic

Statistical Principle Based Learning

density/distribution estimation over structures, generative story

Minimax and Neyman–Pearson Meta-Learning for Outlier Languages, Neyman-Pearson
Annealing Techniques for Unsupervised Statistical Language Learning, deterministic annealing

Statistical Divergence

Probability Divergences and Generative Models, Arthur Gretton's talk mlss2021

Latent Variable Model and its Solution

variational inference, normalizing flow

Learning Opinion Summarizers by Selecting Informative Reviews, Ivan Titov's group. REINFORCE amortized variational inference
Sequence-to-Sequence Learning with Latent Neural Grammars, Yoon Kim. likelihood bounding
Structured Reordering for Modeling Latent Alignments in Sequence Transduction, Ivan Titov's group. marginalization
Discrete Latent Structure in Neural Networks, Jan. 18 2023. booklet on structured prediction.

Feature/Variable Selection

Rare Feature Selection in High Dimensions, lasso

Sampling and Search Techniques

Determinantal Beam Search, Ryan Cotterell's group. beam search
Mode recovery in neural autoregressive sequence modeling, Kyunghyun Cho's group. sampling
Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics
Searching for More Efficient Dynamic Programs, Jason Eisner et al. dp

Sampling from Energy-based Model

Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs, Dec. 10 2021. nips2021

Information Theory

What Context Features Can Transformer Language Models Use?, Jacob Andreas's group. V-information
Conditional probing: measuring usable information beyond a baseline, Percy Liang's group. V-information
On the Complexity and Typology of Inflectional Morphological Systems, Ryan Cotterell. complexity measure
On Homophony and Renyi Entropy, Ryan Cotterell et al. entropy

Geometry

The Low-Dimensional Linear Geometry of Contextualized Word Representations, Jacob Andreas's group.
On Isotropy Calibration of Transformers, ETH's group.

Discrete Optimization

submodular, Gumbel-Softmax, optimization for structured prediction

Towards Dynamic Computation Graphs via Sparse Latent Structure, emnlp2018 marginalize.
Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder, iclr2019
Backpropagating through Structured Argmax using a SPIGOT, acl2018
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions, Oct. 27 2021. nips2021 code
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions, Oct. 22 2021.
Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data, nips2021 code
Storchastic: A Framework for General Stochastic Automatic Differentiation, nips2021
Scaling Structured Inference with Randomization, Dec. 7 2021.
Learning with Latent Structures in Natural Language Processing: A Survey, Jan. 3 2022.
Gradient Estimation with Discrete Stein Operators, Feb. 19 2022.

Learning Paradigms

Reinforcement Learning

Learning Natural Language Generation from Scratch, DeepMind.
Batch size-invariance for policy optimization, ppo.