- An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al) [pdf] [github] [summary]
- Regularizing and Optimizing LSTM Language Models (Merity et al) [pdf] [github] [summary]
- Improving Language Modeling using Densely Connected Recurrent Neural Networks (Godin et al) [pdf] [github] [summary]
- Grow and Prune Compact, Fast, and Accurate LSTMs (Dai et al) [pdf] [github] [summary]
- Improving Neural Language Models With a Continuous Cache (2016) (Grave et al) [pdf] [github] [summary]
- An Analysis of Neural Language Modeling at Multiple Scales (Merity et al) [pdf] [github] [summary]
- Recurrent Neural Network Regularization (Zaremba et al) [pdf] [github] [summary]
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (Gal et al) [pdf] [github] [summary]
- FreezeOut: Accelerate Training by Progressively Freezing Layers [pdf]
- Generalization Through Memorization: Nearest Neighbor Language Models (Khandelwal et al) (2020) [pdf] [github] [summary]
- Efficient Softmax Approximation for GPUs (2016) (Grave et al) [pdf] [github] [summary]
- Adaptive Input Representations For Neural Language Modeling (2019) (Baevski & Auli) [pdf] [github] [summary]
- Tying Word Vectors And Word Classifiers: A Loss Framework For Language Modeling (Inan et al) (2016) [pdf] [github] [summary]
- Using the Output Embedding to Improve Language Models (2017) (Press & Wolf) [pdf] [github] [summary]
- FRAGE: Frequency-Agnostic Word Representation https://arxiv.org/pdf/1809.06858.pdf
- Breaking the Softmax Bottleneck: A High-Rank RNN Language Model https://arxiv.org/pdf/1711.03953.pdf
- Mogrifier LSTM https://arxiv.org/pdf/1909.01792.pdf
- Pushing the Bounds of Dropout https://arxiv.org/pdf/1805.09208.pdf
- On the State of the Art of Evaluation in Neural Language Models https://arxiv.org/pdf/1707.05589.pdf
- Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data https://www.aclweb.org/anthology/2020.acl-main.463.pdf