Evaluate language models on syntactic tasks.

Primary language: Python. License: MIT.

Do LSTMs learn Syntax?

Time underlies many aspects of human behaviour.

In 1990, Elman proposed representing time implicitly, through the effect it has on processing. In this framework, the hidden units are fed back into themselves, which gives the system dynamic properties and makes it responsive to temporal sequences. Recurrent Neural Networks (RNNs) thus arose from the need to represent time.
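The feedback of hidden units into themselves can be sketched in a few lines. This is a minimal illustration in plain Python, not code from this repository: the function names (`elman_step`, `run_sequence`), the `tanh` nonlinearity, the layer sizes, and the random weights are all assumptions chosen for the example.

```python
import math
import random


def elman_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # The previous hidden state is fed back in as extra input.
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_sequence(xs, W_xh, W_hh, b_h):
    """Run the network over a sequence, starting from an all-zero state."""
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        h = elman_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


def random_matrix(rows, cols, rng):
    """Small random weights; untrained, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_in, n_hidden = 2, 3
W_xh = random_matrix(n_hidden, n_in, rng)
W_hh = random_matrix(n_hidden, n_hidden, rng)
b_h = [0.0] * n_hidden

# The same input vector appears at steps 0 and 2.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
states = run_sequence(seq, W_xh, W_hh, b_h)
print(states[0] == states[2])  # False: the hidden state depends on what came before
```

Because the state at step 2 depends on the states before it, the network responds differently to the same input in different contexts. Setting `W_hh` to all zeros removes the feedback, and the two states become identical.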

RNNs are both general and effective at capturing long-term temporal dependencies. Their gated variants, Long Short-Term Memory networks (LSTMs), have proven even better at modelling long-distance regularities and have become the de facto building block for many neural processing tasks, such as machine translation and language modelling.

RNNs do not explicitly encode the hierarchical structures encountered in many natural settings, language among them. However, thanks to their memory and processing capacity, they can develop powerful internal representations that reflect task demands in the context of prior internal states. The question, then, is: can RNNs implicitly discover syntactic features?

This is a fascinating topic that requires further investigation, as there is no definitive answer yet.

In this repository, I provide:

  • a selection of papers that I found relevant/interesting on the topic with links to the code, if publicly available
  • some experimental code to build your own templates and run your own evaluation experiments on your favourite pre-trained language model (recurrent or non-recurrent)
  • some paper summaries along with my thoughts and findings related to the topic
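To give a flavour of what template-based evaluation can look like, here is a minimal sketch. It is not the code shipped in this repository: the lexicon, the template, and the names `build_pairs`, `accuracy`, and `toy_score` are hypothetical. The sketch assumes only that a language model can be wrapped as a function returning a score (for instance a sentence log-probability) for a string. `toy_score` is a deliberately naive stand-in that agrees the verb with the nearest noun; swap in a scorer backed by your own pre-trained model.

```python
from itertools import product

# Hypothetical mini-lexicon: (singular, plural) forms.
NOUNS = [("author", "authors"), ("pilot", "pilots")]
ATTRACTORS = [("guard", "guards")]
VERBS = [("laughs", "laugh"), ("is", "are")]
PLURAL_VERBS = {plural for _, plural in VERBS}

TEMPLATE = "the {subject} near the {attractor} {verb}"


def build_pairs():
    """Return (grammatical, ungrammatical) pairs differing only in verb number."""
    pairs = []
    for noun, attractor, verb in product(NOUNS, ATTRACTORS, VERBS):
        for subj_num in (0, 1):      # 0 = singular subject, 1 = plural subject
            for attr_num in (0, 1):  # number of the intervening noun
                fields = {"subject": noun[subj_num],
                          "attractor": attractor[attr_num]}
                good = TEMPLATE.format(verb=verb[subj_num], **fields)
                bad = TEMPLATE.format(verb=verb[1 - subj_num], **fields)
                pairs.append((good, bad))
    return pairs


def accuracy(pairs, score):
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    hits = sum(1 for good, bad in pairs if score(good) > score(bad))
    return hits / len(pairs)


def toy_score(sentence):
    """Stand-in scorer (assumption): rewards agreement with the nearest noun."""
    words = sentence.split()
    nearest_noun, verb = words[-2], words[-1]
    noun_is_plural = nearest_noun.endswith("s")
    verb_is_plural = verb in PLURAL_VERBS
    return 0.0 if noun_is_plural == verb_is_plural else -1.0


pairs = build_pairs()
print(len(pairs))                  # 16
print(accuracy(pairs, toy_score))  # 0.5
```

The stand-in scorer gets exactly the pairs where the subject and the intervening noun share the same number, and fails on the rest. A model that tracks the subject rather than the linearly closest noun should do better on those cases.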

Finding Structure in Time, Elman (1990)

Distributed Representations, Simple Recurrent Networks, and Grammatical Structure, Elman (1991)

Learning and development in neural networks: The importance of starting small, Elman (1993)

A Recurrent Neural Network that Learns to Count, Rodriguez et al. (1999)

Toward a connectionist model of recursion in human linguistic performance, Christiansen and Chater (1999)

Recurrent Nets That Time and Count, Gers and Schmidhuber (2000)

Context-free and context-sensitive dynamics in recurrent neural networks, Boden and Wiles

LSTM recurrent networks learn simple context-free and context-sensitive languages, Gers and Schmidhuber (2001)

Incremental training of first order recurrent neural networks to predict a context-sensitive language, Chalup and Blair (2003)

Statistical Representation of Grammaticality Judgements: the Limits of N-Gram Models, Clark et al. (2013)

LSTM: A Search Space Odyssey, Greff et al. (2015)

Unsupervised Prediction of Acceptability Judgements, Lau et al. (2015)

Structures, Not Strings: Linguistics as Part of the Cognitive Sciences, Everaert et al. (2015)

An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al. (2015)

The Now-or-Never bottleneck: A fundamental constraint on language, Christiansen and Chater (2016)

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies, Linzen et al. (2016) [code]

Recurrent Neural Network Grammars, Dyer et al. (2016)

Sequence Memory Constraints Give Rise to Language-Like Structure through Iterated Learning, Cornish et al. (2017)

Exploring the Syntactic Abilities of RNNs with Multi-task Learning, Enguehard et al. (2017) [code]

What Do Recurrent Neural Network Grammars Learn About Syntax?, Kuncoro et al. (2017)

On the State of the Art of Evaluation in Neural Language Models, Melis et al. (2017)

How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs, Sennrich (2017)

Using Deep Neural Networks to Learn Syntactic Agreement, Bernardy and Lappin (2017)

Colorless green recurrent networks dream hierarchically, Gulordava et al. (2018) [code]

Targeted Syntactic Evaluation of Language Models, Marvin and Linzen (2018) [code]

Deep RNNs Encode Soft Hierarchical Syntax, Blevins et al. (2018)

Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures, Tang et al. (2018)

The Importance of Being Recurrent for Modeling Hierarchical Structure, Tran et al. (2018) [code]

What can linguistics and deep learning contribute to each other?, Linzen (2018)

Do RNNs learn human-like abstract word order preferences?, Futrell and Levy (2018) [code]

What do RNN Language Models Learn about Filler–Gap Dependencies?, Wilcox et al. (2018)

LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better, Kuncoro et al. (2018)

On Evaluating the Generalization of LSTM Models in Formal Languages, Suzgun et al. [code]

Evaluating the Ability of LSTMs to Learn Context-Free Grammars, Sennhauser and Berwick (2018)

Finding Syntax in Human Encephalography with Beam Search, Hale et al. (2018)

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks, Shen et al. (2018) [code]

Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context, Khandelwal et al. (2018)

Neural Network Acceptability Judgments, Warstadt et al. (2018) [code]

Assessing BERT’s Syntactic Abilities, Goldberg (2019) [code]

Human few-shot learning of compositional instructions, Lake, Linzen, and Baroni (2019)

Neural Language Models as Psycholinguistic Subjects: Representations of Syntactic State, Futrell et al. (2019) [code]

Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages, Ravfogel, Goldberg, and Linzen (2019)