This repository aims to provide a curated list of foundational and transformative papers in the field of Artificial Intelligence (AI). These papers have been instrumental in shaping the AI landscape and continue to hold significant relevance in today's era of large-scale models and the move towards Artificial General Intelligence (AGI). This list can serve as a starting point for those interested in understanding the key milestones in AI, as well as a reference for researchers and professionals.
- 📕 Essential Reading: Papers that anyone entering the field of AI should read.
- 🎯 Deep Dive: For readers looking for a more in-depth understanding.
- 💡 Innovative Ideas: Cutting-edge papers with novel approaches or ideas.
- 🕒 Quick Overview: Brief, yet impactful papers that can be read quickly.
- 📜 Historical Context: Papers that have been influential in the development of AI and continue to be of historical importance.
- 🤔 Philosophical Insights: Papers and books that offer philosophical or deeply conceptual insights into AI and cognition.
This section traces the tumultuous journey of Artificial Intelligence and Machine Learning before the onset of the deep learning era. The period was marked by groundbreaking discoveries, heated debates, and philosophical quandaries. It also witnessed the near demise and ultimate resurgence of neural networks and connectionism: celebrated at first, then harshly criticized, and finally vindicated by researchers who persisted against the odds.
- 🤔 1950: Computing Machinery and Intelligence by Alan Turing, Set the stage for AI, asking the provocative question: "Can machines think?"
- 📜 1958: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain by Frank Rosenblatt, Ushered in the era of neural networks but later faced scrutiny (a minimal sketch of the learning rule appears after this list).
- 📜 1969: Perceptrons: An Introduction to Computational Geometry by Marvin Minsky and Seymour Papert, A critique that nearly froze neural network research, claiming perceptrons were fundamentally limited.
- 🤔 1980: Minds, Brains, and Programs (Chinese Room Argument) by John Searle, Presented the Chinese Room argument, questioning the nature of machine "understanding."
- 🎯 1985: A Learning Algorithm for Boltzmann Machines (Boltzmann Machine) by David Ackley, Geoffrey Hinton, and Terrence Sejnowski, A glimmer of hope for connectionism, introducing stochastic networks trained with unsupervised learning.
- 📜 1986: Learning Representations by Back-propagating Errors (Backpropagation) by David Rumelhart, Geoffrey Hinton, and Ronald Williams, Vindicated neural networks by popularizing the backpropagation algorithm, despite academic skepticism.
- 🤔 1988: Mind Children by Hans Moravec, Introduced Moravec's Paradox: sensorimotor skills that feel effortless to humans demand enormous computation from machines, while formal reasoning is comparatively easy to automate.
- 📜 1989: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (HMM) by Rabiner, A cornerstone for speech recognition that diverted focus from neural networks.
- 📜 1989: Multilayer Feedforward Networks are Universal Approximators by Hornik et al., A theoretical validation that neural networks can approximate any continuous function to arbitrary accuracy.
- 📜 1992: A Training Algorithm for Optimal Margin Classifiers (SVM) by Boser et al., Introduced a robust alternative to neural networks with Support Vector Machines.
- 📜 1998: Gradient-Based Learning Applied to Document Recognition (CNN/GTN) by Yann LeCun et al., A step towards reviving interest in neural networks through Convolutional Neural Networks.
- 💡 2001: A Fast and Elitist Multiobjective Genetic Algorithm (NSGA-II) by Deb et al., Introduced a landmark evolutionary algorithm for multi-objective optimization, part of the era's focus on alternatives to neural networks.
- 💡 2003: Latent Dirichlet Allocation (LDA) by Blei et al., Introduced a generative probabilistic model for topic discovery, offering yet another alternative to neural networks.
- 📜 2006: Reducing the Dimensionality of Data with Neural Networks (Autoencoder) by Geoffrey Hinton and Ruslan Salakhutdinov, Brought neural networks back into the limelight by effectively reducing data dimensions.
- 🎯 2006: A Fast Learning Algorithm for Deep Belief Nets by Geoffrey Hinton, Simon Osindero, Yee-Whye Teh, Laid the groundwork for the deep learning renaissance, against all odds.
- 🕒 2008: Visualizing Data using t-SNE (t-SNE) by Laurens van der Maaten and Geoffrey Hinton, Introduced an effective technique for visualizing high-dimensional data that became a standard tool for exploring learned representations.
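To make the starting point of this era concrete, below is a minimal sketch of the perceptron learning rule referenced above. It is an illustrative reconstruction, not code from Rosenblatt's paper; the function name, toy data, and hyperparameters are all assumptions.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule: update weights only on misclassified examples.

    X: (n_samples, n_features) inputs; y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # wrong side of (or on) the boundary
                w += lr * yi * xi        # nudge the boundary toward the example
                b += lr * yi
    return w, b

# Hypothetical usage on a linearly separable toy problem (logical AND):
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # expected: [-1. -1. -1.  1.]
```

A single unit like this can only separate linearly separable classes, which is precisely the limitation Minsky and Papert highlighted and which multi-layer networks trained with backpropagation later overcame.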
In 2012, AlexNet, built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), thrusting deep learning into the academic and industrial limelight. Soon after, Hinton joined Google, further accelerating the field's advancement. By 2015, models had surpassed the estimated human error rate on the task, and the challenge held its final edition in 2017. This era spawned a range of breakthrough architectures and optimization techniques. The papers in this list have had a lasting impact, shaping the foundation of today's deep learning research and applications.
- 🕒 2009: ImageNet: A Large-Scale Hierarchical Image Database (ImageNet) by Deng et al., Set the stage for the upcoming deep learning revolution by providing a large-scale image database.
- 📕 2012: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, Set the standard for deep convolutional networks in image classification.
- 📕 2013: Efficient Estimation of Word Representations in Vector Space (Word2vec) by Tomas Mikolov et al., Introduced efficient word embeddings.
- 📕 2013: Playing Atari with Deep Reinforcement Learning by DeepMind, Demonstrated that deep networks could learn control policies for reinforcement learning tasks directly from raw pixels.
- 📕 2013: Auto-Encoding Variational Bayes (VAE) by Kingma and Welling, Introduced Variational Autoencoders as a generative model.
- 📕 2013: Maxout Networks by Ian Goodfellow et al., Introduced the maxout activation function, designed to pair with dropout for better regularization.
- 🕒 2014: Adam: A Method for Stochastic Optimization (Adam) by Diederik P. Kingma and Jimmy Ba, Introduced an effective stochastic optimization algorithm that became a default choice for training deep networks.
- 🕒 2014: Neural Machine Translation by Jointly Learning to Align and Translate (Attention) by Bahdanau et al., Introduced the attention mechanism for sequence-to-sequence translation, a direct precursor to the Transformer.
- 📕 2014: Dropout: A Simple Way to Prevent Neural Networks from Overfitting by Srivastava et al., Introduced Dropout as a regularization method (a minimal sketch appears after this list).
- 📕 2014: Generative Adversarial Networks (GAN) by Ian Goodfellow et al., Introduced Generative Adversarial Networks as a new type of generative model.
- 💡 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (BatchNorm) by Sergey Ioffe and Christian Szegedy, Introduced normalization layers to accelerate network training.
- 📕 2015: Deep Residual Learning for Image Recognition (ResNet) by Kaiming He et al., Introduced residual connections to train very deep networks.
- 🕒 2016: Mastering the game of Go with deep neural networks and tree search (AlphaGo) by DeepMind, Demonstrated the ability of deep learning to master complex tasks like Go.
- 🎯 2016: You Only Look Once: Unified, Real-Time Object Detection (YOLO) by Joseph Redmon et al., Introduced a real-time object detection system.
- 🕒 2016: WaveNet: A Generative Model for Raw Audio by DeepMind, Introduced a deep generative model for generating raw audio, revolutionizing the field of speech synthesis and audio generation.
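As a taste of how compact some of these ideas are, here is a minimal sketch of inverted dropout in the spirit of Srivastava et al. — an illustrative reconstruction, not code from the paper; the function name and the inverted-scaling convention are assumptions.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during training."""
    if not training or p == 0.0:
        return x                        # test time: the layer is the identity
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)         # rescale so the expected activation is unchanged
```

Because the surviving activations are rescaled during training, no correction is needed at test time; the network effectively averages over an exponential number of thinned sub-networks.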
In the wake of Google's "Attention Is All You Need" in 2017, the landscape of deep learning shifted towards larger and more complex models. Google took the lead by launching BERT in 2018, whose largest variant packs 340 million parameters. However, the tide started to turn when OpenAI entered the stage. With the unveiling of GPT-2 in 2019 and later GPT-3 in 2020, OpenAI not only caught up but threatened Google's dominance. These large language models redefined the capabilities of AI in natural language understanding and generation. This list compiles the groundbreaking papers that have set the stage for this era, influencing both academic research and real-world applications.
- 📕 2017: Attention Is All You Need by Vaswani et al., The foundational paper introducing the Transformer architecture (a minimal sketch of scaled dot-product attention appears after this list).
- 📕 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Devlin et al., Introduced BERT and set new state-of-the-art results across a wide range of language understanding tasks.
- 🎯 2019: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Dai et al., Extended the Transformer with segment-level recurrence to handle contexts beyond a fixed length.
- 🕒 2019: GPT-2: Language Models are Unsupervised Multitask Learners by Radford et al., Introduced GPT-2 and showcased the model's generative capabilities.
- 🕒 2019: RoBERTa: A Robustly Optimized BERT Pretraining Approach by Liu et al., Improved BERT's pretraining for better performance.
- 📜 2019: DistilBERT: a distilled version of BERT by Sanh et al., Applied knowledge distillation to BERT, yielding a smaller, faster model that retains most of its accuracy.
- 📜 2019: XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al., Presented an autoregressive pretraining model.
- 📜 2019: ERNIE: Enhanced Language Representation with Informative Entities by Zhang et al., Enriched word embeddings with entity information.
- 📜 2019: T5: Text-To-Text Transfer Transformer by Raffel et al., Unified various NLP tasks into a text-to-text format.
- 📕 2020: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators by Clark et al., Introduced a more sample-efficient pre-training method.
- 📕 2020: GPT-3: Language Models are Few-Shot Learners by OpenAI, Introduced GPT-3, the largest language model of its time, and demonstrated few-shot learning from prompts alone.
- 🎯 2020: Longformer: The Long-Document Transformer by Beltagy et al., Improved Transformer's capacity to handle long documents.
- 🤔 2020: The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence by Gary Marcus, Provided a perspective on the limitations of current AI.
- 🎯 2020: Reformer: The Efficient Transformer by Kitaev et al., Made Transformer architectures more memory-efficient.
- 🕒 2020: Scaling Laws for Neural Language Models by Kaplan et al., Showed that language-model loss improves predictably as a power law in model size, dataset size, and compute.
- 🕒 2021: GitHub Copilot & Codex: Evaluating Large Language Models Trained on Code by OpenAI, Introduced a large-scale language model trained on code.
- 🕒 2021: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by Google, Introduced techniques to scale up Transformers to trillion-parameter sizes.
- 🕒 2021: TurboTransformers: An Efficient GPU Serving System for Transformer Models by Tencent, Optimized Transformer model serving to enhance efficiency on GPUs.
- 📕 2021: CLIP: Connecting Text and Images using Contrastive Learning by OpenAI, Bridged the gap between natural language and visual understanding.
- 🕒 2022: Illustrating Reinforcement Learning from Human Feedback (RLHF) by Hugging Face, An accessible explanation of how RLHF is used to align language models with human preferences.
- 🕒 2022: ChatGPT: Optimizing Language Models for Dialogue by OpenAI, Announced a dialogue-optimized model fine-tuned with RLHF.
- 🕒 2022: High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion) by Rombach et al., Introduced latent diffusion, the approach behind Stable Diffusion, for efficient high-resolution image generation.
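To ground the Transformer papers above, here is a minimal NumPy sketch of the scaled dot-product attention at the core of "Attention Is All You Need" — a single head with no masking, batching, or learned projections, so a simplification rather than a faithful implementation of the full architecture.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for 2-D Q, K, V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # each output is a weighted sum of values
```

Multi-head attention runs several such heads in parallel over learned projections of Q, K, and V and concatenates the results; most of the models in this section, from BERT to GPT-3, build on this primitive.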
As the field of artificial intelligence continues to evolve, this README will strive to keep up-to-date with the latest groundbreaking research papers and methodologies. Contributions are welcome.