A curated list of papers that may be of interest to Software Engineering students or professionals.
- Von Neumann's First Computer Program. Knuth (1970).
- The Education of a Computer. Hopper (1952).
- Recursive Programming. Dijkstra (1960).
- Programming Considered as a Human Activity. Dijkstra (1965).
- Goto Statement Considered Harmful. Dijkstra (1968).
- Program development by stepwise refinement. Wirth (1971).
- The paradigms of programming. Floyd (1979).
- Computing Machinery and Intelligence. Turing (1950).
- Some Moral and Technical Consequences of Automation. Wiener (1960).
- Steps towards Artificial Intelligence. Minsky (1960).
- ELIZA—a computer program for the study of natural language communication between man and machine. Weizenbaum (1966).
- A Theory of the Learnable. Valiant (1984).
- Computer Programming as an Art. Knuth (1974).
- The Humble Programmer. Dijkstra (1972).
- The Emperor’s Old Clothes. Hoare (1981).
- Literate Programming. Knuth (1984).
- Programming as Theory Building. Naur (1985).
- A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
- A Universal Algorithm for Sequential Data Compression. Ziv, Lempel (1977).
- On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Kruskal (1956).
- A Note on Two Problems in Connexion with Graphs. Dijkstra (1959).
- Space/Time Trade-offs in Hash Coding with Allowable Errors. Bloom (1970).
- Ordered hash tables. Amble, Knuth (1974).
- Big Omicron and big Omega and big Theta. Knuth (1976).
- The Ubiquitous B-Tree. Comer (1979).
- Making data structures persistent. Driscoll et al (1986).
- Engineering a Sort Function. Bentley, McIlroy (1993).
- Quicksort. Hoare (1962).
- Programming pearls: algorithm design techniques. Bentley (1984).
- A Design Methodology for Reliable Software Systems. Liskov (1972).
- On the Criteria To Be Used in Decomposing Systems into Modules. Parnas (1971).
- Information Distribution Aspects of Design Methodology. Parnas (1972).
- Designing Software for Ease of Extension and Contraction. Parnas (1979).
- The Modular Structure of Complex Systems. Parnas, Clements, Weiss (1984).
- Toward higher-level abstractions for software systems. Shaw (1990).
- Foundations for the Study of Software Architecture. Perry, Wolf (1992).
- Software Aging. Parnas (1994).
- Programming with Abstract Data Types. Liskov, Zilles (1974).
- The Smalltalk-76 Programming System Design and Implementation. Ingalls (1978).
- A Theory of Type Polymorphism in Programming. Milner (1978).
- On understanding types, data abstraction, and polymorphism. Cardelli, Wegner (1985).
- SELF: The Power of Simplicity. Ungar, Smith (1991).
- Why Functional Programming Matters. Hughes (1990).
- Recursive Functions of Symbolic Expressions and Their Computation by Machine. McCarthy (1960).
- Can Programming Be Liberated from the von Neumann Style?. Backus (1978).
- The Semantic Elegance of Applicative Languages. Turner (1981).
- QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. Claessen, Hughes (2000).
- Church's Thesis and Functional Programming. Turner (2006).
- The Mythical Man-Month. Brooks (1975).
- How do committees invent?. Conway (1968).
- Managing the Development of Large Software Systems. Royce (1970).
- Lisp: Good news, bad news, how to win big. Gabriel (1991).
- The Cathedral and the Bazaar. Raymond (1998).
- No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
- Software Aspects of Strategic Defense Systems. Parnas (1985).
- On Building Systems That Will Fail. Corbató (1991).
- Out of the Tar Pit. Moseley, Marks (2006).
- Communicating sequential processes. Hoare (1976).
- Solution Of a Problem in Concurrent Program Control. Dijkstra (1965).
- Monitors: An operating system structuring concept. Hoare (1974).
- On the Duality of Operating System Structures. Lauer, Needham (1978).
- The Development of Erlang. Armstrong (1997).
- Software Transactional Memory. Shavit, Touitou (1997).
- The UNIX Time-Sharing System. Ritchie, Thompson (1974).
- An Experimental Time-Sharing System. Corbató, Merwin Daggett, Daley (1962).
- The Structure of the "THE"-Multiprogramming System. Dijkstra (1968).
- Reflections on Trusting Trust. Thompson (1984).
- The Design and Implementation of a Log-Structured File System. Rosenblum, Ousterhout (1991).
- Thinking Methodically about Performance. Gregg (2012).
- Performance Anti-Patterns. Smaalders (2006).
- Thinking Clearly about Performance. Millsap (2010).
- A Relational Model of Data for Large Shared Data Banks. Codd (1970).
- Granularity of Locks and Degrees of Consistency in a Shared Data Base. Gray et al (1975).
- System R: Relational Approach to Database Management. Astrahan et al (1976).
- Access Path Selection in a Relational Database Management System. Selinger et al (1979).
- The Transaction Concept: Virtues and Limitations. Gray (1981).
- The design of POSTGRES. Stonebraker, Rowe (1986).
- Rules of Thumb in Data Engineering. Gray, Shenoy (1999).
- The Design Philosophy of the DARPA Internet Protocols. Clark (1988).
- A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974).
- Ethernet: Distributed packet switching for local computer networks. Metcalfe, Boggs (1978).
- End-To-End Arguments in System Design. Saltzer, Reed, Clark (1984).
- An algorithm for distributed computation of a Spanning Tree in an Extended LAN. Perlman (1985).
- TOR: The second generation onion router. Dingledine et al (2004).
- Why the Internet only just works. Handley (2006).
- The Network is Reliable. Bailis, Kingsbury (2014).
- New Directions in Cryptography. Diffie, Hellman (1976).
- A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Rivest, Shamir, Adleman (1978).
- How To Share A Secret. Shamir (1979).
- A Certified Digital Signature. Merkle (1979).
- Protocols for Public Key Cryptosystems. Merkle (1980).
- K-Anonymity: A Model For Protecting Privacy. Sweeney (2002).
- Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
- Self-stabilizing systems in spite of distributed control. Dijkstra (1974).
- The Byzantine Generals Problem. Lamport, Shostak, Pease (1982).
- Impossibility of Distributed Consensus With One Faulty Process. Fischer, Lynch, Paterson (1985).
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Schneider (1990).
- How to Build a Highly Available System Using Consensus. Lampson (1996).
- Paxos made simple. Lamport (2001).
- In Search of an Understandable Consensus Algorithm. Ongaro, Ousterhout (2014).
- CAP Twelve Years Later: How the "Rules" Have Changed. Brewer (2012).
- Epidemic Algorithms for Replicated Database Maintenance. Demers et al (1987).
- The Dangers of Replication. Gray et al (1996).
- Harvest, Yield, and Scalable Tolerant Systems. Fox, Brewer (1999).
- Building on Quicksand. Helland, Campbell (2009).
- Life Beyond Distributed Transactions: An apostate's opinion. Helland (2016).
- The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
- A Statistical Interpretation of Term Specificity in Retrieval. Spärck Jones (1972).
- World-Wide Web: Information Universe. Berners-Lee et al (1992).
- The PageRank Citation Ranking: Bringing Order to the Web. Page, Brin, Motwani (1999).
- Dynamo, Amazon’s Highly Available Key-value store. DeCandia et al (2007).
- The Google File System. Ghemawat, Gobioff, Leung (2003).
- MapReduce: Simplified Data Processing on Large Clusters. Dean, Ghemawat (2004).
- Bigtable: A Distributed Storage System for Structured Data. Chang et al (2006).
- ZooKeeper: wait-free coordination for internet scale systems. Hunt et al (2010).
- Kafka: a Distributed Messaging System for Log Processing. Kreps, Narkhede, Rao (2011).
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. Verbitski et al (2017).
- On Designing and Deploying Internet Scale Services. Hamilton (2007).
- Ironies of automation. Bainbridge (1983).
- How Complex Systems Fail. Cook (2000).
- Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. Patterson et al (2002).
- Crash-Only Software. Candea, Fox (2003).
- Nines are Not Enough: Meaningful Metrics for Clouds. Mogul, Wilkes (2019).
- Bitcoin, A peer-to-peer electronic cash system. Nakamoto (2008).
- Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. Buterin (2014).
Top-level papers only
- Von Neumann's First Computer Program. Knuth (1970).
- Computing Machinery and Intelligence. Turing (1950).
- Computer Programming as an Art. Knuth (1974).
- A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
- On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Kruskal (1956).
- Engineering a Sort Function. Bentley, McIlroy (1993).
- A Design Methodology for Reliable Software Systems. Liskov (1972).
- Programming with Abstract Data Types. Liskov, Zilles (1974).
- Why Functional Programming Matters. Hughes (1990).
- The Mythical Man-Month. Brooks (1975).
- No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
- Communicating sequential processes. Hoare (1976).
- The UNIX Time-Sharing System. Ritchie, Thompson (1974).
- Thinking Methodically about Performance. Gregg (2012).
- A Relational Model of Data for Large Shared Data Banks. Codd (1970).
- The Design Philosophy of the DARPA Internet Protocols. Clark (1988).
- New Directions in Cryptography. Diffie, Hellman (1976).
- Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Schneider (1990).
- CAP Twelve Years Later: How the "Rules" Have Changed. Brewer (2012).
- The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
- Dynamo, Amazon’s Highly Available Key-value store. DeCandia et al (2007).
- On Designing and Deploying Internet Scale Services. Hamilton (2007).
- Bitcoin, A peer-to-peer electronic cash system. Nakamoto (2008).
All papers in chronological order
- Computing Machinery and Intelligence. Turing (1950).
- The Education of a Computer. Hopper (1952).
- A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
- On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Kruskal (1956).
- A Note on Two Problems in Connexion with Graphs. Dijkstra (1959).
- Recursive Programming. Dijkstra (1960).
- Some Moral and Technical Consequences of Automation. Wiener (1960).
- Steps towards Artificial Intelligence. Minsky (1960).
- Recursive Functions of Symbolic Expressions and Their Computation by Machine. McCarthy (1960).
- Quicksort. Hoare (1962).
- An Experimental Time-Sharing System. Corbató, Merwin Daggett, Daley (1962).
- Programming Considered as a Human Activity. Dijkstra (1965).
- Solution Of a Problem in Concurrent Program Control. Dijkstra (1965).
- ELIZA—a computer program for the study of natural language communication between man and machine. Weizenbaum (1966).
- Goto Statement Considered Harmful. Dijkstra (1968).
- How do committees invent?. Conway (1968).
- The Structure of the "THE"-Multiprogramming System. Dijkstra (1968).
- Von Neumann's First Computer Program. Knuth (1970).
- Space/Time Trade-offs in Hash Coding with Allowable Errors. Bloom (1970).
- Managing the Development of Large Software Systems. Royce (1970).
- A Relational Model of Data for Large Shared Data Banks. Codd (1970).
- Program development by stepwise refinement. Wirth (1971).
- On the Criteria To Be Used in Decomposing Systems into Modules. Parnas (1971).
- The Humble Programmer. Dijkstra (1972).
- A Design Methodology for Reliable Software Systems. Liskov (1972).
- Information Distribution Aspects of Design Methodology. Parnas (1972).
- A Statistical Interpretation of Term Specificity in Retrieval. Spärck Jones (1972).
- Computer Programming as an Art. Knuth (1974).
- Ordered hash tables. Amble, Knuth (1974).
- Programming with Abstract Data Types. Liskov, Zilles (1974).
- Monitors: An operating system structuring concept. Hoare (1974).
- The UNIX Time-Sharing System. Ritchie, Thompson (1974).
- A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974).
- Self-stabilizing systems in spite of distributed control. Dijkstra (1974).
- The Mythical Man-Month. Brooks (1975).
- Granularity of Locks and Degrees of Consistency in a Shared Data Base. Gray et al (1975).
- Big Omicron and big Omega and big Theta. Knuth (1976).
- Communicating sequential processes. Hoare (1976).
- System R: Relational Approach to Database Management. Astrahan et al (1976).
- New Directions in Cryptography. Diffie, Hellman (1976).
- A Universal Algorithm for Sequential Data Compression. Ziv, Lempel (1977).
- The Smalltalk-76 Programming System Design and Implementation. Ingalls (1978).
- A Theory of Type Polymorphism in Programming. Milner (1978).
- Can Programming Be Liberated from the von Neumann Style?. Backus (1978).
- On the Duality of Operating System Structures. Lauer, Needham (1978).
- Ethernet: Distributed packet switching for local computer networks. Metcalfe, Boggs (1978).
- A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Rivest, Shamir, Adleman (1978).
- Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
- The paradigms of programming. Floyd (1979).
- The Ubiquitous B-Tree. Comer (1979).
- Designing Software for Ease of Extension and Contraction. Parnas (1979).
- Access Path Selection in a Relational Database Management System. Selinger et al (1979).
- How To Share A Secret. Shamir (1979).
- A Certified Digital Signature. Merkle (1979).
- Protocols for Public Key Cryptosystems. Merkle (1980).
- The Emperor’s Old Clothes. Hoare (1981).
- The Semantic Elegance of Applicative Languages. Turner (1981).
- The Transaction Concept: Virtues and Limitations. Gray (1981).
- The Byzantine Generals Problem. Lamport, Shostak, Pease (1982).
- Ironies of automation. Bainbridge (1983).
- A Theory of the Learnable. Valiant (1984).
- Literate Programming. Knuth (1984).
- Programming pearls: algorithm design techniques. Bentley (1984).
- The Modular Structure of Complex Systems. Parnas, Clements, Weiss (1984).
- Reflections on Trusting Trust. Thompson (1984).
- End-To-End Arguments in System Design. Saltzer, Reed, Clark (1984).
- Programming as Theory Building. Naur (1985).
- On understanding types, data abstraction, and polymorphism. Cardelli, Wegner (1985).
- Software Aspects of Strategic Defense Systems. Parnas (1985).
- An algorithm for distributed computation of a Spanning Tree in an Extended LAN. Perlman (1985).
- Impossibility of Distributed Consensus With One Faulty Process. Fischer, Lynch, Paterson (1985).
- Making data structures persistent. Driscoll et al (1986).
- The design of POSTGRES. Stonebraker, Rowe (1986).
- No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
- Epidemic Algorithms for Replicated Database Maintenance. Demers et al (1987).
- The Design Philosophy of the DARPA Internet Protocols. Clark (1988).
- Toward higher-level abstractions for software systems. Shaw (1990).
- Why Functional Programming Matters. Hughes (1990).
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Schneider (1990).
- SELF: The Power of Simplicity. Ungar, Smith (1991).
- Lisp: Good news, bad news, how to win big. Gabriel (1991).
- On Building Systems That Will Fail. Corbató (1991).
- The Design and Implementation of a Log-Structured File System. Rosenblum, Ousterhout (1991).
- Foundations for the Study of Software Architecture. Perry, Wolf (1992).
- World-Wide Web: Information Universe. Berners-Lee et al (1992).
- Engineering a Sort Function. Bentley, McIlroy (1993).
- Software Aging. Parnas (1994).
- How to Build a Highly Available System Using Consensus. Lampson (1996).
- The Dangers of Replication. Gray et al (1996).
- The Development of Erlang. Armstrong (1997).
- Software Transactional Memory. Shavit, Touitou (1997).
- The Cathedral and the Bazaar. Raymond (1998).
- The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
- Rules of Thumb in Data Engineering. Gray, Shenoy (1999).
- Harvest, Yield, and Scalable Tolerant Systems. Fox, Brewer (1999).
- The PageRank Citation Ranking: Bringing Order to the Web. Page, Brin, Motwani (1999).
- QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. Claessen, Hughes (2000).
- How Complex Systems Fail. Cook (2000).
- Paxos made simple. Lamport (2001).
- K-Anonymity: A Model For Protecting Privacy. Sweeney (2002).
- Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. Patterson et al (2002).
- The Google File System. Ghemawat, Gobioff, Leung (2003).
- Crash-Only Software. Candea, Fox (2003).
- TOR: The second generation onion router. Dingledine et al (2004).
- MapReduce: Simplified Data Processing on Large Clusters. Dean, Ghemawat (2004).
- Church's Thesis and Functional Programming. Turner (2006).
- Out of the Tar Pit. Moseley, Marks (2006).
- Performance Anti-Patterns. Smaalders (2006).
- Why the Internet only just works. Handley (2006).
- Bigtable: A Distributed Storage System for Structured Data. Chang et al (2006).
- Dynamo, Amazon’s Highly Available Key-value store. DeCandia et al (2007).
- On Designing and Deploying Internet Scale Services. Hamilton (2007).
- Bitcoin, A peer-to-peer electronic cash system. Nakamoto (2008).
- Building on Quicksand. Helland, Campbell (2009).
- Thinking Clearly about Performance. Millsap (2010).
- ZooKeeper: wait-free coordination for internet scale systems. Hunt et al (2010).
- Kafka: a Distributed Messaging System for Log Processing. Kreps, Narkhede, Rao (2011).
- Thinking Methodically about Performance. Gregg (2012).
- CAP Twelve Years Later: How the "Rules" Have Changed. Brewer (2012).
- The Network is Reliable. Bailis, Kingsbury (2014).
- In Search of an Understandable Consensus Algorithm. Ongaro, Ousterhout (2014).
- Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. Buterin (2014).
- Life Beyond Distributed Transactions: An apostate's opinion. Helland (2016).
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. Verbitski et al (2017).
- Nines are Not Enough: Meaningful Metrics for Clouds. Mogul, Wilkes (2019).
This list was inspired by (and draws from) several books and paper collections:
- Papers We Love
- Ideas That Created the Future
- The Innovators
- The morning paper
- Distributed systems for fun and profit
- Readings in Database Systems (the Red Book)
- Fermat's Library
A few interesting resources about reading papers from Papers We Love and elsewhere:
- Should I read papers?
- How to Read an Academic Article
- How to Read a Paper. Keshav (2007).
- Efficient Reading of Papers in Science and Technology. Hanson (1999).
- On ICSE’s “Most Influential Papers”. Parnas (1995).
- The list should stay short. Let's say no more than 30 papers.
- The idea is not to include every interesting paper that I come across but rather to keep a representative list that's possible to read from start to finish with a similar level of effort as reading a technical book from cover to cover.
- I tried to include one paper per major topic and author. Along the way I found many noteworthy alternatives and related or follow-up papers, and since I wanted to keep track of those as well, I included them as sublist items (some of these sublists are currently longer than they should be).
- The papers shouldn't be too long. For the same reasons as the previous item, I try to avoid papers longer than 20 or 30 pages.
- They should be self-contained and readable enough to be approachable by the casual technical reader.
- They should be freely available online.
- Although historical relevance was taken into account, I omitted seminal papers in the cases where I found them hard to approach, when the main subject of the paper wasn't the thing that made them influential, etc.
- That being said, where possible I preferred the original paper on each subject over modern updates or summary papers.
- I tended to prefer topics that I can relate to my professional practice, typically papers that originated in industry or that describe innovations that later saw wide adoption.
- Similarly, I tended to skip more theoretical papers, those focusing on mathematical foundations for Computer Science, electronic aspects of hardware, etc.
- UI/UX and modern Machine Learning are missing because I'm not familiar enough with those areas to find relevant papers that aren't overly specific. Suggestions are welcome.
Disclaimer: I'm not a frequent paper reader, so I made this list as a sort of roadmap for myself. I haven't read all of the papers in the list yet; as I do, I may find that some don't meet the described criteria after all and remove them, or decide to add new ones.
And, yes, this repository is a way to procrastinate on the actual reading after I finished making the list.