This repository contains a list of the books, blog posts, research papers, and white papers that I have read and found interesting.
Table of contents
- AI, DL, NLP and RL
- Calculus
- Computer Architecture
- Computer Graphics
- Data Structures and Algorithms
- Digital Electronics
- Graph Theory
- Information Theory
- Linear Algebra
- Measure Theory
- Optimization Theory
- Probability and Stochastic Processes
- Quantum Computing
- Signal Processing
AI, DL, NLP and RL
- 1-bit Adam: communication efficient large-scale training with Adam’s convergence speed
Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He
- 5 best practices for efficient model training
Matthew Leavitt, Abhinav Venigalla
- 8-bit approximations for parallelism in deep learning
Tim Dettmers
- 8-bit optimizers via block-wise quantization
Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
- A 'neural' network that learns to play Backgammon
Gerald Tesauro, Terrence J. Sejnowski
- A BetterTransformer for fast transformer inference
Michael Gschwind, Eric Han, Scott Wolchok, Rui Zhu, Christian Puhrsch
- A deep reinforced model for abstractive summarization
Romain Paulus, Caiming Xiong, Richard Socher
- A dynamical approach to temporal pattern processing
W. Scott Stornetta, Tad Hogg, Bernardo A. Huberman
- A few more examples may be worth billions of parameters
Yuval Kirstain, Patrick Lewis, Sebastian Riedel, Omer Levy
- A general and adaptive robust loss function
Jonathan T. Barron
- A gentle introduction to 8-bit matrix multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
Younes Belkada, Tim Dettmers
- A note on the evaluation of generative models
Lucas Theis, Aäron van den Oord, Matthias Bethge
- A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
Mikel Artetxe, Gorka Labaka, Eneko Agirre
- A simple but tough-to-beat baseline for sentence embeddings
Sanjeev Arora, Yingyu Liang, Tengyu Ma
- A simple language model for task-oriented dialogue
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher
- A simple neural attentive meta-learner
Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel
- A simple neural network module for relational reasoning
Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
- A study of BFLOAT16 for deep learning training
Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey
- A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, Timo Aila
- A stylometric inquiry into hyperpartisan and fake news
Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, Benno Stein
- A3T: adversarially augmented adversarial training
Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien
- Accelerated PyTorch 2 transformers
Michael Gschwind, Driss Guessous, Christian Puhrsch
- Accelerating large language model training with variable sparse pre-training and dense fine-tuning
Abhay Gupta, Mahmoud Salem, Vithursan Thangarasa, Kevin Leong, Sean Lie, Shreyas Saxena
- Accelerating PyTorch with CUDA graphs
Vinh Nguyen, Michael Carilli, Sukru Burc Eryilmaz, Vartika Singh, Michelle Lin, Natalia Gimelshein, Alban Desmaison, Edward Yang
- AdapterHub: a framework for adapting transformers
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
- Adversarial approximate inference for speech to electroglottograph conversion
Prathosh A. P., Varun Srivastava, Mayank Mishra
- Adversarial autoencoders
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey
- Adversarial examples that fool both computer vision and time-limited humans
Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein
- Adversarial feature learning
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell
- Adversarial generation of natural language
Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
- Adversarial information factorization
Antonia Creswell, Yumnah Mohamied, Biswa Sengupta, Anil A Bharath
- Adversarially learned inference
Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville
- AlexaTM 20B: few-shot learning using a large-scale multilingual seq2seq model
Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan
- Amazon SageMaker model parallelism: a general and flexible framework for large model training
Can Karakus, Rahul Huilgol, Fei Wu, Anirudh Subramanian, Cade Daniel, Derya Cavdar, Teng Xu, Haohan Chen, Arash Rahnama, Luis Quintela
- An image is worth 16x16 words: transformers for image recognition at scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
- An overview of gradient descent optimization algorithms
Sebastian Ruder
- Analysing mathematical reasoning abilities of neural models
David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
- Approximation by superpositions of a sigmoidal function
George Cybenko
- Artificial Intelligence: a modern approach
Stuart Russell, Peter Norvig
- Aspect based sentiment analysis with gated convolutional networks
Wei Xue, Tao Li
- Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
- Attention is off by one
Evan Miller
- Auto-encoding variational Bayes
Diederik P. Kingma, Max Welling
- Backpropagation through the void: optimizing control variates for black-box gradient estimation
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud
- BART: denoising sequence-to-sequence pre-training for natural language generation, translation and comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
- Batch normalization: accelerating deep network training by reducing internal covariate shift
Sergey Ioffe, Christian Szegedy
- Behavioral cloning from observation
Faraz Torabi, Garrett Warnell, Peter Stone
- BERT: pre-training of deep bidirectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- Beyond domain APIs: task-oriented conversational modeling with unstructured knowledge access
Seokhwan Kim, Mihail Eric, Karthik Gopalakrishnan, Behnam Hedayatnia, Yang Liu, Dilek Hakkani-Tur
- Blockwise parallel transformer for large context models
Hao Liu, Pieter Abbeel
- BLOOM: a 176B-parameter open-access multilingual language model
Aaron Gokaslan, Abheesht Sharma, Abhinav Ramesh Kashyap, Adam Roberts, Adi Simhi, Ahmed Baruwa, Aitor Soroa, Albert Villanova del Moral, Albert Webson, Alexander M. Rush, Alexandra Sasha Luccioni, Alfredo Palasciano, Alham Fikri Aji, Alice Rueda, Alison Callahan, Amanda Pestana, Amanpreet Singh, Amir Feizpour, Amit Alfassy, Ammar Khan, Amy Faranak, Ana Santos, Anastasia Cheveleva, Andrea Santilli, Angela Fan, Angelina McMillan-Major, Anima Shukla, Anna Rogers, Anne-Laure Ligozat, Anthony Hevia, Antigona Unldreaj, Antoine Chaffin, Antonio Miranda-Escalada, Arash Aghagol, Arezoo Abdollahi, Ariel Kreisberg Nitzav, Arjun Subramonian, Arnaud Stiegler, Arun Raja, Aurélie Névéol, Aycha Tammour, Ayush Singh, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Benjamin Beilharz, Benjamin Heinzerling, Benoît Sagot, Bharat Saxena, Bo Wang, Caio Brito, Canwen Xu, Carlos Muñoz Ferrandis, Charles Lovering, Chenghao Mou, Chenglei Si, Chenxi Zhou, Chirag Jain, Chris Emezue, Christopher Akiki, Christopher Klamm, Chuxin Xu, Clémentine Fourrier, Colin Leong, Colin Raffel, Conglong Li, Dan Garrette, Daniel Hesslow, Daniel León Periñán, Daniel Molano, Daniel van Strien, Danish Contractor, David Ifeoluwa Adelani, David Lansky, Davis David, Davut Emre Taşar, Debajyoti Datta, Deepak Narayanan, Deepak Tunuguntla, Dian Yu, Douwe Kiela, Dragomir Radev, Duong A. Nguyen, Eduardo González Ponferrada, Edward Tan, Efrat Levkovizh, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Eliza Szczechla, Elizabeth Salesky, Ellie Pavlick, Emi Baylor, Enrique Manjavacas, Ethan Kim, Eyal Bar Natan, Ezinwanne Ozoani, Fabio Barth, Fatima Mirza, Florian Fuhrimann, Francesco De Toni, Frankline Ononiwu, François Yvon, Gabriel Altay, Genta Indra Winata, Germán Kruszewski, Giada Pistilli, Giyaseddin Bayrak, Gully Burns, Gunjan Chhablani, Gérard Dupont, Habib Rezanejad, Hadar Tojarieh, Hady Elsahar, Hailey Schoelkopf, Hamza Benyamina, Han Wang, Harshit Pandey, Hatim Bourfoune, Helena U. Vrabec, Hendrik Strobelt, Hessie Jones, Hieu Tran, Hugo Laurençon, Huu Nguyen, Hyung Won Chung, Ian Yu, Idris Abdulmumin, Imane Bello, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isaac Johnson, Isar Nejadgholi, Ishani Dash, Itziar Gonzalez-Dios, Iz Beltagy, Jaesung Tae, Jan-Christoph Kalo, Jared Casper, Jason Alan Fries, Jason Phang, Javier de la Rosa, Jeff Rasley, Jekaterina Novikova, Jenny Chim, Jesse Dodge, Jesse Passmore, Jessica Zosa Forde, Jian Zhu, Jihyun Kang, John Giorgi, Jonas Golde, Jonathan Chang, Jonathan Tow, Jordan Clive, Jos Rozen, Jose David Posada, Joseph Tobing, Josh Seltzer, Joydeep Bhattacharjee, Julien Launay, Julio Bonis Sanz, Jungo Kasai, Jörg Frohberg, Karthik Rangasai Sivaraman, Ken Kawamura, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro Von Werra, Leo Gao, Leon Weber, Liam Hazan, Lintang Sutawika, Livia Dutra, Lokesh Bulchandani, Long Phan, Loubna Ben allal, Lu Liu, Lucile Saulnier, Ludovic Tanguy, Luisa Shinzato, M Saiful Bari, Madeleine Hahn de Bykhovetz, Maged S. 
Al-shaibani, Maiko Takeuchi, Mairon Samagaio, Manan Dey, Manuel Romero Muñoz, Maraim Elbadri, Maraim Masoud, Marc Pàmies, Margaret Mitchell, Margot Mieskes, Maria A Castillo, Marianna Nezhurina, Marine Carpuat, Mario Sänger, Mario Šaško, Marissa Gerchick, Martha Akinlolu, María Grandury, Mathilde Bras, Matteo Manica, Matthias Gallé, Matthias Samwald, Max Huang, Max Ryabinin, Maximin Coavoux, Mayank Mishra, Mayank Singh, Michael Cullan, Michael McKenna, Michael Weinberg, Michiel De Wolf, Mike Qiu, Mike Tian-Jian Jiang, Mina Mihaljcic, Minh Chien Vu, Minjia Zhang, Minna Liu, Miruna Clinciu, Mohammad A. Jauhar, Mohammad Shoeybi, Moritz Freidank, Muhammed Ghauri, Mustafa Ghaleb, Mykola Burynok, Myriam Peyrounette, Myungsun Kang, Nafis Abrar, Najoung Kim, Natasha Seelam, Nathan Dahlberg, Nazneen Rajani, Newton Cheng, Nicholas Michio Broad, Nicolas Patry, Nihal Nayak, Niklas Muennighoff, Nikolaus Muellner, Nishant Subramani, Nora Kassner, Nouamane Tazi, Nour Elkott, Nour Fahmy, Nurulaqilla Khamis, Ofir Press, Olanrewaju Samuel, Olatunji Ruwase, Oleg Serikov, Olivier Nguyen, Omar Espejel, Omar Sanseviero, Omer Antverg, Ona de Gibert, Oskar van der Wal, Pascale Fung, Patrick Haller, Patrick von Platen, Paulo Villegas, Pawan Sasanka Ammanamanchi, Pedro Ortiz Suarez, Peter Henderson, Pierre Colombo, Pierre Cornette, Pierre François Lavallée, Priscilla Amuok, Quentin Lhoest, Rachel Bawden, Ramya Chandrasekhar, Ran An, Rasmus Kromann, Renata Eisenberg, Rheza Harliman, Rishi Bommasani, Robert Martin, Roberto Luis López, Rodrigo Canalli, Roman Castagné, Rosaline Su, Rui Ribeiro, Rui Zhang, Ruisi Su, Ruochen Zhang, Ryan Hao, Ryan Teehan, Rémi Lacroix, Sabrina J. Mielke, Salomey Osei, Samira Alizadeh, Sampo Pyysalo, Samson Tan, Samuel Albanie, Samuel Cahyawijaya, Samuele Garda, Samyam Rajbhandari, Sanchit Gandhi, Sarmad Shubber, Sebastian Gehrmann, Sebastian Nagel, Shachar Mirkin, Shaden Smith, Shaked Brody, Shamik Bose, Shamsuddeen Hassan Muhammad, Shani Pais, Shanya Sharma, Shayne Longpre, Sheng Shen, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Silas Wang, Simon Ott, Sinee Sang-aroonsiri, Somaieh Nikpoor, Sourav Roy, Srishti Kumar, Srulik Ben-David, Stanislav Silberberg, Stas Bekman, Stefan Schweter, Stella Biderman, Stephen H. Bach, Stéphane Requena, Suhas Pai, Suraj Patil, Sushil Bharati, Suzana Ilić, Sydney Zink, Sylvain Viguier, Taewoon Kim, Tali Bers, Tanmay Laud, Tatiana Shavrina, Teven Le Scao, Thanh Le, Thibault Fevry, Thomas Scialom, Thomas Wang, Thomas Wolf, Théo Gigant, Tiago Timponi Torrent, Tian Yun, Tim Dettmers, Timo Schick, Tobi Oyebade, Tomasz Limisiewicz, Tomoya Kainuma, Trieu Le, Trishala Neeraj, Tristan Thrush, Urmish Thakker, Valentin Danchev, Vassilina Nikoulina, Verena Rieser, Veronika Laippala, Victor Sanh, Vikas Raunak, Violette Lepercq, Vitaly Protasov, Vladislav Mikhailov, Vrinda Prabhu, Wilson Y. Lee, Wojciech Kusa, Xiangru Tang, Yacine Jernite, Yada Pruksachatkun, Yallow Uri, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yonatan Belinkov, Younes Belkada, Yoyo Yang, Yu Xu, Zach Nguyen, Zachary Bamberger, Zaid Alyafeai, Zdeněk Kasner, Zeerak Talat, Zhe Tan, Zheng-Xin Yong, Zhiqing Sun, Zhongli Xie, Zifan Ye
- Bootstrapping entity alignment with knowledge graph embedding
Zequn Sun, Wei Hu, Qingheng Zhang, Yuzhong Qu
- Bridging the gap between prior and posterior knowledge selection for knowledge-grounded dialogue generation
Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou
- Bringing open large language models to consumer devices
MLC Community
- BTLM-3B-8K: 7B performance in a 3 billion parameter model
Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming (Charles) Chen, Bowen Yang, Siyun Li, Abhay Gupta, Shreyas Saxena, Robert Myers, Jacob Robert Steeves, Marvin Tom, Joel Hestness
- Building blocks for a complex-valued transformer architecture
Florian Eilers, Xiaoyi Jiang
- ChatGPT: optimizing language models for dialogue
- ColBERT: efficient and effective passage search via contextualized late interaction over BERT
Omar Khattab, Matei Zaharia
- Colossal-AI: a unified deep learning system for large-scale parallel training
Zhengda Bian, Hongxin Liu, Boxiang Wang, Haichen Huang, Yongbin Li, Chuanrui Wang, Fan Cui, Yang You
- Compiling machine learning programs via high-level tracing
Roy Frostig, Matthew Johnson, Chris Leary
- Complex transformer: a framework for modeling complex-valued sequence
Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov
- Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning
Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
- Conditional image synthesis with auxiliary classifier GANs
Augustus Odena, Christopher Olah, Jonathon Shlens
- Conformal nucleus sampling
Shauli Ravfogel, Yoav Goldberg, Jacob Goldberger
- Connecting large language models with evolutionary algorithms yields powerful prompt optimizers
Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
- Connectivity versus entropy
Yaser S. Abu-Mostafa
- Constituency parsing with a self-attentive encoder
Nikita Kitaev, Dan Klein
- Constraint based knowledge base distillation in end-to-end task oriented dialogs
Dinesh Raghu, Atishya Jain, Mausam, Sachindra Joshi
- Context generation improves open domain question answering
Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
- Convert transformers to ONNX with Hugging Face Optimum
Philipp Schmid
- Convolutional networks on graphs for learning molecular fingerprints
David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alan Aspuru-Guzik, Ryan P. Adams
- Convolutional neural network language models
Ngoc-Quan Pham, Germán Kruszewski, Gemma Boleda
- Countering adversarial images using input transformations
Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten
- Cramming: training a language model on a single GPU in one day
Jonas Geiping, Tom Goldstein
- Crosslingual generalization through multitask finetuning
Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin Raffel
- Curriculum learning
Yoshua Bengio, Jérôme Louradour, Ronan Collobert, Jason Weston
- Cutting down on prompts and parameters: simple few-shot learning with language models
Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel
- Deep Boltzmann machines
Ruslan Salakhutdinov, Geoffrey Hinton
- Deep complex networks
Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal
- Deep learning
Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Deep learning and the information bottleneck principle
Naftali Tishby, Noga Zaslavsky
- Deep learning techniques for super-resolution in video games
Alexander Watson
- Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- Deep text classification can be fooled
Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi
- DeepSpeed compression: a composable library for extreme compression and zero-cost quantization
DeepSpeed Team, Andrey Proskurin
- DeepSpeed Inference: enabling efficient inference of transformer models at unprecedented scale
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He
- DeepSpeed powers 8x larger MoE model training with high performance
DeepSpeed Team, Z-code Team
- DeepSpeed: accelerating large-scale model inference and training via system optimizations and compression
DeepSpeed Team, Rangan Majumder, Andrey Proskurin
- DeepSpeed: advancing MoE inference and training to power next-generation AI scale
DeepSpeed Team, Andrey Proskurin
- Denoising distantly supervised open-domain question answering
Yankai Lin, Haozhe Ji, Zhiyuan Liu, Maosong Sun
- Diffusion convolutional recurrent neural network: data-driven traffic forecasting
Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu
- Discrete variational autoencoders
Jason Tyler Rolfe
- Disentangling by factorising
Hyunjik Kim, Andriy Mnih
- Disentangling language and knowledge in task-oriented dialogs
Dinesh Raghu, Nikhil Gupta, Mausam
- Distributionally robust language modeling
Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang
- Editing models with task arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
- Efficient estimation of word representations in vector space
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
- Efficient large scale language modeling with mixtures of experts
Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
- Efficient large-scale language model training on GPU clusters using Megatron-LM
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
- Enhancing the reliability of out-of-distribution image detection in neural networks
Shiyu Liang, Yixuan Li, R. Srikant
- End-to-end task-oriented dialog modeling with semi-structured knowledge management
Silin Gao, Ryuichi Takanobu, Antoine Bosselut, Minlie Huang
- Ensemble adversarial training: attacks and defenses
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
- Equilibrium propagation: bridging the gap between energy-based models and backpropagation
Benjamin Scellier, Yoshua Bengio
- Estimating or propagating gradients through stochastic neurons for conditional computation
Yoshua Bengio, Nicholas Léonard, Aaron Courville
- Exemplar encoder-decoder for neural conversation generation
Gaurav Pandey, Danish Contractor, Vineet Kumar, Sachindra Joshi
- Expert human-level driving in Gran Turismo Sport using deep reinforcement learning with image-based representation
Ryuji Imamura, Takuma Seno, Kenta Kawamoto, Michael Spranger
- Exploring deep recurrent models with reinforcement learning for molecule design
Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood, Nathan Brown
- Exploring the limits of transfer learning with a unified text-to-text transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
- Extreme compression for pre-trained transformers made simple and efficient
Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He
- Fast abstractive summarization with reinforce-selected sentence rewriting
Yen-Chun Chen, Mohit Bansal
- Fast transformer decoding: one write-head is all you need
Noam Shazeer
- Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning
Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel
- FFJORD: free-form continuous dynamics for scalable reversible generative models
Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud
- Finetuned language models are zero-shot learners
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
- Flash-decoding for long-context inference
Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov
- FlashAttention: fast and memory-efficient exact attention with IO-awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- FlashAttention: fast transformer training with long sequences
Tri Dao
- Foundations of NLP explained visually: beam search, how it works
Ketan Doshi
- Generating adversarial examples with adversarial networks
Chaowei Xiao, Bo Li, Jun-yan Zhu, Warren He, Mingyan Liu, Dawn Song
- Generating sentences from a continuous space
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
- Generation-augmented retrieval for open-domain question answering
Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
- Generative adversarial nets
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- Genetic algorithms in search, optimization and machine learning
David E. Goldberg
- GeoMAN: multi-level attention networks for geo-sensory time series prediction
Yuxuan Liang, Songyu Ke, Junbo Zhang, Xiuwen Yi, Yu Zheng
- Getting the most out of the NVIDIA A100 GPU with Multi-Instance GPU
Maggie Zhang, James Sohn, Chetan Tekur
- GLaM: efficient scaling of language models with mixture-of-experts
Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui
- GLM-130B: an open bilingual pre-trained model
- GLU variants improve transformer
Noam Shazeer
- Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
- GPT-4 architecture, infrastructure, training dataset, costs, vision, MoE
Dylan Patel, Gerald Wong
- GPT-NeoX-20B: an open-source autoregressive language model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
- GQA: training generalized multi-query transformer models from multi-head checkpoints
Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
- Gradient-based hyperparameter optimization through reversible learning
Dougal Maclaurin, David Duvenaud, Ryan P. Adams
- Graph attention networks
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
- Hierarchical neural story generation
Angela Fan, Mike Lewis, Yann Dauphin
- Hindsight: posterior-guided training of retrievers for improved open-ended generation
Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning
- HotFlip: white-box adversarial examples for text classification
Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou
- How big should my language model be?
Teven Le Scao
- How should AI systems behave, and who should decide?
- How we sped up transformer inference 100x for 🤗 API customers
- How 🤗 Accelerate runs very large models thanks to PyTorch
Sylvain Gugger
- HyKnow: end-to-end task-oriented dialog modeling with hybrid knowledge management
Silin Gao, Ryuichi Takanobu, Wei Peng, Qun Liu, Minlie Huang
- Hyperparameter search with Transformers and Ray Tune
- Image-to-image translation with conditional generative adversarial networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
- ImageNet classification with deep convolutional neural networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- Improving entity linking by modeling latent relations between mentions
Phong Le, Ivan Titov
- Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre
- Improving language understanding by generative pre-training
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
- Improving reinforcement learning from human feedback with efficient reward model ensemble
Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, Chuang Gan
- Incredibly fast BLOOM inference with DeepSpeed and Accelerate
Stas Bekman, Sylvain Gugger
- Inference suboptimality in variational autoencoders
Chris Cremer, Xuechen Li, David Duvenaud
- InfoGAN: interpretable representation learning by information maximizing generative adversarial nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
- Interpretable convolutional neural networks via feedforward design
C.-C. Jay Kuo, Min Zhang, Siyang Li, Jiali Duan, Yueru Chen
- Introducing MPT-7B: a new standard for open-source, commercially usable LLMs
The MosaicML NLP Team
- Introducing nvFuser, a deep learning compiler for PyTorch
Christian Sarofeen, Piotr Bialecki, Jie Jiang, Kevin Stephano, Masaki Kozuki, Neal Vaidya, Stas Bekman
- Introducing Turing image super-resolution: AI-powered image enhancements for Microsoft Edge and Bing Maps
- Introducing 🤗 Accelerate
Sylvain Gugger
- Is ChatGPT 175 billion parameters? Technical analysis
Oren Leung
- Is the future of neural networks sparse? An introduction (1/N)
François Lagunas
- Joint reasoning on hybrid-knowledge sources for task-oriented dialog
Mayank Mishra, Danish Contractor, Dinesh Raghu
- Know what you don't know: unanswerable questions for SQuAD
Pranav Rajpurkar, Robin Jia, Percy Liang
- Knowledge-grounded dialogue generation with pre-trained language models
Xueliang Zhao, Wei Wu, Can Xu, Chongyang Tao, Dongyan Zhao, Rui Yan
- Language is not all you need: aligning perception with language models
Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
- Language modeling with gated convolutional networks
Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
- Language modelling with pixels
Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
- Language models (mostly) know what they know
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan
- Language models are unsupervised multitask learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
- Large language models are not fair evaluators
Peiyi Wang, Lei Li, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui
- Layer normalization
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
- Learning activation functions to improve deep neural networks
Forest Agostinelli, Matthew Hoffman, Peter Sadowski, Pierre Baldi
- Learning discourse-level diversity for neural dialog models using conditional variational autoencoders
Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
- Learning on a general network
Amir F. Atiya
- Learning representations by back-propagating errors
David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
- Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
- Learning word embeddings efficiently with noise-contrastive estimation
Andriy Mnih, Koray Kavukcuoglu
- Lessons learned on language model safety and misuse
- Lifelong language pretraining with distribution-specialized experts
Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, Claire Cui
- Linear scaling made possible with weight streaming
Andrew Feldman
- Linformer: self-attention with linear complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
- LLM in a flash: efficient large language model inference with limited memory
Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar
- LLM.int8(): 8-bit matrix multiplication for transformers at scale
Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
- Long sequence modeling with XGen: a 7B LLM trained on 8K input sequence length
Erik Nijkamp, Hiroaki Hayashi, Tian Xie, Congying Xia, Bo Pang, Rui Meng, Wojciech Kryscinski, Lifu Tu, Meghana Bhat, Semih Yavuz, Chen Xing, Jesse Vig, Lidiya Murakhovs'ka, Chien-Sheng Wu, Yingbo Zhou, Shafiq Rayhan Joty, Caiming Xiong, Silvio Savarese
- LoRA: low-rank adaptation of large language models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
- Lost in the middle: how language models use long contexts
Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang
- M6-10T: a sharing-delinking paradigm for efficient multi-trillion parameter pretraining
Junyang Lin, An Yang, Jinze Bai, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Yong Li, Wei Lin, Jingren Zhou, Hongxia Yang
- Machine learning
Tom M. Mitchell
- Machine learning: a probabilistic perspective
Kevin P. Murphy
- Making deep learning go brrrr from first principles
Horace He
- Making DeepSpeed ZeRO run efficiently on more-affordable hardware
Justin Chiu, Shuai Zheng
- Mask & focus: conversation modelling by learning concepts
Gaurav Pandey, Dinesh Raghu, Sachindra Joshi
- Maximizing communication efficiency for large-scale training via 0/1 Adam
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
- MCR-DL: mix-and-match communication runtime for deep learning
Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda
- MegaBlocks: efficient sparse training with mixture-of-experts
Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia
- Megatron-LM: training multi-billion parameter language models using model parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
- Memory-efficient pipeline-parallel DNN training
Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia
- MinTL: minimalist transfer learning for task-oriented dialogue systems
Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Pascale Fung
- Mix and match: learning-free controllable text generation using energy language models
Fatemehsadat Mireshghallah, Kartik Goyal, Taylor Berg-Kirkpatrick
- Mixed precision training
Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
- Mixture of attention heads: selecting attention heads per token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
- Mixture-of-Experts meets instruction tuning: a winning combination for large language models
Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
- mixup: beyond empirical risk minimization
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
- MMCoQA: conversational question answering over text, tables and images
Yongqi Li, Wenjie Li, Liqiang Nie
- Mode matching in GANs through latent space learning and inversion
Deepak Mishra, Prathosh A. P., Aravind Jayendran, Varun Srivastava, Santanu Chaudhury
- Multi-level memory for task oriented dialogs
Revanth Reddy, Danish Contractor, Dinesh Raghu, Sachindra Joshi
- Multitask prompt tuning enables parameter-efficient transfer learning
Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
- MultiWOZ: a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling
Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić
- Mutual information neural estimation
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
- NeMo: a toolkit for building AI applications using neural modules
Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
- Neural GPUs learn algorithms
Łukasz Kaiser, Ilya Sutskever
- Neural network methods for natural language processing
Yoav Goldberg
- Neural networks and physical systems with emergent collective computational abilities
J. J. Hopfield
- Neural networks for pattern recognition
Christopher M. Bishop
- Neural ordinary differential equations
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
- No train no gain: revisiting efficient training algorithms for transformer-based language models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
- Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples
Anish Athalye, Nicholas Carlini, David Wagner
- OctoPack: instruction tuning code large language models
Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre
- On the convergence of Adam and beyond
Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
- On the power of neural networks for solving hard problems
Jehoshua Bruck, Joseph W. Goodman
- One model to learn them all
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
- Open domain question answering over tables via dense retrieval
Jonathan Herzig, Thomas Müller, Syrine Krichene, Julian Eisenschlos
- Open question answering over tables and text
Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen
- OPT: open pre-trained transformer language models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
- Optimal brain compression: a framework for accurate post-training quantization and pruning
Elias Frantar, Sidak Pal Singh, Dan Alistarh
- Optimal perceptual inference
Geoffrey E. Hinton, Terrence J. Sejnowski
- Optimization story: Bloom inference
Nicolas Patry
- Orca 2: teaching small language models how to reason
Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah
- Orca: progressive learning from complex explanation traces of GPT-4
Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
- Outer product-based neural collaborative filtering
Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang, Tat-Seng Chua
- Outrageously large neural networks: the sparsely-gated mixture-of-experts layer
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
- Overcoming oscillations in quantization-aware training
Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
- PAL: program-aided language models
Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
- PaLM: scaling language modeling with pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel
- Parallel context windows improve in-context learning of large language models
Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham
- Pattern classification
Richard O. Duda, Peter E. Hart, David G. Stork
- Pattern recognition and machine learning
Christopher M. Bishop
- Perceptual losses for real-time style transfer and super-resolution
Justin Johnson, Alexandre Alahi, Li Fei-Fei
- Personalizing dialogue agents: I have a dog, do you have pets too?
Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston
- Phase-functioned neural networks for character control
Daniel Holden, Taku Komura, Jun Saito
- Playing Atari with deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
- Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig
- Prefix-tuning: optimizing continuous prompts for generation
Xiang Lisa Li, Percy Liang
- Probabilistic latent semantic analysis
Thomas Hofmann
- Progressive growing of GANs for improved quality, stability and variation
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
- Prompting with pseudo-code instructions
Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V, Danish Contractor, Srikanth Tamilselvam
- PullNet: open domain question answering with iterative retrieval on knowledge bases and text
Haitian Sun, Tania Bedrax-Weiss, William Cohen
- PyTorch trace analysis for the masses
Anupam Bhatnagar, Xizhou Feng, Brian Coutinho, Yifan Liu, Sung-Han Lin, Louis Feng, Yuzhen Huang
- Q-BERT: Hessian based ultra low precision quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
- R3Net: recurrent residual refinement network for saliency detection
Zijun Deng, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Jing Qin, Guoqiang Han, Pheng-Ann Heng
- Reading Wikipedia to answer open-domain questions
Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
- REALM: retrieval-augmented language model pretraining
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
- Recurrent models of visual attention
Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
- Reducing activation recomputation in large transformer models
Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro
- Regularizing and optimizing LSTM language models
Stephen Merity, Nitish Shirish Keskar, Richard Socher
- Reinforcement learning: an introduction
Richard S. Sutton, Andrew G. Barto
- ReLoRA: high-rank training through low-rank updates
Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
- Restricted Boltzmann machines for collaborative filtering
Ruslan Salakhutdinov, Andriy Mnih, Geoffrey Hinton
- Retrieval augmentation reduces hallucination in conversation
Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston
- Retrieval-augmented generation for knowledge-intensive NLP tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
- Revisiting classifier two-sample tests
David Lopez-Paz, Maxime Oquab
- RoBERTa: a robustly optimized BERT pretraining approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
- RoFormer: enhanced transformer with rotary position embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
- SantaCoder: don't reach for the stars!
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra
- Scaling instruction-finetuned language models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
- Scaling PyTorch FSDP for training foundation models on IBM Cloud
Linsong Chu, Less Wright, Hamid Shojanazeri, Sophia Wen, Raghu Ganti, Geeta Chauhan
- Scaling transformer to 1M tokens and beyond with RMT
Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev
- Self-instruct: aligning language models with self-generated instructions
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
- Self-normalizing neural networks
Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
- Semantically equivalent adversarial rules for debugging NLP models
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
- Seq2seq model and the exposure bias problem
Aditya Mohanty
- Sequence parallelism: long sequence training from system perspective
Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
- Sequential latent knowledge selection for knowledge-grounded dialogue
Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
- Simple and effective multi-paragraph reading comprehension
Christopher Clark, Matt Gardner
- Simplifying transformer blocks
Bobby He, Thomas Hofmann
- SmoothQuant: accurate and efficient post-training quantization for large language models
Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
- Soft filter pruning for accelerating deep convolutional neural networks
Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang
- SOLAR 10.7B: scaling large language models with simple yet effective depth up-scaling
Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
- SOLOIST: building task bots at scale with transfer learning and machine teaching
Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao
- Solving quantitative reasoning problems with language models
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra
- Spatial temporal graph convolutional networks for skeleton-based action recognition
Sijie Yan, Yuanjun Xiong, Dahua Lin
- Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting
Bing Yu, Haoteng Yin, Zhanxing Zhu
- Spectral normalization for generative adversarial networks
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
- Speech and language processing
Daniel Jurafsky, James H. Martin
- StarCoder: may the source be with you!
Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries
- Sticking the landing: simple, lower-variance gradient estimators for variational inference
Geoffrey Roeder, Yuhuai Wu, David K. Duvenaud
- StitchNet: composing neural networks from pre-trained fragments
Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H.T. Kung
- Stochastic hyperparameter optimization through hypernetworks
Jonathan Lorraine, David Duvenaud
- Strategies for teaching layered networks classification tasks
Ben S. Wittner, John S. Denker
- Structured prompting: scaling in-context learning to 1,000 examples
Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
- Style transfer from non-parallel text by cross-alignment
Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola
- Subword regularization: improving neural network translation models with multiple subword candidates
Taku Kudo
- Supervised learning of probability distributions by neural networks
Eric B. Baum, Frank Wilczek
- Supporting efficient large model training on AMD Instinct GPUs with DeepSpeed
Olatunji Ruwase, Jeff Rasley
- Switch transformers: scaling to trillion parameter models with simple and efficient sparsity
William Fedus, Barret Zoph, Noam Shazeer
- Synchronization in neural nets
Jacques J. Vidal, John Haggerty
- Tackling the poor assumptions of Naive Bayes text classifiers
Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger
- The best of both worlds: combining recent advances in neural machine translation
Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, Macduff Hughes
- The elements of statistical learning: data mining, inference and prediction
Trevor Hastie, Robert Tibshirani, Jerome Friedman
- The Flan collection: designing data and methods for effective instruction tuning
Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts
- The information bottleneck method
Naftali Tishby, Fernando C. Pereira, William Bialek
- The Pile: an 800GB dataset of diverse text for language modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
- The power of scale for parameter-efficient prompt tuning
Brian Lester, Rami Al-Rfou, Noah Constant
- The wisdom of hindsight makes language models better instruction followers
Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez
- Thermometer encoding: one hot way to resist adversarial examples
Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow
- To regularize or not to regularize? The bias variance trade-off in regularized AEs
Arnab Kumar Mondal, Himanshu Asnani, Parag Singla, Prathosh AP
- Towards crowdsourced training of large neural networks using decentralized mixture-of-experts
Max Ryabinin, Anton Gusev
- Towards deep learning models resistant to adversarial attacks
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
- Towards evaluating the robustness of neural networks
Nicholas Carlini, David Wagner
- Train short, test long: attention with linear biases enables input length extrapolation
Ofir Press, Noah Smith, Mike Lewis
- Training compute-optimal large language models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
- Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
- Transformer memory as a differentiable search index
Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
- Transformer quality in linear time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc Le
- Transformers explained visually (part 1): overview of functionality
Ketan Doshi
- Transformers explained visually (part 2): how it works, step-by-step
Ketan Doshi
- Transformers explained visually (part 3): multi-head attention, deep dive
Ketan Doshi
- Turing-NLG: a 17-billion-parameter language model by Microsoft
Corby Rosset
- UL2: unifying language learning paradigms
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler
- Understanding convolutional neural networks with a mathematical model
C.-C. Jay Kuo
- Understanding disentangling in β-VAE
Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner
- Understanding the Open Pre-Trained Transformers (OPT) library
Cameron Wolfe
- Unit tests for stochastic optimization
Tom Schaul, Ioannis Antonoglou, David Silver
- Universal language model fine-tuning for text classification
Jeremy Howard, Sebastian Ruder
- Unlimiformer: long-range transformers with unlimited length input
Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley
- Unpaired image-to-image translation using cycle-consistent adversarial networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
- Unsupervised machine translation using monolingual corpora only
Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
- Unsupervised representation learning by predicting image rotations
Spyros Gidaris, Praveer Singh, Nikos Komodakis
- Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model
Ali Alvi, Paresh Kharya
- Variational inference using implicit distributions
Ferenc Huszár
- Variational inference with latent space quantization for adversarial resilience
Vinay Kyatham, Mayank Mishra, Tarun Kumar Yadav, Deepak Mishra, Prathosh AP
- Variational learning for unsupervised knowledge grounded dialogs
Mayank Mishra, Dhiraj Madan, Gaurav Pandey, Danish Contractor
- Variational lossy autoencoder
Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
- Vector-quantized input-contextualized soft prompts for natural language understanding
Rishabh Bhardwaj, Amrita Saha, Steven C.H. Hoi, Soujanya Poria
- VEEGAN: reducing mode collapse in GANs using implicit variational learning
Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann, Charles Sutton
- Very deep convolutional networks for large-scale image recognition
Karen Simonyan, Andrew Zisserman
- Visualizing data using t-SNE
Laurens van der Maaten, Geoffrey Hinton
- Wasserstein GAN
Martin Arjovsky, Soumith Chintala, Léon Bottou
- wav2vec 2.0: a framework for self-supervised learning of speech representations
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
- WaveNet: a generative model for raw audio
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
- WebGPT: browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
- What language model to train if you have one million GPU hours?
Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M Saiful Bari, Stella Biderman, Hady Elsahar, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy
- Word translation without parallel data
Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
- Yandex publishes YaLM 100B. It’s the largest GPT-like neural network in open source
Mikhail Khrushchev
- You only look once: unified, real-time object detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
- ZeRO & DeepSpeed: new system optimizations enable training models with over 100 billion parameters
DeepSpeed Team, Rangan Majumder, Junhua Wang
- ZeRO++: extremely efficient collective communication for giant model training
Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He
- ZeRO-2 & DeepSpeed: shattering barriers of deep learning speed & scale
DeepSpeed Team, Rangan Majumder, Junhua Wang
- ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He
- Zero-shot text-to-image generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
- ZeRO: memory optimizations toward training trillion parameter models
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
- ZeroQuant: efficient and affordable post-training quantization for large-scale transformers
Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
- β-VAE: learning basic visual concepts with a constrained variational framework
Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner
Calculus
- Calculus of variations
I. M. Gelfand, S. V. Fomin
- Thomas' calculus
George B. Thomas Jr., Maurice D. Weir
Computer Architecture
- Accelerated computing with a reconfigurable dataflow architecture
- Computer architecture: a quantitative approach
John L. Hennessy, David A. Patterson
- Computer organization and design ARM edition: the hardware software interface
David A. Patterson, John L. Hennessy
- Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors
Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu
- Improving DRAM performance by parallelizing refreshes with accesses
Kevin Kai-Wei Chang, Donghyuk Lee, Zeshan Chishti, Alaa R. Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu
- Memory performance attacks: denial of memory service in multi-core systems
Thomas Moscibroda, Onur Mutlu
- Memory scaling: a systems architecture perspective
Onur Mutlu
- Millicode in an IBM zSeries processor
L. C. Heller, M. S. Farrell
- MTIA v1: Meta's first-generation AI inference accelerator
Amin Firoozshahian, Olivia Wu, Joel Coburn, Roman Levenstein
- RAIDR: retention-aware intelligent DRAM refresh
Jamie Liu, Ben Jaiyen, Richard Veras, Onur Mutlu
- Stall-time fair memory access scheduling for chip multiprocessors
Onur Mutlu, Thomas Moscibroda
Computer Graphics
Data Structures and Algorithms
- Data structures and algorithms in Java
Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser
- Introduction to algorithms
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
Digital Electronics
Graph Theory
Information Theory
- Elements of information theory
Thomas M. Cover, Joy A. Thomas
- Error detecting and error correcting codes
R. W. Hamming
Linear Algebra
- Linear algebra and its applications
Gilbert Strang
- Matrix analysis and applied linear algebra
Carl D. Meyer
- The matrix cookbook
Kaare Brandt Petersen, Michael Syskind Pedersen
Measure Theory
Optimization Theory
- Convex optimization
Stephen Boyd, Lieven Vandenberghe
- Distributed optimization and statistical learning via the alternating direction method of multipliers
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein
Probability and Stochastic Processes
- Introduction to probability and stochastic processes with applications
Liliana Blanco Castaneda, Viswanathan Arunachalam, Selvamuthu Dharmaraja
Quantum Computing
- A fast quantum mechanical algorithm for database search
Lov K. Grover
- A single quantum cannot be cloned
W. K. Wootters, W. H. Zurek
- Can quantum-mechanical description of physical reality be considered complete?
Albert Einstein, Boris Podolsky, Nathan Rosen
- Image recognition with an adiabatic quantum computer I. Mapping to quadratic unconstrained binary optimization
Hartmut Neven, Geordie Rose, William G. Macready
- Integer optimization toolbox (minimizing polynomials over integer lattices using quantum annealing)
Pooya Ronagh
- Limits on parallel speedup for classical Ising model solvers
- Partitioning optimization problems for hybrid classical/quantum execution
Michael Booth, Steven P. Reinhardt, Aidan Roy
- Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer
Peter W. Shor
- Probabilistic cloning and identification of linearly independent quantum states
Lu-Ming Duan, Guang-Can Guo
- Programming with D-Wave: map coloring problem
E. D. Dahl
- Quantum computation and quantum information
Michael A. Nielsen, Isaac L. Chuang
- Quantum computing: a gentle introduction
Eleanor Rieffel, Wolfgang Polak
- Quantum performance evaluation: a short reading list
- Quantum theory, the Church-Turing principle and the universal quantum computer
David Deutsch
- Rapid solution of problems by quantum computation
David Deutsch, Richard Jozsa
- Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels
Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, William K. Wootters