Papers-books-and-blogs

This repository contains a list of the books, blogs, research papers and white papers that I have read and found interesting.

Table of contents

AI, DL, NLP and RL

  1. 1-bit Adam: communication efficient large-scale training with Adam’s convergence speed
    Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He
  2. 5 best practices for efficient model training
    Matthew Leavitt, Abhinav Venigalla
  3. 8-bit approximations for parallelism in deep learning
    Tim Dettmers
  4. 8-bit optimizers via block-wise quantization
    Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
  5. A 'neural' network that learns to play Backgammon
    Gerald Tesauro, Terrence J. Sejnowski
  6. A BetterTransformer for fast transformer inference
    Michael Gschwind, Eric Han, Scott Wolchok, Rui Zhu, Christian Puhrsch
  7. A deep reinforced model for abstractive summarization
    Romain Paulus, Caiming Xiong, Richard Socher
  8. A dynamical approach to temporal pattern processing
    W. Scott Stornetta, Tad Hogg, Bernardo A. Huberman
  9. A few more examples may be worth billions of parameters
    Yuval Kirstain, Patrick Lewis, Sebastian Riedel, Omer Levy
  10. A general and adaptive robust loss function
    Jonathan T. Barron
  11. A gentle introduction to 8-bit matrix multiplication for transformers at scale using Hugging Face transformers, accelerate and bitsandbytes
    Younes Belkada, Tim Dettmers
  12. A note on the evaluation of generative models
    Lucas Theis, Aäron van den Oord, Matthias Bethge
  13. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Mikel Artetxe, Gorka Labaka, Eneko Agirre
  14. A simple but tough-to-beat baseline for sentence embeddings
    Sanjeev Arora, Yingyu Liang, Tengyu Ma
  15. A simple language model for task-oriented dialogue
    Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher
  16. A simple neural attentive meta-learner
    Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel
  17. A simple neural network module for relational reasoning
    Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, Timothy Lillicrap
  18. A study of BFLOAT16 for deep learning training
    Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey
  19. A style-based generator architecture for generative adversarial networks
    Tero Karras, Samuli Laine, Timo Aila
  20. A stylometric inquiry into hyperpartisan and fake news
    Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, Benno Stein
  21. A3T: adversarially augmented adversarial training
    Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien
  22. Accelerated PyTorch 2 transformers
    Michael Gschwind, Driss Guessous, Christian Puhrsch
  23. Accelerating large language model training with variable sparse pre-training and dense fine-tuning
    Abhay Gupta, Mahmoud Salem, Vithursan Thangarasa, Kevin Leong, Sean Lie, Shreyas Saxena
  24. Accelerating PyTorch with CUDA graphs
    Vinh Nguyen, Michael Carilli, Sukru Burc Eryilmaz, Vartika Singh, Michelle Lin, Natalia Gimelshein, Alban Desmaison, Edward Yang
  25. AdapterHub: a framework for adapting transformers
    Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
  26. Adversarial approximate inference for speech to electroglottograph conversion
    Prathosh A. P., Varun Srivastava, Mayank Mishra
  27. Adversarial autoencoders
    Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey
  28. Adversarial examples that fool both computer vision and time-limited humans
    Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein
  29. Adversarial feature learning
    Jeff Donahue, Philipp Krähenbühl, Trevor Darrell
  30. Adversarial generation of natural language
    Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville
  31. Adversarial information factorization
    Antonia Creswell, Yumnah Mohamied, Biswa Sengupta, Anil A Bharath
  32. Adversarially learned inference
    Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville
  33. AlexaTM 20B: few-shot learning using a large-scale multilingual seq2seq model
    Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan
  34. Amazon SageMaker model parallelism: a general and flexible framework for large model training
    Can Karakus, Rahul Huilgol, Fei Wu, Anirudh Subramanian, Cade Daniel, Derya Cavdar, Teng Xu, Haohan Chen, Arash Rahnama, Luis Quintela
  35. An image is worth 16x16 words: transformers for image recognition at scale
    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
  36. An overview of gradient descent optimization algorithms
    Sebastian Ruder
  37. Analysing mathematical reasoning abilities of neural models
    David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
  38. Approximation by superpositions of a sigmoidal function
    George Cybenko
  39. Artificial Intelligence: a modern approach
    Stuart Russell, Peter Norvig
  40. Aspect based sentiment analysis with gated convolutional networks
    Wei Xue, Tao Li
  41. Attention is all you need
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  42. Attention is off by one
    Evan Miller
  43. Auto-encoding variational Bayes
    Diederik P. Kingma, Max Welling
  44. Backpropagation through the void: optimizing control variates for black-box gradient estimation
    Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud
  45. BART: denoising sequence-to-sequence pre-training for natural language generation, translation and comprehension
    Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
  46. Batch normalization: accelerating deep network training by reducing internal covariate shift
    Sergey Ioffe, Christian Szegedy
  47. Behavioral cloning from observation
    Faraz Torabi, Garrett Warnell, Peter Stone
  48. BERT: pre-training of deep bidirectional transformers for language understanding
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  49. Beyond domain APIs: Task-oriented conversational modeling with unstructured knowledge access
    Seokhwan Kim, Mihail Eric, Karthik Gopalakrishnan, Behnam Hedayatnia, Yang Liu, Dilek Hakkani-Tur
  50. Blockwise parallel transformer for large context models
    Hao Liu, Pieter Abbeel
  51. BLOOM: A 176B-parameter open-access multilingual language model
    Aaron Gokaslan, Abheesht Sharma, Abhinav Ramesh Kashyap, Adam Roberts, Adi Simhi, Ahmed Baruwa, Aitor Soroa, Albert Villanova del Moral, Albert Webson, Alexander M. Rush, Alexandra Sasha Luccioni, Alfredo Palasciano, Alham Fikri Aji, Alice Rueda, Alison Callahan, Amanda Pestana, Amanpreet Singh, Amir Feizpour, Amit Alfassy, Ammar Khan, Amy Faranak, Ana Santos, Anastasia Cheveleva, Andrea Santilli, Angela Fan, Angelina McMillan-Major, Anima Shukla, Anna Rogers, Anne-Laure Ligozat, Anthony Hevia, Antigona Unldreaj, Antoine Chaffin, Antonio Miranda-Escalada, Arash Aghagol, Arezoo Abdollahi, Ariel Kreisberg Nitzav, Arjun Subramonian, Arnaud Stiegler, Arun Raja, Aurélie Névéol, Aycha Tammour, Ayush Singh, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Benjamin Beilharz, Benjamin Heinzerling, Benoît Sagot, Bharat Saxena, Bo Wang, Caio Brito, Canwen Xu, Carlos Muñoz Ferrandis, Charles Lovering, Chenghao Mou, Chenglei Si, Chenxi Zhou, Chirag Jain, Chris Emezue, Christopher Akiki, Christopher Klamm, Chuxin Xu, Clémentine Fourrier, Colin Leong, Colin Raffel, Conglong Li, Dan Garrette, Daniel Hesslow, Daniel León Periñán, Daniel Molano, Daniel van Strien, Danish Contractor, David Ifeoluwa Adelani, David Lansky, Davis David, Davut Emre Taşar, Debajyoti Datta, Deepak Narayanan, Deepak Tunuguntla, Dian Yu, Douwe Kiela, Dragomir Radev, Duong A. Nguyen, Eduardo González Ponferrada, Edward Tan, Efrat Levkovizh, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Eliza Szczechla, Elizabeth Salesky, Ellie Pavlick, Emi Baylor, Enrique Manjavacas, Ethan Kim, Eyal Bar Natan, Ezinwanne Ozoani, Fabio Barth, Fatima Mirza, Florian Fuhrimann, Francesco De Toni, Frankline Ononiwu, François Yvon, Gabriel Altay, Genta Indra Winata, Germán Kruszewski, Giada Pistilli, Giyaseddin Bayrak, Gully Burns, Gunjan Chhablani, Gérard Dupont, Habib Rezanejad, Hadar Tojarieh, Hady Elsahar, Hailey Schoelkopf, Hamza Benyamina, Han Wang, Harshit Pandey, Hatim Bourfoune, Helena U. Vrabec, Hendrik Strobelt, Hessie Jones, Hieu Tran, Hugo Laurençon, Huu Nguyen, Hyung Won Chung, Ian Yu, Idris Abdulmumin, Imane Bello, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isaac Johnson, Isar Nejadgholi, Ishani Dash, Itziar Gonzalez-Dios, Iz Beltagy, Jaesung Tae, Jan-Christoph Kalo, Jared Casper, Jason Alan Fries, Jason Phang, Javier de la Rosa, Jeff Rasley, Jekaterina Novikova, Jenny Chim, Jesse Dodge, Jesse Passmore, Jessica Zosa Forde, Jian Zhu, Jihyun Kang, John Giorgi, Jonas Golde, Jonathan Chang, Jonathan Tow, Jordan Clive, Jos Rozen, Jose David Posada, Joseph Tobing, Josh Seltzer, Joydeep Bhattacharjee, Julien Launay, Julio Bonis Sanz, Jungo Kasai, Jörg Frohberg, Karthik Rangasai Sivaraman, Ken Kawamura, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro Von Werra, Leo Gao, Leon Weber, Liam Hazan, Lintang Sutawika, Livia Dutra, Lokesh Bulchandani, Long Phan, Loubna Ben allal, Lu Liu, Lucile Saulnier, Ludovic Tanguy, Luisa Shinzato, M Saiful Bari, Madeleine Hahn de Bykhovetz, Maged S. 
Al-shaibani, Maiko Takeuchi, Mairon Samagaio, Manan Dey, Manuel Romero Muñoz, Maraim Elbadri, Maraim Masoud, Marc Pàmies, Margaret Mitchell, Margot Mieskes, Maria A Castillo, Marianna Nezhurina, Marine Carpuat, Mario Sänger, Mario Šaško, Marissa Gerchick, Martha Akinlolu, María Grandury, Mathilde Bras, Matteo Manica, Matthias Gallé, Matthias Samwald, Max Huang, Max Ryabinin, Maximin Coavoux, Mayank Mishra, Mayank Singh, Michael Cullan, Michael McKenna, Michael Weinberg, Michiel De Wolf, Mike Qiu, Mike Tian-Jian Jiang, Mina Mihaljcic, Minh Chien Vu, Minjia Zhang, Minna Liu, Miruna Clinciu, Mohammad A. Jauhar, Mohammad Shoeybi, Moritz Freidank, Muhammed Ghauri, Mustafa Ghaleb, Mykola Burynok, Myriam Peyrounette, Myungsun Kang, Nafis Abrar, Najoung Kim, Natasha Seelam, Nathan Dahlberg, Nazneen Rajani, Newton Cheng, Nicholas Michio Broad, Nicolas Patry, Nihal Nayak, Niklas Muennighoff, Nikolaus Muellner, Nishant Subramani, Nora Kassner, Nouamane Tazi, Nour Elkott, Nour Fahmy, Nurulaqilla Khamis, Ofir Press, Olanrewaju Samuel, Olatunji Ruwase, Oleg Serikov, Olivier Nguyen, Omar Espejel, Omar Sanseviero, Omer Antverg, Ona de Gibert, Oskar van der Wal, Pascale Fung, Patrick Haller, Patrick von Platen, Paulo Villegas, Pawan Sasanka Ammanamanchi, Pedro Ortiz Suarez, Peter Henderson, Pierre Colombo, Pierre Cornette, Pierre François Lavallée, Priscilla Amuok, Quentin Lhoest, Rachel Bawden, Ramya Chandrasekhar, Ran An, Rasmus Kromann, Renata Eisenberg, Rheza Harliman, Rishi Bommasani, Robert Martin, Roberto Luis López, Rodrigo Canalli, Roman Castagné, Rosaline Su, Rui Ribeiro, Rui Zhang, Ruisi Su, Ruochen Zhang, Ryan Hao, Ryan Teehan, Rémi Lacroix, Sabrina J. Mielke, Salomey Osei, Samira Alizadeh, Sampo Pyysalo, Samson Tan, Samuel Albanie, Samuel Cahyawijaya, Samuele Garda, Samyam Rajbhandari, Sanchit Gandhi, Sarmad Shubber, Sebastian Gehrmann, Sebastian Nagel, Shachar Mirkin, Shaden Smith, Shaked Brody, Shamik Bose, Shamsuddeen Hassan Muhammad, Shani Pais, Shanya Sharma, Shayne Longpre, Sheng Shen, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Silas Wang, Simon Ott, Sinee Sang-aroonsiri, Somaieh Nikpoor, Sourav Roy, Srishti Kumar, Srulik Ben-David, Stanislav Silberberg, Stas Bekman, Stefan Schweter, Stella Biderman, Stephen H. Bach, Stéphane Requena, Suhas Pai, Suraj Patil, Sushil Bharati, Suzana Ilić, Sydney Zink, Sylvain Viguier, Taewoon Kim, Tali Bers, Tanmay Laud, Tatiana Shavrina, Teven Le Scao, Thanh Le, Thibault Fevry, Thomas Scialom, Thomas Wang, Thomas Wolf, Théo Gigant, Tiago Timponi Torrent, Tian Yun, Tim Dettmers, Timo Schick, Tobi Oyebade, Tomasz Limisiewicz, Tomoya Kainuma, Trieu Le, Trishala Neeraj, Tristan Thrush, Urmish Thakker, Valentin Danchev, Vassilina Nikoulina, Verena Rieser, Veronika Laippala, Victor Sanh, Vikas Raunak, Violette Lepercq, Vitaly Protasov, Vladislav Mikhailov, Vrinda Prabhu, Wilson Y. Lee, Wojciech Kusa, Xiangru Tang, Yacine Jernite, Yada Pruksachatkun, Yallow Uri, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yonatan Belinkov, Younes Belkada, Yoyo Yang, Yu Xu, Zach Nguyen, Zachary Bamberger, Zaid Alyafeai, Zdeněk Kasner, Zeerak Talat, Zhe Tan, Zheng-Xin Yong, Zhiqing Sun, Zhongli Xie, Zifan Ye
  52. Bootstrapping entity alignment with knowledge graph embedding
    Zequn Sun, Wei Hu, Qingheng Zhang, Yuzhong Qu
  53. Bridging the gap between prior and posterior knowledge selection for knowledge-grounded dialogue generation
    Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou
  54. Bringing open large language models to consumer devices
    MLC Community
  55. BTLM-3B-8K: 7B performance in a 3 billion parameter model
    Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming (Charles) Chen, Bowen Yang, Siyun Li, Abhay Gupta, Shreyas Saxena, Robert Myers, Jacob Robert Steeves, Marvin Tom, Joel Hestness
  56. Building blocks for a complex-valued transformer architecture
    Florian Eilers, Xiaoyi Jiang
  57. ChatGPT: optimizing language models for dialogue
  58. ColBERT: efficient and effective passage search via contextualized late interaction over BERT
    Omar Khattab, Matei Zaharia
  59. Colossal-AI: a unified deep learning system for large-scale parallel training
    Zhengda Bian, Hongxin Liu, Boxiang Wang, Haichen Huang, Yongbin Li, Chuanrui Wang, Fan Cui, Yang You
  60. Compiling machine learning programs via high-level tracing
    Roy Frostig, Matthew Johnson, Chris Leary
  61. Complex transformer: a framework for modeling complex-valued sequence
    Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov
  62. Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning
    Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
  63. Conditional image synthesis with auxiliary classifier GANs
    Augustus Odena, Christopher Olah, Jonathon Shlens
  64. Conformal nucleus sampling
    Shauli Ravfogel, Yoav Goldberg, Jacob Goldberger
  65. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers
    Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
  66. Connectivity versus entropy
    Yaser S. Abu-Mostafa
  67. Constituency parsing with a self-attentive encoder
    Nikita Kitaev, Dan Klein
  68. Constraint based knowledge base distillation in end-to-end task oriented dialogs
    Dinesh Raghu, Atishya Jain, Mausam, Sachindra Joshi
  69. Context generation improves open domain question answering
    Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
  70. Convert transformers to ONNX with Hugging Face Optimum
    Philipp Schmid
  71. Convolutional networks on graphs for learning molecular fingerprints
    David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alan Aspuru-Guzik, Ryan P. Adams
  72. Convolutional neural network language models
    Ngoc-Quan Pham, Germán Kruszewski, Gemma Boleda
  73. Countering adversarial images using input transformations
    Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten
  74. Cramming: training a language model on a single GPU in one day
    Jonas Geiping, Tom Goldstein
  75. Crosslingual generalization through multitask finetuning
    Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin Raffel
  76. Curriculum learning
    Yoshua Bengio, Jérôme Louradour, Ronan Collobert, Jason Weston
  77. Cutting down on prompts and parameters: simple few-shot learning with language models
    Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel
  78. Deep Boltzmann machines
    Ruslan Salakhutdinov, Geoffrey Hinton
  79. Deep complex networks
    Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal
  80. Deep learning
    Ian Goodfellow, Yoshua Bengio, Aaron Courville
  81. Deep learning and the information bottleneck principle
    Naftali Tishby, Noga Zaslavsky
  82. Deep learning techniques for super-resolution in video games
    Alexander Watson
  83. Deep residual learning for image recognition
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  84. Deep text classification can be fooled
    Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi
  85. DeepSpeed compression: a composable library for extreme compression and zero-cost quantization
    DeepSpeed Team, Andrey Proskurin
  86. DeepSpeed Inference: enabling efficient inference of transformer models at unprecedented scale
    Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He
  87. DeepSpeed powers 8x larger MoE model training with high performance
    DeepSpeed Team, Z-code Team
  88. DeepSpeed: accelerating large-scale model inference and training via system optimizations and compression
    DeepSpeed Team, Rangan Majumder, Andrey Proskurin
  89. DeepSpeed: advancing MoE inference and training to power next-generation AI scale
    DeepSpeed Team, Andrey Proskurin
  90. Denoising distantly supervised open-domain question answering
    Yankai Lin, Haozhe Ji, Zhiyuan Liu, Maosong Sun
  91. Diffusion convolutional recurrent neural network: data-driven traffic forecasting
    Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu
  92. Discrete variational autoencoders
    Jason Tyler Rolfe
  93. Disentangling by factorising
    Hyunjik Kim, Andriy Mnih
  94. Disentangling language and knowledge in task-oriented dialogs
    Dinesh Raghu, Nikhil Gupta, Mausam
  95. Distributionally robust language modeling
    Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang
  96. Editing models with task arithmetic
    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
  97. Efficient estimation of word representations in vector space
    Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
  98. Efficient large scale language modeling with mixtures of experts
    Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
  99. Efficient large-scale language model training on GPU clusters using Megatron-LM
    Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
  100. Enhancing the reliability of out-of-distribution image detection in neural networks
    Shiyu Liang, Yixuan Li, R. Srikant
  101. End-to-end task-oriented dialog modeling with semi-structured knowledge management
    Silin Gao, Ryuichi Takanobu, Antoine Bosselut, Minlie Huang
  102. Ensemble adversarial training: attacks and defenses
    Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
  103. Equilibrium propagation: bridging the gap between energy-based models and backpropagation
    Benjamin Scellier, Yoshua Bengio
  104. Estimating or propagating gradients through stochastic neurons for conditional computation
    Yoshua Bengio, Nicholas Léonard, Aaron Courville
  105. Exemplar encoder-decoder for neural conversation generation
    Gaurav Pandey, Danish Contractor, Vineet Kumar, Sachindra Joshi
  106. Expert human-level driving in Gran Turismo Sport using deep reinforcement learning with image-based representation
    Ryuji Imamura, Takuma Seno, Kenta Kawamoto, Michael Spranger
  107. Exploring deep recurrent models with reinforcement learning for molecule design
    Daniel Neil, Marwin Segler, Laura Guasch, Mohamed Ahmed, Dean Plumbley, Matthew Sellwood, Nathan Brown
  108. Exploring the limits of transfer learning with a unified text-to-text transformer
    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
  109. Extreme compression for pre-trained transformers made simple and efficient
    Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He
  110. Fast abstractive summarization with reinforce-selected sentence rewriting
    Yen-Chun Chen, Mohit Bansal
  111. Fast transformer decoding: one write-head is all you need
    Noam Shazeer
  112. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning
    Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel
  113. FFJORD: Free-form continuous dynamics for scalable reversible generative models
    Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud
  114. Finetuned language models are zero-shot learners
    Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
  115. Flash-decoding for long-context inference
    Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov
  116. FlashAttention: fast and memory-efficient exact attention with IO-awareness
    Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
  117. FlashAttention: fast transformer training with long sequences
    Tri Dao
  118. Foundations of NLP explained visually: beam search, how it works
    Ketan Doshi
  119. Generating adversarial examples with adversarial networks
    Chaowei Xiao, Bo Li, Jun-yan Zhu, Warren He, Mingyan Liu, Dawn Song
  120. Generating sentences from a continuous space
    Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
  121. Generation-augmented retrieval for open-domain question answering
    Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
  122. Generative adversarial nets
    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
  123. Genetic algorithms in search, optimization and machine learning
    David E. Goldberg
  124. GeoMAN: multi-level attention networks for geo-sensory time series prediction
    Yuxuan Liang, Songyu Ke, Junbo Zhang, Xiuwen Yi, Yu Zheng
  125. Getting the most out of the NVIDIA A100 GPU with Multi-Instance GPU
    Maggie Zhang, James Sohn, Chetan Tekur
  126. GLaM: efficient scaling of language models with mixture-of-experts
    Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui
  127. GLM-130B: an open bilingual pre-trained model
  128. GLU variants improve transformer
    Noam Shazeer
  129. Going deeper with convolutions
    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
  130. GPT-4 architecture, infrastructure, training dataset, costs, vision, MoE
    Dylan Patel, Gerald Wong
  131. GPT-NeoX-20B: an open-source autoregressive language model
    Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
  132. GQA: training generalized multi-query transformer models from multi-head checkpoints
    Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai
  133. Gradient-based hyperparameter optimization through reversible learning
    Dougal Maclaurin, David Duvenaud, Ryan P. Adams
  134. Graph attention networks
    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
  135. Hierarchical neural story generation
    Angela Fan, Mike Lewis, Yann Dauphin
  136. Hindsight: posterior-guided training of retrievers for improved open-ended generation
    Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning
  137. HotFlip: white-box adversarial examples for text classification
    Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou
  138. How big should my language model be?
    Teven Le Scao
  139. How should AI systems behave, and who should decide?
  140. How we sped up transformer inference 100x for 🤗 API customers
  141. How 🤗 Accelerate runs very large models thanks to PyTorch
    Sylvain Gugger
  142. HyKnow: end-to-end task-oriented dialog modeling with hybrid knowledge management
    Silin Gao, Ryuichi Takanobu, Wei Peng, Qun Liu, Minlie Huang
  143. Hyperparameter search with Transformers and Ray Tune
  144. Image-to-image translation with conditional generative adversarial networks
    Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
  145. ImageNet classification with deep convolutional neural networks
    Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
  146. Improving entity linking by modeling latent relations between mentions
    Phong Le, Ivan Titov
  147. Improving language models by retrieving from trillions of tokens
    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre
  148. Improving language understanding by generative pre-training
    Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
  149. Improving reinforcement learning from human feedback with efficient reward model ensemble
    Shun Zhang, Zhenfang Chen, Sunli Chen, Yikang Shen, Zhiqing Sun, Chuang Gan
  150. Incredibly fast BLOOM inference with DeepSpeed and Accelerate
    Stas Bekman, Sylvain Gugger
  151. Inference suboptimality in variational autoencoders
    Chris Cremer, Xuechen Li, David Duvenaud
  152. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets
    Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
  153. Interpretable convolutional neural networks via feedforward design
    C.-C. Jay Kuo, Min Zhang, Siyang Li, Jiali Duan, Yueru Chen
  154. Introducing MPT-7B: a new standard for open-source, commercially usable LLMs
    The MosaicML NLP Team
  155. Introducing nvFuser, a deep learning compiler for PyTorch
    Christian Sarofeen, Piotr Bialecki, Jie Jiang, Kevin Stephano, Masaki Kozuki, Neal Vaidya, Stas Bekman
  156. Introducing Turing image super resolution: AI powered image enhancements for Microsoft Edge and Bing maps
  157. Introducing 🤗 accelerate
    Sylvain Gugger
  158. Is ChatGPT 175 billion parameters? Technical analysis
    Oren Leung
  159. Is the future of neural networks Sparse? An introduction (1/N)
    François Lagunas
  160. Joint reasoning on hybrid-knowledge sources for task-oriented dialog
    Mayank Mishra, Danish Contractor, Dinesh Raghu
  161. Know what you don't know: unanswerable questions for SQuAD
    Pranav Rajpurkar, Robin Jia, Percy Liang
  162. Knowledge-grounded dialogue generation with pre-trained language models
    Xueliang Zhao, Wei Wu, Can Xu, Chongyang Tao, Dongyan Zhao, Rui Yan
  163. Language is not all you need: aligning perception with language models
    Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
  164. Language modeling with gated convolutional networks
    Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
  165. Language modelling with pixels
    Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
  166. Language models (mostly) know what they know
    Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Sam Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan
  167. Language models are unsupervised multitask learners
    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
  168. Large language models are not fair evaluators
    Peiyi Wang, Lei Li, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui
  169. Layer normalization
    Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
  170. Learning activation functions to improve deep neural networks
    Forest Agostinelli, Matthew Hoffman, Peter Sadowski, Pierre Baldi
  171. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders
    Tiancheng Zhao, Ran Zhao, Maxine Eskenazi
  172. Learning on a general network
    Amir F. Atiya
  173. Learning representations by back-propagating errors
    David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
  174. Learning transferable visual models from natural language supervision
    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
  175. Learning word embeddings efficiently with noise-contrastive estimation
    Andriy Mnih, Koray Kavukcuoglu
  176. Lessons learned on language model safety and misuse
  177. Lifelong language pretraining with distribution-specialized experts
    Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, Claire Cui
  178. Linear scaling made possible with weight streaming
    Andrew Feldman
  179. Linformer: self-attention with linear complexity
    Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
  180. LLM in a flash: efficient large language model inference with limited memory
    Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar
  181. LLM.int8(): 8-bit matrix multiplication for transformers at scale
    Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
  182. Long sequence modeling with XGen: a 7B LLM trained on 8K input sequence length
    Erik Nijkamp, Hiroaki Hayashi, Tian Xie, Congying Xia, Bo Pang, Rui Meng, Wojciech Kryscinski, Lifu Tu, Meghana Bhat, Semih Yavuz, Chen Xing, Jesse Vig, Lidiya Murakhovs'ka, Chien-Sheng Wu, Yingbo Zhou, Shafiq Rayhan Joty, Caiming Xiong, Silvio Savarese
  183. LoRA: Low-Rank Adaptation of large language models
    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
  184. Lost in the middle: how language models use long contexts
    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang
  185. M6-10T: a sharing-delinking paradigm for efficient multi-trillion parameter pretraining
    Junyang Lin, An Yang, Jinze Bai, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Yong Li, Wei Lin, Jingren Zhou, Hongxia Yang
  186. Machine learning
    Tom M. Mitchell
  187. Machine learning: a probabilistic perspective
    Kevin P. Murphy
  188. Making deep learning go brrrr from first principles
    Horace He
  189. Making DeepSpeed ZeRO run efficiently on more-affordable hardware
    Justin Chiu, Shuai Zheng
  190. Mask & focus: conversation modelling by learning concepts
    Gaurav Pandey, Dinesh Raghu, Sachindra Joshi
  191. Maximizing communication efficiency for large-scale training via 0/1 Adam
    Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
  192. MCR-DL: mix-and-match communication runtime for deep learning
    Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda
  193. MegaBlocks: efficient sparse training with mixture-of-experts
    Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia
  194. Megatron-LM: training multi-billion parameter language models using model parallelism
    Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
  195. Memory-efficient pipeline-parallel DNN training
    Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia
  196. MinTL: minimalist transfer learning for task-oriented dialogue systems
    Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Pascale Fung
  197. Mix and match: learning-free controllable text generation using energy language models
    Fatemehsadat Mireshghallah, Kartik Goyal, Taylor Berg-Kirkpatrick
  198. Mixed precision training
    Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
  199. Mixture of attention heads: selecting attention heads per token
    Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
  200. Mixture-of-Experts meets instruction tuning: a winning combination for large language models
    Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
  201. mixup: beyond empirical risk minimization
    Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
  202. MMCoQA: conversational question answering over text, tables and images
    Yongqi Li, Wenjie Li, Liqiang Nie
  203. Mode matching in GANs through latent space learning and inversion
    Deepak Mishra, Prathosh A. P., Aravind Jayendran, Varun Srivastava, Santanu Chaudhury
  204. Multi-level memory for task oriented dialogs
    Revanth Reddy, Danish Contractor, Dinesh Raghu, Sachindra Joshi
  205. Multitask prompt tuning enables parameter-efficient transfer learning
    Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
  206. MultiWOZ - A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling
    Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić
  207. Mutual information neural estimation
    Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
  208. NeMo: a toolkit for building AI applications using neural modules
    Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
  209. Neural GPUs learn algorithms
    Łukasz Kaiser, Ilya Sutskever
  210. Neural network methods for natural language processing
    Yoav Goldberg
  211. Neural networks and physical systems with emergent collective computational abilities
    J. J. Hopfield
  212. Neural networks for pattern recognition
    Christopher M. Bishop
  213. Neural ordinary differential equations
    Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
  214. No train no gain: revisiting efficient training algorithms for transformer-based language models
    Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
  215. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples
    Anish Athalye, Nicholas Carlini, David Wagner
  216. OctoPack: instruction tuning code large language models
    Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre
  217. On the convergence of Adam and beyond
    Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
  218. On the power of neural networks for solving hard problems
    Jehoshua Bruck, Joseph W. Goodman
  219. One model to learn them all
    Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
  220. Open domain question answering over tables via dense retrieval
    Jonathan Herzig, Thomas Müller, Syrine Krichene, Julian Eisenschlos
  221. Open question answering over tables and text
    Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen
  222. OPT: open pre-trained transformer language models
    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
  223. Optimal brain compression: a framework for accurate post-training quantization and pruning
    Elias Frantar, Sidak Pal Singh, Dan Alistarh
  224. Optimal perceptual inference
    Geoffrey E. Hinton, Terrence J. Sejnowski
  225. Optimization story: Bloom inference
    Nicolas Patry
  226. Orca 2: teaching small language models how to reason
    Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah
  227. Orca: progressive learning from complex explanation traces of GPT-4
    Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
  228. Outer product-based neural collaborative filtering
    Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang, Tat-Seng Chua
  229. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer
    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
  230. Overcoming oscillations in quantization-aware training
    Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
  231. PAL: Program-aided language models
    Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig
  232. PaLM: scaling language modeling with pathways
    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel
  233. Parallel context windows improve in-context learning of large language models
    Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham
  234. Pattern classification
    Richard O. Duda, Peter E. Hart, David G. Stork
  235. Pattern recognition and machine learning
    Christopher M. Bishop
  236. Perceptual losses for real-time style transfer and super-resolution
    Justin Johnson, Alexandre Alahi, Li Fei-Fei
  237. Personalizing dialogue agents: I have a dog, do you have pets too?
    Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston
  238. Phase-functioned neural networks for character control
    Daniel Holden, Taku Komura, Jun Saito
  239. Playing Atari with deep reinforcement learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
  240. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing
    Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig
  241. Prefix-tuning: optimizing continuous prompts for generation
    Xiang Lisa Li, Percy Liang
  242. Probabilistic latent semantic analysis
    Thomas Hofmann
  243. Progressive growing of GANs for improved quality, stability and variation
    Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
  244. Prompting with pseudo-code instructions
    Mayank Mishra, Prince Kumar, Riyaz Bhat, Rudra Murthy V, Danish Contractor, Srikanth Tamilselvam
  245. PullNet: open domain question answering with iterative retrieval on knowledge bases and text
    Haitian Sun, Tania Bedrax-Weiss, William Cohen
  246. PyTorch trace analysis for the masses
    Anupam Bhatnagar, Xizhou Feng, Brian Coutinho, Yifan Liu, Sung-Han Lin, Louis Feng, Yuzhen Huang
  247. Q-BERT: Hessian based ultra low precision quantization of BERT
    Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
  248. R3Net: recurrent residual refinement network for saliency detection
    Zijun Deng, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Jing Qin, Guoqiang Han, Pheng-Ann Heng
  249. Reading Wikipedia to answer open-domain questions
    Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes
  250. REALM: Retrieval-augmented language model pretraining
    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
  251. Recurrent models of visual attention
    Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
  252. Reducing activation recomputation in large transformer models
    Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro
  253. Regularizing and optimizing LSTM language models
    Stephen Merity, Nitish Shirish Keskar, Richard Socher
  254. Reinforcement Learning: An Introduction
    Richard S. Sutton, Andrew G. Barto
  255. ReLoRA: high-rank training through low-rank updates
    Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
  256. Restricted Boltzmann machines for collaborative filtering
    Ruslan Salakhutdinov, Andriy Mnih, Geoffrey Hinton
  257. Retrieval augmentation reduces hallucination in conversation
    Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston
  258. Retrieval-augmented generation for knowledge-intensive NLP tasks
    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
  259. Revisiting classifier two-sample tests
    David Lopez-Paz, Maxime Oquab
  260. RoBERTa: a robustly optimized BERT pretraining approach
    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
  261. RoFormer: enhanced transformer with rotary position embedding
    Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
  262. SantaCoder: don't reach for the stars!
    Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra
  263. Scaling instruction-finetuned language models
    Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
  264. Scaling PyTorch FSDP for training foundation models on IBM cloud
    Linsong Chu, Less Wright, Hamid Shojanazeri, Sophia Wen, Raghu Ganti, Geeta Chauhan
  265. Scaling transformer to 1M tokens and beyond with RMT
    Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev
  266. Self-instruct: aligning language model with self generated instructions
    Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
  267. Self-normalizing neural networks
    Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
  268. Semantically equivalent adversarial rules for debugging NLP models
    Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
  269. Seq2seq model and the exposure bias problem
    Aditya Mohanty
  270. Sequence parallelism: long sequence training from system perspective
    Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
  271. Sequential latent knowledge selection for knowledge-grounded dialogue
    Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
  272. Simple and effective multi-paragraph reading comprehension
    Christopher Clark, Matt Gardner
  273. Simplifying transformer blocks
    Bobby He, Thomas Hofmann
  274. SmoothQuant: accurate and efficient post-training quantization for large language models
    Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
  275. Soft filter pruning for accelerating deep convolutional neural networks
    Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, Yi Yang
  276. SOLAR 10.7B: scaling large language models with simple yet effective depth up-scaling
    Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
  277. SOLOIST: building task bots at scale with transfer learning and machine teaching
    Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao
  278. Solving quantitative reasoning problems with language models
    Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra
  279. Spatial temporal graph convolutional networks for skeleton-based action recognition
    Sijie Yan, Yuanjun Xiong, Dahua Lin
  280. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting
    Bing Yu, Haoteng Yin, Zhanxing Zhu
  281. Spectral normalization for generative adversarial networks
    Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
    image image image
  282. Speech and language processing
    Daniel Jurafsky, James H. Martin
    image
  283. StarCoder: may the source be with you!
    Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries
    image image image image image
  284. Sticking the landing: simple, lower-variance gradient estimators for variational inference
    Geoffrey Roeder, Yuhuai Wu, David K. Duvenaud
    image image image image
  285. StitchNet: composing neural networks from pre-trained fragments
    Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H.T. Kung
    image image image
  286. Stochastic hyperparameter optimization through hypernetworks
    Jonathan Lorraine, David Duvenaud
    image image image
  287. Strategies for teaching layered networks classification tasks
    Ben S. Wittner, John S. Denker
    image image
  288. Structured prompting: scaling in-context learning to 1,000 examples
    Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, Furu Wei
    image image image
  289. Style transfer from non-parallel text by cross-alignment
    Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola
    image image image
  290. Subword regularization: improving neural network translation models with multiple subword candidates
    Taku Kudo
    image image image image
  291. Supervised learning of probability distributions by neural networks
    Eric B. Baum, Frank Wilczek
    image image
  292. Supporting efficient large model training on AMD Instinct™ GPUs with DeepSpeed
    Olatunji Ruwase, Jeff Rasley
    image image image image
  293. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity
    William Fedus, Barret Zoph, Noam Shazeer
    image image image
  294. Synchronization in neural nets
    Jacques J. Vidal, John Haggerty
    image image
  295. Tackling the poor assumptions of Naive Bayes text classifiers
    Jason D. M. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger
    image image
  296. The best of both worlds: combining recent advances in neural machine translation
    Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, Macduff Hughes
    image image image
  297. The elements of statistical learning: data mining, inference and prediction
    Trevor Hastie, Robert Tibshirani, Jerome Friedman
    image
  298. The Flan collection: designing data and methods for effective instruction tuning
    Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts
    image image image
  299. The information bottleneck method
    Naftali Tishby, Fernando C. Pereira, William Bialek
    image image
  300. The Pile: an 800GB dataset of diverse text for language modeling
    Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
    image image image
  301. The power of scale for parameter-efficient prompt tuning
    Brian Lester, Rami Al-Rfou, Noah Constant
    image image image
  302. The wisdom of hindsight makes language models better instruction followers
    Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez
    image image image
  303. Thermometer encoding: one hot way to resist adversarial examples
    Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow
    image image image image
  304. To regularize or not to regularize? The bias variance trade-off in regularized AEs
    Arnab Kumar Mondal, Himanshu Asnani, Parag Singla, Prathosh AP
    image image image
  305. Towards crowdsourced training of large neural networks using decentralized mixture-of-experts
    Max Ryabinin, Anton Gusev
    image image image image
  306. Towards deep learning models resistant to adversarial attacks
    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu
    image image image image
  307. Towards evaluating the robustness of neural networks
    Nicholas Carlini, David Wagner
    image image image image
  308. Train short, test long: attention with linear biases enables input length extrapolation
    Ofir Press, Noah A. Smith, Mike Lewis
    image image image image
  309. Training compute-optimal large language models
    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
    image image image image
  310. Training language models to follow instructions with human feedback
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
    image image image image
  311. Transformer memory as a differentiable search index
    Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
    image image image image
  312. Transformer quality in linear time
    Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc Le
    image image image
  313. Transformers explained visually (part 1): overview of functionality
    Ketan Doshi
    image image image
  314. Transformers explained visually (part 2): how it works, step-by-step
    Ketan Doshi
    image image image
  315. Transformers explained visually (part 3): multi-head attention, deep dive
    Ketan Doshi
    image image image
  316. Turing-NLG: a 17-billion-parameter language model by Microsoft
    Corby Rosset
    image image image image
  317. UL2: unifying language learning paradigms
    Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler
    image image image image
  318. Understanding convolutional neural networks with a mathematical model
    C.-C. Jay Kuo
    image image
  319. Understanding disentangling in β-VAE
    Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, Alexander Lerchner
    image image image image
  320. Understanding the Open Pre-Trained Transformers (OPT) library
    Cameron Wolfe
    image image image
  321. Unit tests for stochastic optimization
    Tom Schaul, Ioannis Antonoglou, David Silver
    image image image
  322. Universal language model fine-tuning for text classification
    Jeremy Howard, Sebastian Ruder
    image image image
  323. Unlimiformer: long-range transformers with unlimited length input
    Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley
    image image image image
  324. Unpaired image-to-image translation using cycle-consistent adversarial networks
    Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
    image image image image
  325. Unsupervised machine translation using monolingual corpora only
    Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato
    image image image image
  326. Unsupervised representation learning by predicting image rotations
    Spyros Gidaris, Praveer Singh, Nikos Komodakis
    image image image
  327. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model
    Ali Alvi, Paresh Kharya
    image image image image image
  328. Variational inference using implicit distributions
    Ferenc Huszár
    image image image
  329. Variational inference with latent space quantization for adversarial resilience
    Vinay Kyatham, Mayank Mishra, Tarun Kumar Yadav, Deepak Mishra, Prathosh AP
    image image image image image
  330. Variational learning for unsupervised knowledge grounded dialogs
    Mayank Mishra, Dhiraj Madan, Gaurav Pandey, Danish Contractor
    image image image image
  331. Variational lossy autoencoder
    Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
    image image image
  332. Vector-quantized input-contextualized soft prompts for natural language understanding
    Rishabh Bhardwaj, Amrita Saha, Steven C.H. Hoi, Soujanya Poria
    image image image
  333. VEEGAN: reducing mode collapse in GANs using implicit variational learning
    Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann, Charles Sutton
    image image image image
  334. Very deep convolutional networks for large-scale image recognition
    Karen Simonyan, Andrew Zisserman
    image image image
  335. Visualizing data using t-SNE
    Laurens van der Maaten, Geoffrey Hinton
    image image image
  336. Wasserstein GAN
    Martin Arjovsky, Soumith Chintala, Léon Bottou
    image image image image
  337. wav2vec 2.0: a framework for self-supervised learning of speech representations
    Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
    image image image image
  338. WaveNet: a generative model for raw audio
    Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
    image image image image
  339. WebGPT: browser-assisted question-answering with human feedback
    Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
    image image image image
  340. What language model to train if you have one million GPU hours?
    Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M Saiful Bari, Stella Biderman, Hady Elsahar, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy
    image image image image image
  341. Word translation without parallel data
    Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
    image image image
  342. Yandex publishes YaLM 100B. It’s the largest GPT-like neural network in open source
    Mikhail Khrushchev
    image image image image
  343. You only look once: unified, real-time object detection
    Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
    image image image
  344. ZeRO & DeepSpeed: new system optimizations enable training models with over 100 billion parameters
    DeepSpeed Team, Rangan Majumder, Junhua Wang
    image image image image
  345. ZeRO++: extremely efficient collective communication for giant model training
    Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He
    image image image image image
  346. ZeRO-2 & DeepSpeed: shattering barriers of deep learning speed & scale
    DeepSpeed Team, Rangan Majumder, Junhua Wang
    image image image image
  347. ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning
    Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He
    image image image image image
  348. Zero-shot text-to-image generation
    Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
    image image image image image
  349. ZeRO: memory optimizations toward training trillion parameter models
    Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
    image image image image image
  350. ZeroQuant: efficient and affordable post-training quantization for large-scale transformers
    Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
    image image image
  351. β-VAE: learning basic visual concepts with a constrained variational framework
    Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner
    image image image

Calculus

  1. Calculus of variations
    I. M. Gelfand, S. V. Fomin
    image
  2. Thomas' calculus
    George B. Thomas Jr., Maurice D. Weir
    image

Computer Architecture

  1. Accelerated computing with a reconfigurable dataflow architecture
    image image image
  2. Computer architecture: a quantitative approach
    John L. Hennessy, David A. Patterson
    image
  3. Computer organization and design ARM edition: the hardware/software interface
    David A. Patterson, John L. Hennessy
    image
  4. Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors
    Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu
    image image image image
  5. Improving DRAM performance by parallelizing refreshes with accesses
    Kevin Kai-Wei Chang, Donghyuk Lee, Zeshan Chishti, Alaa R. Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu
    image image image image image
  6. Memory performance attacks: denial of memory service in multi-core systems
    Thomas Moscibroda, Onur Mutlu
    image image image image
  7. Memory scaling: a systems architecture perspective
    Onur Mutlu
    image image image
  8. Millicode in an IBM zSeries processor
    L. C. Heller, M. S. Farrell
    image image
  9. MTIA v1: Meta's first-generation AI inference accelerator
    Amin Firoozshahian, Olivia Wu, Joel Coburn, Roman Levenstein
    image image image
  10. RAIDR: retention-aware intelligent DRAM refresh
    Jamie Liu, Ben Jaiyen, Richard Veras, Onur Mutlu
    image image image
  11. Stall-time fair memory access scheduling for chip multiprocessors
    Onur Mutlu, Thomas Moscibroda
    image image image

Computer Graphics

  1. Principles of traditional animation applied to 3D computer animation
    John Lasseter
    image image image

Data Structures and Algorithms

  1. Data structures and algorithms in Java
    Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser
    image
  2. Introduction to algorithms
    Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
    image

Digital Electronics

  1. Digital design: with an introduction to the Verilog HDL
    M. Morris Mano, Michael D. Ciletti
    image

Graph Theory

  1. Introduction to graph theory
    Robin Wilson
    image

Information Theory

  1. Elements of information theory
    Thomas M. Cover, Joy A. Thomas
    image
  2. Error detecting and error correcting codes
    R. W. Hamming
    image image image image

Linear Algebra

  1. Linear algebra and its applications
    Gilbert Strang
    image
  2. Matrix analysis and applied linear algebra
    Carl D. Meyer
    image
  3. The matrix cookbook
    Kaare Brandt Petersen, Michael Syskind Pedersen
    image

Measure Theory

  1. Measure theory
    Donald L. Cohn
    image

Optimization Theory

  1. Convex optimization
    Stephen Boyd, Lieven Vandenberghe
    image
  2. Distributed optimization and statistical learning via the alternating direction method of multipliers
    Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein
    image

Probability and Stochastic Processes

  1. Introduction to probability and stochastic processes with applications
    Liliana Blanco Castañeda, Viswanathan Arunachalam, Selvamuthu Dharmaraja
    image

Quantum Computing

  1. A fast quantum mechanical algorithm for database search
    Lov K. Grover
    image image image
  2. A single quantum cannot be cloned
    W. K. Wootters, W. H. Zurek
    image image
  3. Can quantum-mechanical description of physical reality be considered complete?
    Albert Einstein, Boris Podolsky, Nathan Rosen
    image image
  4. Image recognition with an adiabatic quantum computer I. mapping to quadratic unconstrained binary optimization
    Hartmut Neven, Geordie Rose, William G. Macready
    image image image image
  5. Integer optimization toolbox (minimizing polynomials over integer lattices using quantum annealing)
    Pooya Ronagh
    image
  6. Limits on parallel speedup for classical Ising model solvers
    image
  7. Partitioning optimization problems for hybrid classical/quantum execution
    Michael Booth, Steven P. Reinhardt, Aidan Roy
    image
  8. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer
    Peter W. Shor
    image image image
  9. Probabilistic cloning and identification of linearly independent quantum states
    Lu-Ming Duan, Guang-Can Guo
    image image image
  10. Programming with D-Wave: map coloring problem
    E. D. Dahl
    image
  11. Quantum computation and quantum information
    Michael A. Nielsen, Isaac L. Chuang
    image
  12. Quantum computing: a gentle introduction
    Eleanor Rieffel, Wolfgang Polak
    image
  13. Quantum performance evaluation: a short reading list
    image
  14. Quantum theory, the Church-Turing principle and the universal quantum computer
    David Deutsch
    image image image
  15. Rapid solution of problems by quantum computation
    David Deutsch, Richard Jozsa
    image image image
  16. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels
    Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, William K. Wootters
    image image image

Signal Processing

  1. Discrete-time signal processing
    Alan V. Oppenheim, Ronald W. Schafer
    image
  2. Foundations of signal processing
    Martin Vetterli, Jelena Kovačević, Vivek K Goyal
    image
  3. Signals and systems
    Alan V. Oppenheim
    image
  4. Understanding digital signal processing
    Richard G. Lyons
    image