Papers-books-and-blogs

This repository contains a list of the books, blogs, research papers and white papers that I have read and found interesting.

Table of contents

AI, DL, NLP and RL

  1. 1-bit Adam: communication efficient large-scale training with Adam’s convergence speed
  2. 5 best practices for efficient model training
  3. 8-bit approximations for parallelism in deep learning
  4. 8-bit optimizers via block-wise quantization
  5. A 'neural' network that learns to play Backgammon
  6. A BetterTransformer for fast transformer inference
  7. A deep reinforced model for abstractive summarization
  8. A dynamical approach to temporal pattern processing
  9. A few more examples may be worth billions of parameters
  10. A general and adaptive robust loss function
  11. A generalist agent
  12. A gentle introduction to 8-bit matrix multiplication for transformers at scale using Hugging Face transformers, accelerate and bitsandbytes
  13. A note on the evaluation of generative models
  14. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
  15. A simple but tough-to-beat baseline for sentence embeddings
  16. A simple language model for task-oriented dialogue
  17. A simple neural attentive meta-learner
  18. A simple neural network module for relational reasoning
  19. A study of BFLOAT16 for deep learning training
  20. A style-based generator architecture for generative adversarial networks
  21. A stylometric inquiry into hyperpartisan and fake news
  22. A3T: adversarially augmented adversarial training
  23. Accelerated PyTorch 2 transformers
  24. Accelerating large language model training with variable sparse pre-training and dense fine-tuning
  25. Accelerating PyTorch with CUDA graphs
  26. AdapterHub: a framework for adapting transformers
  27. Adversarial approximate inference for speech to electroglottograph conversion
  28. Adversarial autoencoders
  29. Adversarial examples that fool both computer vision and time-limited humans
  30. Adversarial feature learning
  31. Adversarial generation of natural language
  32. Adversarial information factorization
  33. Adversarially learned inference
  34. AlexaTM 20B: few-shot learning using a large-scale multilingual seq2seq model
  35. Amazon SageMaker model parallelism: a general and flexible framework for large model training
  36. An image is worth 16x16 words: transformers for image recognition at scale
  37. An overview of gradient descent optimization algorithms
  38. Analysing mathematical reasoning abilities of neural models
  39. Approximation by superpositions of a sigmoidal function
  40. Artificial Intelligence: a modern approach
  41. Aspect based sentiment analysis with gated convolutional networks
  42. Attention is all you need
  43. Attention is off by one
  44. Auto-encoding variational Bayes
  45. Backpropagation through the void: optimizing control variates for black-box gradient estimation
  46. BART: denoising sequence-to-sequence pre-training for natural language generation, translation and comprehension
  47. Batch normalization: accelerating deep network training by reducing internal covariate shift
  48. Behavioral cloning from observation
  49. BERT: pre-training of deep bidirectional transformers for language understanding
  50. Beyond domain APIs: Task-oriented conversational modeling with unstructured knowledge access
  51. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation
  52. Blockwise parallel transformer for large context models
  53. BLOOM: A 176B-parameter open-access multilingual language model
  54. Bootstrapping entity alignment with knowledge graph embedding
  55. Bridging the gap between prior and posterior knowledge selection for knowledge-grounded dialogue generation
  56. Bringing open large language models to consumer devices
  57. BTLM-3B-8K: 7B performance in a 3 billion parameter model
  58. Building blocks for a complex-valued transformer architecture
  59. CATS: contextually-aware thresholding for sparsity in large language models
  60. ChatGPT: optimizing language models for dialogue
  61. ColBERT: efficient and effective passage search via contextualized late interaction over BERT
  62. Colossal-AI: a unified deep learning system for large-scale parallel training
  63. Compiling machine learning programs via high-level tracing
  64. Complex transformer: a framework for modeling complex-valued sequence
  65. Conceptual captions: a cleaned, hypernymed, image alt-text dataset for automatic image captioning
  66. Conditional image synthesis with auxiliary classifier GANs
  67. Conformal nucleus sampling
  68. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers
  69. Connectivity versus entropy
  70. Constituency parsing with a self-attentive encoder
  71. Constraint based knowledge base distillation in end-to-end task oriented dialogs
  72. Context generation improves open domain question answering
  73. Convert transformers to ONNX with Hugging Face optimum
  74. Convolutional networks on graphs for learning molecular fingerprints
  75. Convolutional neural network language models
  76. Countering adversarial images using input transformations
  77. Cramming: training a language model on a single GPU in one day
  78. Crosslingual generalization through multitask finetuning
  79. Curriculum learning
  80. Cutting down on prompts and parameters: simple few-shot learning with language models
  81. Data engineering for scaling language models to 128K context
  82. Deep Boltzmann machines
  83. Deep complex networks
  84. Deep learning
  85. Deep learning and the information bottleneck principle
  86. Deep learning techniques for super-resolution in video games
  87. Deep residual learning for image recognition
  88. Deep text classification can be fooled
  89. DeepSpeed compression: a composable library for extreme compression and zero-cost quantization
  90. DeepSpeed Inference: enabling efficient inference of transformer models at unprecedented scale
  91. DeepSpeed powers 8x larger MoE model training with high performance
  92. DeepSpeed Ulysses: system optimizations for enabling training of extreme long sequence transformer models
  93. DeepSpeed: accelerating large-scale model inference and training via system optimizations and compression
  94. DeepSpeed: advancing MoE inference and training to power next-generation AI scale
  95. Denoising distantly supervised open-domain question answering
  96. Diffusion convolutional recurrent neural network: data-driven traffic forecasting
  97. Discrete variational autoencoders
  98. Disentangling by factorising
  99. Disentangling language and knowledge in task-oriented dialogs
  100. Distributionally robust language modeling
  101. Editing models with task arithmetic
  102. Efficient estimation of word representations in vector space
  103. Efficient large scale language modeling with mixtures of experts
  104. Efficient large-scale language model training on GPU clusters using Megatron-LM
  105. Enhancing the reliability of out-of-distribution image detection in neural networks
  106. End-to-end task-oriented dialog modeling with semi-structured knowledge management
  107. Enhance reasoning for large language models in the game Werewolf
  108. Ensemble adversarial training: attacks and defenses
  109. Equilibrium propagation: bridging the gap between energy-based models and backpropagation
  110. Estimating or propagating gradients through stochastic neurons for conditional computation
  111. Exemplar encoder-decoder for neural conversation generation
  112. Expert human-level driving in Gran Turismo Sport using deep reinforcement learning with image-based representation
  113. Exploring deep recurrent models with reinforcement learning for molecule design
  114. Exploring the limits of transfer learning with a unified text-to-text transformer
  115. Extreme compression for pre-trained transformers made simple and efficient
  116. Fast abstractive summarization with reinforce-selected sentence rewriting
  117. Fast benchmarking of accuracy vs. training time with cyclic learning rates
  118. Fast transformer decoding: one write-head is all you need
  119. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning
  120. FFJORD: Free-form continuous dynamics for scalable reversible generative models
  121. Finetuned language models are zero-shot learners
  122. Flash-decoding for long-context inference
  123. FlashAttention: fast and memory-efficient exact attention with IO-awareness
  124. FlashAttention: fast transformer training with long sequences
  125. Foundations of NLP explained visually: beam search, how it works
  126. FP8 formats for deep learning
  127. FP8-LM: training FP8 large language models
  128. Gemini: a family of highly capable multimodal models
  129. Gemma: open models based on Gemini research and technology
  130. Generating adversarial examples with adversarial networks
  131. Generating sentences from a continuous space
  132. Generation-augmented retrieval for open-domain question answering
  133. Generative adversarial nets
  134. Generative pretraining from pixels
  135. Genetic algorithms in search, optimization and machine learning
  136. GeoMAN: multi-level attention networks for geo-sensory time series prediction
  137. Getting the most out of the NVIDIA A100 GPU with Multi-Instance GPU
  138. GLaM: efficient scaling of language models with mixture-of-experts
  139. GLM-130B: an open bilingual pre-trained model
  140. GLU variants improve transformer
  141. Going deeper with convolutions
  142. GPT-4 architecture, infrastructure, training dataset, costs, vision, MoE
  143. GPT-NeoX-20B: an open-source autoregressive language model
  144. GQA: training generalized multi-query transformer models from multi-head checkpoints
  145. Gradient-based hyperparameter optimization through reversible learning
  146. Graph attention networks
  147. Grounding large language models in interactive environments with online reinforcement learning
  148. Hierarchical neural story generation
  149. Hindsight: posterior-guided training of retrievers for improved open-ended generation
  150. HotFlip: white-box adversarial examples for text classification
  151. How big should my language model be?
  152. How PyTorch 2.0 accelerates deep learning with operator fusion and CPU/GPU code-generation
  153. How should AI systems behave, and who should decide?
  154. How we sped up transformer inference 100x for 🤗 API customers
  155. How 🤗 Accelerate runs very large models thanks to PyTorch
  156. Hydragen: high-throughput LLM inference with shared prefixes
  157. HyKnow: end-to-end task-oriented dialog modeling with hybrid knowledge management
  158. Hyperparameter search with Transformers and Ray Tune
  159. Image-to-image translation with conditional adversarial networks
  160. ImageNet classification with deep convolutional neural networks
  161. Improving entity linking by modeling latent relations between mentions
  162. Improving language models by retrieving from trillions of tokens
  163. Improving language understanding by generative pre-training
  164. Improving reinforcement learning from human feedback with efficient reward model ensemble
  165. Incredibly fast BLOOM inference with DeepSpeed and Accelerate
  166. Inference suboptimality in variational autoencoders
  167. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets
  168. Interpretable convolutional neural networks via feedforward design
  169. Introducing MPT-7B: a new standard for open-source, commercially usable LLMs
  170. Introducing nvFuser, a deep learning compiler for PyTorch
  171. Introducing Turing image super resolution: AI powered image enhancements for Microsoft Edge and Bing maps
  172. Introducing 🤗 accelerate
  173. Is ChatGPT 175 billion parameters? Technical analysis
  174. Is the future of neural networks Sparse? An introduction (1/N)
  175. Jack of all trades, master of some, a multi-purpose transformer agent
  176. Joint reasoning on hybrid-knowledge sources for task-oriented dialog
  177. Judging LLM-as-a-judge with MT-bench and chatbot arena
  178. Know what you don't know: unanswerable questions for SQuAD
  179. Knowledge-grounded dialogue generation with pre-trained language models
  180. Language is not all you need: aligning perception with language models
  181. Language modeling with gated convolutional networks
  182. Language modelling with pixels
  183. Language models (mostly) know what they know
  184. Language models are unsupervised multitask learners
  185. Language models as compilers: simulating pseudocode execution improves algorithmic reasoning in language models
  186. Large language models are not fair evaluators
  187. Layer normalization
  188. Learning activation functions to improve deep neural networks
  189. Learning associative inference using fast weight memory
  190. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders
  191. Learning on a general network
  192. Learning representations by back-propagating errors
  193. Learning transferable visual models from natural language supervision
  194. Learning word embeddings efficiently with noise-contrastive estimation
  195. Leave no context behind: efficient infinite context transformers with infini-attention
  196. Lessons learned on language model safety and misuse
  197. Lifelong language pretraining with distribution-specialized experts
  198. Linear scaling made possible with weight streaming
  199. Linformer: self-attention with linear complexity
  200. LLM in a flash: efficient large language model inference with limited memory
  201. LLM.int8(): 8-bit matrix multiplication for transformers at scale
  202. Long sequence modeling with XGen: a 7B LLM trained on 8K input sequence length
  203. LoRA: Low-Rank Adaptation of large language models
  204. Lost in the middle: how language models use long contexts
  205. M6-10T: a sharing-delinking paradigm for efficient multi-trillion parameter pretraining
  206. Machine learning
  207. Machine learning: a probabilistic perspective
  208. Making deep learning go brrrr from first principles
  209. Making DeepSpeed ZeRO run efficiently on more-affordable hardware
  210. Mask & focus: conversation modelling by learning concepts
  211. Matryoshka representation learning
  212. Maximizing communication efficiency for large-scale training via 0/1 Adam
  213. MCR-DL: mix-and-match communication runtime for deep learning
  214. MegaBlocks: efficient sparse training with mixture-of-experts
  215. Megatron-LM: training multi-billion parameter language models using model parallelism
  216. Memory-efficient pipeline-parallel DNN training
  217. MinTL: minimalist transfer learning for task-oriented dialogue systems
  218. Mix and match: learning-free controllable text generation using energy language models
  219. Mixed precision training
  220. Mixture of attention heads: selecting attention heads per token
  221. Mixture-of-Experts meets instruction tuning: a winning combination for large language models
  222. mixup: beyond empirical risk minimization
  223. MMCoQA: conversational question answering over text, tables and images
  224. Mode matching in GANs through latent space learning and inversion
  225. Multi-level memory for task oriented dialogs
  226. Multitask prompt tuning enables parameter-efficient transfer learning
  227. MultiWOZ - A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling
  228. Mutual information neural estimation
  229. NeMo: a toolkit for building AI applications using neural modules
  230. Neural GPUs learn algorithms
  231. Neural network methods for natural language processing
  232. Neural networks and physical systems with emergent collective computational abilities
  233. Neural networks for pattern recognition
  234. Neural ordinary differential equations
  235. No train no gain: revisiting efficient training algorithms for transformer-based language models
  236. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples
  237. OctoPack: instruction tuning code large language models
  238. On the convergence of Adam and beyond
  239. On the power of neural networks for solving hard problems
  240. One model to learn them all
  241. Open domain question answering over tables via dense retrieval
  242. Open question answering over tables and text
  243. OPT: open pre-trained transformer language models
  244. Optimal brain compression: a framework for accurate post-training quantization and pruning
  245. Optimal perceptual inference
  246. Optimization story: Bloom inference
  247. Orca 2: teaching small language models how to reason
  248. Orca: progressive learning from complex explanation traces of GPT-4
  249. Outer product-based neural collaborative filtering
  250. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer
  251. Overcoming oscillations in quantization-aware training
  252. PAL: Program-aided language models
  253. PaLM: scaling language modeling with pathways
  254. Parallel context windows improve in-context learning of large language models
  255. Pattern classification
  256. Pattern recognition and machine learning
  257. Perceptual losses for real-time style transfer and super-resolution
  258. Personalizing dialogue agents: I have a dog, do you have pets too?
  259. Phase-functioned neural networks for character control
  260. Playing Atari with deep reinforcement learning
  261. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing
  262. Prefix-tuning: optimizing continuous prompts for generation
  263. Probabilistic latent semantic analysis
  264. Progressive growing of GANs for improved quality, stability and variation
  265. Prompting with pseudo-code instructions
  266. Proximal policy optimization algorithms
  267. PullNet: open domain question answering with iterative retrieval on knowledge bases and text
  268. PyTorch trace analysis for the masses
  269. Q-BERT: Hessian based ultra low precision quantization of BERT
  270. R3Net: recurrent residual refinement network for saliency detection
  271. Reading Wikipedia to answer open-domain questions
  272. REALM: Retrieval-augmented language model pretraining
  273. Recurrent models of visual attention
  274. Reducing activation recomputation in large transformer models
  275. Regularizing and optimizing LSTM language models
  276. Reinforcement Learning: An Introduction
  277. ReLoRA: high-rank training through low-rank updates
  278. Restricted Boltzmann machines for collaborative filtering
  279. Retrieval augmentation reduces hallucination in conversation
  280. Retrieval-augmented generation for knowledge-intensive NLP tasks
  281. Revisiting classifier two-sample tests
  282. RoBERTa: a robustly optimized BERT pretraining approach
  283. RoFormer: enhanced transformer with rotary position embedding
  284. SantaCoder: don't reach for the stars!
  285. Scaling instruction-finetuned language models
  286. Scaling PyTorch FSDP for training foundation models on IBM Cloud
  287. Scaling transformer to 1M tokens and beyond with RMT
  288. Self-instruct: aligning language models with self-generated instructions
  289. Self-normalizing neural networks
  290. Semantically equivalent adversarial rules for debugging NLP models
  291. Seq2seq model and the exposure bias problem
  292. Sequence parallelism: long sequence training from system perspective
  293. Sequential latent knowledge selection for knowledge-grounded dialogue
  294. Simple and effective multi-paragraph reading comprehension
  295. Simplifying transformer blocks
  296. SlimPajama-DC: understanding data combinations for LLM training
  297. SmoothQuant: accurate and efficient post-training quantization for large language models
  298. Soft filter pruning for accelerating deep convolutional neural networks
  299. SOLAR 10.7B: scaling large language models with simple yet effective depth up-scaling
  300. SOLOIST: building task bots at scale with transfer learning and machine teaching
  301. Solving quantitative reasoning problems with language models
  302. Spatial temporal graph convolutional networks for skeleton-based action recognition
  303. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting
  304. Spectral normalization for generative adversarial networks
  305. Speech and language processing
  306. StarCoder: may the source be with you!
  307. Sticking the landing: simple, lower-variance gradient estimators for variational inference
  308. StitchNet: composing neural networks from pre-trained fragments
  309. Stochastic hyperparameter optimization through hypernetworks
  310. Strategies for teaching layered networks classification tasks
  311. Structured prompting: scaling in-context learning to 1,000 examples
  312. Style transfer from non-parallel text by cross-alignment
  313. Subword regularization: improving neural network translation models with multiple subword candidates
  314. Supervised learning of probability distributions by neural networks
  315. Supporting efficient large model training on AMD Instinct™ GPUs with DeepSpeed
  316. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity
  317. Synchronization in neural nets
  318. Synthetic data (almost) from scratch: generalized instruction tuning for language models
  319. Tackling the poor assumptions of Naive Bayes text classifiers
  320. Tensor programs V: tuning large neural networks via zero-shot hyperparameter transfer
  321. TextWorld: a learning environment for text-based games
  322. The best of both worlds: combining recent advances in neural machine translation
  323. The elements of statistical learning: data mining, inference and prediction
  324. The Flan collection: designing data and methods for effective instruction tuning
  325. The information bottleneck method
  326. The Pile: an 800GB dataset of diverse text for language modeling
  327. The power of scale for parameter-efficient prompt tuning
  328. The wisdom of hindsight makes language models better instruction followers
  329. Thermometer encoding: one hot way to resist adversarial examples
  330. To regularize or not to regularize? The bias variance trade-off in regularized AEs
  331. Towards crowdsourced training of large neural networks using decentralized mixture-of-experts
  332. Towards deep learning models resistant to adversarial attacks
  333. Towards evaluating the robustness of neural networks
  334. Train short, test long: Attention with linear biases enables input length extrapolation
  335. Training compute-optimal large language models
  336. Training language models to follow instructions with human feedback
  337. Transformer memory as a differentiable search index
  338. Transformer quality in linear time
  339. Transformer-XL: attentive language models beyond a fixed-length context
  340. Transformers explained visually (part 1): overview of functionality
  341. Transformers explained visually (part 2): how it works, step-by-step
  342. Transformers explained visually (part 3): multi-head attention, deep dive
  343. Turing-NLG: a 17-billion-parameter language model by Microsoft
  344. UL2: unifying language learning paradigms
  345. Understanding convolutional neural networks with a mathematical model
  346. Understanding disentangling in β-VAE
  347. Understanding the Open Pre-Trained Transformers (OPT) library
  348. Unit tests for stochastic optimization
  349. Universal language model fine-tuning for text classification
  350. Unlimiformer: long-range transformers with unlimited length input
  351. Unpaired image-to-image translation using cycle-consistent adversarial networks
  352. Unsupervised machine translation using monolingual corpora only
  353. Unsupervised representation learning by predicting image rotations
  354. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model
  355. Variational inference using implicit distributions
  356. Variational inference with latent space quantization for adversarial resilience
  357. Variational learning for unsupervised knowledge grounded dialogs
  358. Variational lossy autoencoder
  359. Vector-quantized input-contextualized soft prompts for natural language understanding
  360. VEEGAN: reducing mode collapse in GANs using implicit variational learning
  361. Very deep convolutional networks for large-scale image recognition
  362. Visual instruction tuning
  363. Visualizing data using t-SNE
  364. Wasserstein GAN
  365. wav2vec 2.0: a framework for self-supervised learning of speech representations
  366. WaveNet: a generative model for raw audio
  367. WebGPT: browser-assisted question-answering with human feedback
  368. What language model to train if you have one million GPU hours?
  369. Will GPT-4 run DOOM?
  370. Word translation without parallel data
  371. Yandex publishes YaLM 100B. It’s the largest GPT-like neural network in open source
  372. You only cache once: decoder-decoder architectures for language models
  373. You only look once: unified, real-time object detection
  374. ZeRO & DeepSpeed: new system optimizations enable training models with over 100 billion parameters
  375. ZeRO++: Extremely efficient collective communication for giant model training
  376. ZeRO-2 & DeepSpeed: shattering barriers of deep learning speed & scale
  377. ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning
  378. Zero-shot text-to-image generation
  379. ZeRO: memory optimizations toward training trillion parameter models
  380. ZeroQuant: efficient and affordable post-training quantization for large-scale transformers
  381. β-VAE: learning basic visual concepts with a constrained variational framework

Calculus

  1. Calculus of variations
  2. Thomas' calculus

Computer Architecture

  1. Accelerated computing with a reconfigurable dataflow architecture
  2. Computer architecture: a quantitative approach
  3. Computer organization and design ARM edition: the hardware software interface
  4. Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors
  5. Improving DRAM performance by parallelizing refreshes with accesses
  6. Memory performance attacks: denial of memory service in multi-core systems
  7. Memory scaling: a systems architecture perspective
  8. Millicode in an IBM zSeries processor
  9. MTIA v1: Meta's first-generation AI inference accelerator
  10. RAIDR: Retention-Aware Intelligent DRAM Refresh
  11. Stall-time fair memory access scheduling for chip multiprocessors

Computer Graphics

  1. Principles of traditional animation applied to 3D computer animation

Data Structures and Algorithms

  1. Data structures and algorithms in Java
  2. Introduction to algorithms

Digital Electronics

  1. Digital design: with an introduction to the Verilog HDL

Graph Theory

  1. Introduction to graph theory

Information Theory

  1. Elements of information theory
  2. Error detecting and error correcting codes

Linear Algebra

  1. Linear algebra and its applications
  2. Matrix analysis and applied linear algebra
  3. The matrix cookbook

Measure Theory

  1. Measure theory

Optimization Theory

  1. Convex Optimization
  2. Distributed optimization and statistical learning via the alternating direction method of multipliers

Probability and Stochastic Processes

  1. Introduction to probability and stochastic processes with applications

Quantum Computing

  1. A fast quantum mechanical algorithm for database search
    [Paper] [Quantum Algorithms] [Quantum Computing]
  2. A single quantum cannot be cloned
    [Paper] [Quantum Computing]
  3. Can quantum-mechanical description of physical reality be considered complete?
    [Paper] [Quantum Computing]
  4. Image recognition with an adiabatic quantum computer I. mapping to quadratic unconstrained binary optimization
    [Paper] [Image Classification] [QUBO] [Quantum Computing]
  5. Integer optimization toolbox (minimizing polynomials over integer lattices using quantum annealing)
    [Whitepaper]
  6. Limits on parallel speedup for classical Ising model solvers
    [Whitepaper]
  7. Partitioning optimization problems for hybrid classical/quantum execution
    [Whitepaper]
  8. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer
    [Paper] [Quantum Algorithms] [Quantum Computing]
  9. Probabilistic cloning and identification of linearly independent quantum states
    [Paper] [Cloning] [Quantum Computing]
  10. Programming with D-Wave: map coloring problem
    [Whitepaper]
  11. Quantum computation and quantum information
    [Book]
  12. Quantum computing: a gentle introduction
    [Book]
  13. Quantum performance evaluation: a short reading list
    [Whitepaper]
  14. Quantum theory, the Church-Turing principle and the universal quantum computer
    [Paper] [Quantum Computing] [Theory of Computation]
  15. Rapid solution of problems by quantum computation
    [Paper] [Quantum Algorithms] [Quantum Computing]
  16. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels
    [Paper] [Quantum Computing] [Quantum Teleportation]

Signal Processing

  1. Discrete-time signal processing
    [Book]
  2. Foundations of Signal Processing
    [Book]
  3. Signals and systems
    [Book]
  4. Understanding digital signal processing
    [Book]