
My collection of machine learning papers

MIT License

ML Papers

Reviews

  1. 191210 Thoughts on recent papers
  2. 200323 Thoughts on recent papers
  3. 200326 Thoughts on recent papers
  4. 200403 Thoughts on recent papers
  5. 200411 Thoughts on recent papers
  6. 200708 Thoughts on recent papers
  7. 200717 Thoughts on recent papers
  8. 200726 Thoughts on recent papers
  9. 200802 Thoughts on recent papers
  10. 201118 Thoughts on recent papers
  11. 201120 Thoughts on recent papers
  12. 201125 Thoughts on recent papers
  13. 201126 Thoughts on recent papers 1
  14. 201126 Thoughts on recent papers 2
  15. 201204 Thoughts on recent papers
  16. 210121 Thoughts on recent papers
  17. 210121 Thoughts on recent papers
  18. 210305 Thoughts on recent papers
  19. 210319 Thoughts on recent papers
  20. 210323 Thoughts on recent papers
  21. 210326 Thoughts on recent papers
  22. 210403 Thoughts on recent papers
  23. 210412 Thoughts on recent papers
  24. 210424 Thoughts on recent papers
  25. 210429 Thoughts on recent papers
  26. 210430 Thoughts on recent papers 1
  27. 210430 Thoughts on recent papers
  28. 210505 Thoughts on recent papers
  29. 210508 Thoughts on recent papers
  30. 230222 Review of datasets needed for LLMs

Table of contents

  1. 3d generative model
  2. activation
  3. active learning
  4. adaptation
  5. adapter
  6. adversarial training
  7. alignment
  8. antialiasing
  9. asr
  10. attention
  11. audio generation
  12. audio source separation
  13. augmentation
  14. autoregressive model
  15. backbone
  16. bayesian
  17. benchmark
  18. bert
  19. bias
  20. calibration
  21. causality
  22. channel attention
  23. chat
  24. classification
  25. clip
  26. computation
  27. continual learning
  28. contrastive learning
  29. convolution
  30. dataset
  31. ddpm
  32. decoding
  33. deep prior
  34. detr
  35. dewarping
  36. dialog
  37. differentiable operator
  38. differentiable tree
  39. discrete vae
  40. disentangle
  41. distillation
  42. distributed training
  43. domain adaptation
  44. dropout
  45. efficiency
  46. efficient attention
  47. efficient training
  48. embedding
  49. end2end
  50. energy based model
  51. ensemble
  52. federated learning
  53. few shot
  54. finetuning
  55. flow
  56. fpn
  57. gan
  58. gan inversion
  59. generalization
  60. generative model
  61. graph
  62. hallucination
  63. hypernetwork
  64. hyperparameter
  65. identifiability
  66. image editing
  67. image generation
  68. img2img
  69. implicit model
  70. implicit representation
  71. in context learning
  72. instance segmentation
  73. instruct
  74. interpolation
  75. knowledge base
  76. language generation
  77. language model
  78. layout
  79. lightweight
  80. line
  81. linear attention
  82. llm
  83. lm
  84. local attention
  85. loss
  86. loss surface
  87. matting
  88. memory
  89. meta learning
  90. metric
  91. metric learning
  92. mixture of experts
  93. mixup
  94. mlm
  95. mlops
  96. moe
  97. multilingual
  98. multimodal
  99. multimodal generation
  100. multitask
  101. nas
  102. nerf
  103. neural computer
  104. neural ode
  105. neural rendering
  106. nlp
  107. nmt
  108. non autoregressive
  109. norm free
  110. normalization
  111. object detection
  112. ocr
  113. open set recognition
  114. optimization
  115. optimizer
  116. oriented object detection
  117. out of distribution
  118. panoptic segmentation
  119. perceptual loss
  120. point cloud
  121. pooling
  122. pose
  123. positional encoding
  124. practice
  125. pretraining
  126. probabilistic model
  127. prompt
  128. pruning
  129. qa
  130. quantization
  131. reasoning
  132. recommender
  133. regularization
  134. reinforcement learning
  135. rendering
  136. representation
  137. resampling
  138. restoration
  139. retrieval
  140. review
  141. rl
  142. robustness
  143. saliency
  144. salient object detection
  145. scale
  146. score
  147. self supervised
  148. self supervised discovery
  149. semantic factor
  150. semantic segmentation
  151. semi supervised learning
  152. seq2seq
  153. sgld
  154. singing voice synthesis
  155. single image
  156. speech
  157. state space model
  158. structure learning
  159. style transfer
  160. stylegan
  161. super resolution
  162. table
  163. text generation
  164. text2img
  165. tokenizer
  166. topic model
  167. topology
  168. tracking
  169. training
  170. transducer
  171. transfer
  172. transformer
  173. tropical geometry
  174. tts
  175. uncertainty
  176. unsupervised img2img
  177. unsupervised nmt
  178. vae
  179. video
  180. video transformer
  181. vision
  182. vision language
  183. vision transformer
  184. visual grounding
  185. vit
  186. vocoder
  187. vq
  188. vqa
  189. weak supervision
  190. yolo
  191. uncategorized

3d generative model

  1. 211220 3D-aware Image Synthesis via Learning Structural and Textural Representations
  2. 220615 GRAM-HD
  3. 220621 EpiGRAF
  4. 221126 AvatarGen
  5. 230209 In-N-Out #gan_inversion
  6. 230216 3D-aware Conditional Image Synthesis
  7. 230302 3D generation on ImageNet
  8. 230627 Free-style and Fast 3D Portrait Synthesis
  9. 230630 Magic123

activation

  1. 201019 Smooth activations and reproducibility in deep networks #stability

active learning

  1. 200630 Similarity Search for Efficient Active Learning and Search of Rare
  2. 210729 Batch Active Learning at Scale

adaptation

  1. 200129 Side-Tuning
  2. 200130 Once for All #deploy

adapter

  1. 210608 Compacter
  2. 220524 AdaMix #moe

adversarial training

  1. 200130 Adversarial Examples Improve Image Recognition
  2. 200625 Smooth Adversarial Training

alignment

  1. 230504 Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
  2. 230517 LeTI #prompt
  3. 230517 SLiC-HF
  4. 230518 LIMA
  5. 230526 Training Socially Aligned Language Models in Simulated Human Society
  6. 230529 Direct Preference Optimization
  7. 230607 How Far Can Camels Go
  8. 230625 Is RLHF More Difficult than Standard RL #rl
  9. 230628 Towards Measuring the Representation of Subjective Global Opinions in Language Models
  10. 230630 Preference Ranking Optimization for Human Alignment
  11. 230705 Jailbroken
  12. 230711 Secrets of RLHF in Large Language Models Part I #reinforcement_learning
  13. 230717 AlpaGasus
  14. 230720 FLASK #benchmark
  15. 230727 Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
  16. 230727 PanGu-Coder2
  17. 230731 ToolLLM
  18. 230801 Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
  19. 230807 TPTU
  20. 230808 Shepherd

antialiasing

  1. 201120 An Effective Anti-Aliasing Approach for Residual Networks
  2. 201128 Truly shift-invariant convolutional neural networks

asr

  1. 200220 Imputer #non_autoregressive #ctc
  2. 200506 RNN-T Models Fail to Generalize to Out-of-Domain Audio #transducer #out_of_distribution #domain #regularization
  3. 200510 Listen Attentively, and Spell Once #non_autoregressive
  4. 200516 Large scale weakly and semi-supervised learning for low-resource video ASR #weak_supervision #semi_supervised_learning
  5. 200516 Reducing Spelling Inconsistencies in Code-Switching ASR using #ctc
  6. 200516 Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition #non_autoregressive
  7. 200518 Attention-based Transducer for Online Speech Recognition #transducer
  8. 200518 Iterative Pseudo-Labeling for Speech Recognition
  9. 200519 Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition #ctc
  10. 200519 Improved Noisy Student Training for Automatic Speech Recognition #semi_supervised_learning
  11. 200729 Developing RNN-T Models Surpassing High-Performance Hybrid Models with #rnn_t
  12. 201021 FastEmit #transducer #decoding
  13. 201027 CASS-NAT #non_autoregressive
  14. 201125 Streaming end-to-end multi-talker speech recognition #transducer
  15. 210524 Unsupervised Speech Recognition #unsupervised_training
  16. 210608 SpeechBrain
  17. 211012 Word Order Does Not Matter For Speech Recognition #weak_supervision
  18. 211030 Pseudo-Labeling for Massively Multilingual Speech Recognition #semi_supervised_learning #multilingual
  19. 211210 Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition #moe
  20. 220829 A Language Agnostic Multilingual Streaming On-Device ASR System #multilingual
  21. 220922 Whisper
  22. 230302 Google USM #multilingual

attention

  1. 200122 Object Contextual Representations #semantic_segmentation
  2. 200129 Empirical Attention
  3. 200130 Axial Attention #generative_model
  4. 200130 Criss-Cross Attention #semantic_segmentation
  5. 200212 Capsules with Inverted Dot-Product Attention Routing #capsule
  6. 200219 Tree-structured Attention with Hierarchical Accumulation #parse
  7. 200226 Sparse Sinkhorn Attention #sparse_attention
  8. 200317 Axial-DeepLab #panoptic_segmentation
  9. 200404 Neural Architecture Search for Lightweight Non-Local Networks
  10. 200421 Attention is Not Only a Weight #bert
  11. 200423 Self-Attention Attribution #bert
  12. 200428 Exploring Self-attention for Image Recognition
  13. 200510 CTC-synchronous Training for Monotonic Attention Model #asr #ctc
  14. 200516 Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory #asr #memory
  15. 200519 Normalized Attention Without Probability Cage
  16. 200519 Staying True to Your Word
  17. 200626 Object-Centric Learning with Slot Attention
  18. 201119 On the Dynamics of Training Attention Models #training
  19. 210223 Linear Transformers Are Secretly Fast Weight Memory Systems #linear_attention #efficient_attention
  20. 210225 LazyFormer #bert
  21. 210517 Pay Attention to MLPs #mlp
  22. 210524 Self-Attention Networks Can Process Bounded Hierarchical Languages #nlp
  23. 210826 Train Short, Test Long #positional_encoding

audio generation

  1. 220220 It's Raw! Audio Generation with State-Space Models
  2. 230126 MusicLM
  3. 230208 Noise2Music

audio source separation

  1. 211019 The Cocktail Fork Problem

augmentation

  1. 200122 FixMatch #semi_supervised_learning #manifold #mixup
  2. 200220 Affinity and Diversity
  3. 200621 AdvAug #mixup #nlp #adversarial_training
  4. 200710 Meta-Learning Requires Meta-Augmentation #meta_learning
  5. 201117 Sequence-Level Mixed Sample Data Augmentation #nlp
  6. 201213 Simple Copy-Paste is a Strong Data Augmentation Method for Instance #instance_segmentation
  7. 201214 Improving Panoptic Segmentation at All Scales #panoptic_segmentation
  8. 210318 AlignMix #mixup
  9. 210318 TrivialAugment
  10. 210429 Ensembling with Deep Generative Views #ensemble #gan_inversion
  11. 220830 Augraphy

autoregressive model

  1. 200129 Semi Autoregressive Training
  2. 201027 Scaling Laws for Autoregressive Generative Modeling #scale
  3. 211216 Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling
  4. 220622 Scaling Autoregressive Models for Content-Rich Text-to-Image Generation #image_generation
  5. 230202 Accelerating Large Language Model Decoding with Speculative Sampling #decoding

backbone

  1. 190724 MixNet #convolution
  2. 200123 Antialiasing #invariance
  3. 200128 Attentive Normalization
  4. 200128 IBN-Net
  5. 200128 Selective Kernel
  6. 200128 SpineNet
  7. 200128 Squeeze-Excitation
  8. 200128 Switchable Normalization
  9. 200128 Switchable Whitening
  10. 200129 Assembled Techniques #regularization
  11. 200129 DenseNet
  12. 200129 Dual Path Networks
  13. 200129 HarDNet
  14. 200129 PyramidNet
  15. 200129 SelecSLS
  16. 200129 ShuffleNet V2 #efficiency
  17. 200129 VoVNet
  18. 200130 FishNet
  19. 200130 HRNet
  20. 200130 MixConv #convolution
  21. 200330 Designing Network Design Spaces #hypernetwork
  22. 200330 TResNet #antialiasing
  23. 200419 ResNeSt
  24. 200630 Deep Isometric Learning for Visual Recognition #normalization #resnet #cnn #norm_free
  25. 200712 PSConv #cnn #multiscale
  26. 201015 HS-ResNet #multiscale
  27. 201221 FcaNet #channel_attention
  28. 210226 Transformer in Transformer #vision_transformer
  29. 210304 Barlow Twins #self_supervised #contrastive_learning
  30. 210310 Involution #convolution #attention
  31. 210312 Revisiting ResNets #resnet
  32. 210317 Learning to Resize Images for Computer Vision Tasks #resizing
  33. 210331 EfficientNetV2
  34. 210408 SI-Score #robustness #vision_transformer
  35. 210505 RepMLP #mlp
  36. 210506 Do You Even Need Attention #mlp
  37. 210510 ResMLP #mlp
  38. 210617 Layer Folding #efficiency #pruning
  39. 210628 Early Convolutions Help Transformers See Better #cnn #vit
  40. 210718 AS-MLP #mlp
  41. 210726 Contextual Transformer Networks for Visual Recognition
  42. 211014 Non-deep Networks
  43. 211018 HRFormer #vit
  44. 211227 Augmenting Convolutional networks with attention-based aggregation #vit #cnn
  45. 220110 A ConvNet for the 2020s #cnn #vit
  46. 220313 Scaling Up Your Kernels to 31x31
  47. 220318 Three things everyone should know about Vision Transformers #vit
  48. 220728 HorNet #cnn
  49. 230302 Image as Set of Points

bayesian

  1. 200207 Bayes Posterior
  2. 200210 Liberty or Depth #mean_field
  3. 200514 Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors #ensemble #variational_inference

benchmark

  1. 230720 SciBench
  2. 230807 AgentBench

bert

  1. 200305 What the [MASK]
  2. 200405 FastBERT #distillation #lightweight
  3. 200408 DynaBERT #distillation #pruning
  4. 200412 XtremeDistil #distillation #lightweight
  5. 200427 DeeBERT #lightweight
  6. 200518 Audio ALBERT #audio #representation
  7. 200601 Amnesic Probing
  8. 200608 On the Stability of Fine-tuning BERT #finetuning
  9. 200610 Revisiting Few-sample BERT Fine-tuning #finetuning
  10. 210906 An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models #few_shot #knowledge_base #prompt
  11. 210907 Beyond Preserved Accuracy #lightweight #distillation

bias

  1. 200519 Identifying Statistical Bias in Dataset Replication
  2. 201202 Learning from others' mistakes #product_of_experts
  3. 220919 The Biased Artist #image_generation
  4. 230731 KoBBQ

calibration

  1. 200221 Calibrating Deep Neural Networks using Focal Loss #loss
  2. 200223 Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks #bayesian
  3. 200620 Regression Prior Networks
  4. 210730 Soft Calibration Objectives for Neural Networks

causality

  1. 200518 An Analysis of the Adaptation Speed of Causal Models

channel attention

  1. 200129 GCNet

chat

  1. 200630 PLATO-2 #text_generation #chatbot

classification

  1. 220107 Generalized Category Discovery #open_set_recognition

clip

  1. 230515 Improved baselines for vision-language pre-training

computation

  1. 200213 Training Large Neural Networks with Constant Memory using a New Execution Algorithm
  2. 201204 Nimble

continual learning

  1. 201124 Energy-Based Models for Continual Learning #energy_based_model
  2. 211103 One Pass ImageNet #online_learning

contrastive learning

  1. 200213 A Simple Framework for Contrastive Learning of Visual Representations #augmentation
  2. 200309 Improved Baselines with Momentum Contrastive Learning
  3. 200311 Improved Baselines with Momentum Contrastive Learning #review
  4. 200423 Supervised Contrastive Learning #metric_learning
  5. 200511 Prototypical Contrastive Learning of Unsupervised Representations
  6. 200520 What Makes for Good Views for Contrastive Learning
  7. 200613 Bootstrap your own latent
  8. 200630 Debiased Contrastive Learning
  9. 200730 Contrastive Learning for Unpaired Image-to-Image Translation #img2img
  10. 200803 LoCo
  11. 201020 BYOL works even without batch statistics
  12. 201109 Towards Domain-Agnostic Contrastive Learning #mixup #multimodal
  13. 201116 AdCo #adversarial_training
  14. 201117 Dense Contrastive Learning for Self-Supervised Visual Pre-Training
  15. 201119 Heterogeneous Contrastive Learning
  16. 201119 Propagate Yourself
  17. 201121 Run Away From your Teacher
  18. 201123 Boosting Contrastive Self-Supervised Learning with False Negative
  19. 201126 Beyond Single Instance Multi-view Unsupervised Representation Learning #self_supervised #mixup
  20. 201126 How Well Do Self-Supervised Models Transfer #self_supervised #transfer
  21. 201127 Self-EMD
  22. 201201 Towards Good Practices in Self-supervised Representation Learning #self_supervised
  23. 201204 Seed the Views #mixup
  24. 201212 Contrastive Learning for Label-Efficient Semantic Segmentation #semantic_segmentation
  25. 201221 Online Bag-of-Visual-Words Generation for Unsupervised Representation #self_supervised #discrete_vae
  26. 201226 Spatial Contrastive Learning for Few-Shot Classification #few_shot #attention
  27. 210324 A Broad Study on the Transferability of Visual Representations with Contrastive Learning #review
  28. 210325 Contrasting Contrastive Self-Supervised Representation Learning Models #review
  29. 210325 Rethinking Self-Supervised Learning #training
  30. 210405 An Empirical Study of Training Self-Supervised Vision Transformers #vision_transformer
  31. 210426 Multimodal Contrastive Training for Visual Representation Learning #multimodal
  32. 210429 A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning #video
  33. 210429 Emerging Properties in Self-Supervised Vision Transformers #saliency #vision_transformer #representation
  34. 210429 With a Little Help from My Friends #knn
  35. 210510 Self-Supervised Learning with Swin Transformers #vision_transformer
  36. 210511 VICReg
  37. 210512 When Does Contrastive Visual Representation Learning Work #self_supervised #transfer #review
  38. 210517 Divide and Contrast #self_supervised #dataset #distillation
  39. 210601 Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task
  40. 211018 Understanding Dimensional Collapse in Contrastive Self-supervised Learning
  41. 220701 e-CLIP #vision_language #retrieval
  42. 220727 Contrastive Masked Autoencoders are Stronger Vision Learners #self_supervised #mlm
  43. 220804 Fine-Grained Semantically Aligned Vision-Language Pre-Training #vision_language
  44. 221017 Non-Contrastive Learning Meets Language-Image Pre-Training #clip
  45. 230327 Sigmoid Loss for Language Image Pre-Training #clip
  46. 230414 DINOv2
  47. 230418 Hyperbolic Image-Text Representations #clip #vision_language
  48. 230501 What Do Self-Supervised Vision Transformers Learn #self_supervised #mlm
  49. 230627 CLIPA-v2 #vision_language #multimodal

convolution

  1. 200316 SlimConv
  2. 210429 Decoupled Dynamic Filter Networks
  3. 230221 Hyena Hierarchy #state_space_model

dataset

  1. 200218 DivideMix #mixup #noise #semi_supervised_learning
  2. 200509 Building a Manga Dataset
  3. 201130 Image Quality Assessment for Perceptual Image Restoration #score
  4. 201201 Weakly-Supervised Arbitrary-Shaped Text Detection with #ocr #weak_supervision
  5. 210601 Comparing Test Sets with Item Response Theory
  6. 210907 Datasets
  7. 210927 PASS
  8. 211103 LAION-400M
  9. 220704 How Much More Data Do I Need
  10. 230220 Poisoning Web-Scale Training Datasets is Practical
  11. 230317 On the De-duplication of LAION-2B #clip
  12. 230428 CCpdf

ddpm

  1. 200619 Denoising Diffusion Probabilistic Models
  2. 201126 Score-Based Generative Modeling through Stochastic Differential #generative_model
  3. 201214 Learning Energy-Based Models by Diffusion Recovery Likelihood #energy_based_model
  4. 210302 Fixing Data Augmentation to Improve Adversarial Robustness #augmentation #generative_model
  5. 210305 Fixing Data Augmentation to Improve Adversarial Robustness 2 #robustness #augmentation #generative_model
  6. 210506 DiffSinger #singing_voice_synthesis
  7. 210511 Diffusion Models Beat GANs on Image Synthesis
  8. 210528 Gotta Go Fast When Generating Data with Score-Based Models
  9. 210531 On Fast Sampling of Diffusion Probabilistic Models
  10. 210607 Learning to Efficiently Sample from Diffusion Probabilistic Models
  11. 210610 Cascaded Diffusion Models for High Fidelity Image Generation
  12. 210610 Score-based Generative Modeling in Latent Space
  13. 210612 D2C
  14. 210701 Variational Diffusion Models
  15. 210802 SDEdit
  16. 210819 ImageBART #vq #autoregressive_model
  17. 211129 Blended Diffusion for Text-driven Editing of Natural Images #clip #image_editing
  18. 211130 Diffusion Autoencoders
  19. 211220 GLIDE #multimodal
  20. 211220 High-Resolution Image Synthesis with Latent Diffusion Models #vae #vq
  21. 220201 Progressive Distillation for Fast Sampling of Diffusion Models #distillation
  22. 220316 Dual Diffusion Implicit Bridges for Image-to-Image Translation
  23. 220524 Imagen #conditional_generative_model
  24. 220601 Elucidating the Design Space of Diffusion-Based Generative Models
  25. 220803 Pyramidal Denoising Diffusion Probabilistic Models
  26. 220808 Analog Bits
  27. 220912 Blurring Diffusion Models
  28. 220912 Soft Diffusion
  29. 220929 DreamFusion #3d_generative_model
  30. 221017 Imagic #image_editing
  31. 221018 Differentially Private Diffusion Models
  32. 221102 eDiffi #text2img
  33. 221115 Versatile Diffusion #vae
  34. 221117 Null-text Inversion for Editing Real Images using Guided Diffusion Models #image_editing
  35. 221118 Magic3D #3d_generative_model #text2img #nerf
  36. 221120 Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models #text2img
  37. 221124 Fast Sampling of Diffusion Models via Operator Learning
  38. 230126 On the Importance of Noise Scheduling for Diffusion Models
  39. 230126 simple diffusion
  40. 230131 Attend-and-Excite #text2img
  41. 230205 Design Booster #image_editing
  42. 230206 Zero-shot Image-to-Image Translation #image_editing
  43. 230207 Long Horizon Temperature Scaling #calibration #lm
  44. 230208 Q-Diffusion #quantization
  45. 230212 I$^2$SB #sde #image_restoration
  46. 230215 PRedItOR #image_editing
  47. 230216 MultiDiffusion #image_editing
  48. 230220 Composer #image_editing
  49. 230221 Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels #semi_supervised_learning #self_supervised
  50. 230221 On Calibrating Diffusion Probabilistic Models
  51. 230223 Controlled and Conditional Text to Image Generation with Diffusion Prior
  52. 230227 ELITE #text2img
  53. 230301 Unlimited-Size Diffusion Restoration #image_restoration
  54. 230302 Consistency Models #generative_model
  55. 230307 TRACT #distillation
  56. 230309 Cones #image_editing
  57. 230316 $P+$ #text2img
  58. 230316 Efficient Diffusion Training via Min-SNR Weighting Strategy
  59. 230320 SVDiff #image_editing
  60. 230405 Generative Novel View Synthesis with 3D-Aware Diffusion Models #nerf
  61. 230405 Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
  62. 230406 Diffusion Models as Masked Autoencoders #representation
  63. 230406 InstantBooth #image_editing
  64. 230501 In-Context Learning Unlocked for Diffusion Models #few_shot #text2img
  65. 230515 Common Diffusion Noise Schedules and Sample Steps are Flawed
  66. 230529 RAPHAEL
  67. 230601 StyleDrop #style_transfer
  68. 230706 Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
  69. 230707 SDXL #text2img
  70. 230710 AnimateDiff

decoding

  1. 200516 Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
  2. 200601 Cascaded Text Generation with Markov Transformers #text_generation
  3. 210608 FastSeq

deep prior

  1. 200408 Deep Manifold Prior

detr

  1. 201201 MaX-DeepLab #panoptic_segmentation #end2end
  2. 210813 Conditional DETR for Fast Training Convergence
  3. 211202 Masked-attention Mask Transformer for Universal Image Segmentation #panoptic_segmentation
  4. 220726 Group DETR #efficient_training
  5. 230803 DETR Doesn't Need Multi-Scale or Locality Design #multiscale

dewarping

  1. 211025 DocTr
  2. 211028 DocScanner

dialog

  1. 200129 Meena #nlp
  2. 210715 Beyond Goldfish Memory
  3. 220120 LaMDA

differentiable operator

  1. 200220 Fast Differentiable Sorting and Ranking

differentiable tree

  1. 200218 The Tree Ensemble Layer

discrete vae

  1. 200518 Robust Training of Vector Quantized Bottleneck Models

disentangle

  1. 200130 ID-GAN #gan
  2. 200130 MixNMatch #conditional_generative_model
  3. 200515 Face Identity Disentanglement via Latent Space Mapping

distillation

  1. 200129 Learning by Cheating
  2. 200209 Understanding and Improving Knowledge Distillation
  3. 200210 Subclass Distillation
  4. 200219 Knapsack Pruning with Inner Distillation #pruning #lightweight
  5. 200221 Residual Knowledge Distillation
  6. 200309 Knowledge distillation via adaptive instance normalization #normalization
  7. 200521 Why distillation helps #calibration
  8. 200629 An EM Approach to Non-autoregressive Conditional Sequence Generation #non_autoregressive
  9. 200701 Go Wide, Then Narrow #lightweight
  10. 200702 Interactive Knowledge Distillation
  11. 210726 Text is Text, No Matter What #multitask

distributed training

  1. 210510 GSPMD
  2. 230121 SuperScaler

domain adaptation

  1. 200526 Keep it Simple

dropout

  1. 200701 On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks

efficiency

  1. 230130 Alternating Updates for Efficient Transformers
  2. 230530 Blockwise Parallel Transformer for Long Context Large Models
  3. 230624 H$_2$O
  4. 230705 SkipDecode
  5. 230728 Skeleton-of-Thought

efficient attention

  1. 200410 Longformer
  2. 200412 ProFormer
  3. 200605 Masked Language Modeling for Proteins via Linearly Scalable Long-Context
  4. 200608 Linformer
  5. 210324 Finetuning Pretrained Transformers into RNNs
  6. 210505 Beyond Self-attention
  7. 210510 Poolingformer
  8. 210603 Luna
  9. 210623 Stable, Fast and Accurate
  10. 210705 Long-Short Transformer #local_attention
  11. 210712 Combiner #sparse_attention #local_attention
  12. 210725 H-Transformer-1D
  13. 211210 Self-attention Does Not Need $O(n^2)$ Memory
  14. 220527 FlashAttention
  15. 220726 DETRs with Hybrid Matching #detr
  16. 220911 On The Computational Complexity of Self-Attention
  17. 220921 Mega
  18. 230317 CoLT5
  19. 230705 LongNet
  20. 230706 Focused Transformer

efficient training

  1. 230216 Decoupled Model Schedule for Deep Learning Training #distributed_training
  2. 230711 Stack More Layers Differently
  3. 230712 No Train No Gain
  4. 230807 LoRA-FA

embedding

  1. 200424 All Word Embeddings from One Embedding
  2. 200717 A Unifying Perspective on Neighbor Embeddings along the
  3. 210907 Rare Words Degenerate All Words

end2end

  1. 200605 End-to-End Adversarial Text-to-Speech #tts
  2. 200608 FastSpeech 2 #tts
  3. 201106 Wave-Tacotron #tts
  4. 210716 Autonomy 2.0
  5. 211215 SPTS

energy based model

  1. 200504 How to Train Your Energy-Based Model for Regression

ensemble

  1. 200217 BatchEnsemble

federated learning

  1. 210415 See through Gradients

few shot

  1. 200228 AdarGCN #graph
  2. 210608 Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks #adapter #multitask
  3. 210910 LibFewShot
  4. 220715 Plex #uncertainty #generalization

finetuning

  1. 200214 AutoLR #pruning
  2. 200426 Masking as an Efficient Alternative to Finetuning for Pretrained
  3. 200709 Sample-based Regularization #transfer
  4. 230428 Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

flow

  1. 200220 Regularized Autoencoders via Relaxed Injective Probability Flow
  2. 200227 Woodbury Transformations for Deep Generative Flows

fpn

  1. 200122 CARAFE #resampling
  2. 200129 Mixture FPN
  3. 200506 Scale-Equalizing Pyramid Convolution for Object Detection
  4. 201201 Dynamic Feature Pyramid Networks for Object Detection
  5. 201202 Dual Refinement Feature Pyramid Networks for Object Detection
  6. 201202 Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate
  7. 201225 Implicit Feature Pyramid Network for Object Detection #equilibrium_model #implicit_model

gan

  1. 170629 Do GANs actually learn the distribution
  2. 191022 MelGAN #tts
  3. 200129 Adversarial Lipschitz Regularization
  4. 200129 GAN generalization metric
  5. 200129 OneGAN
  6. 200130 AttentionGAN #attention #img2img
  7. 200130 Evaluation metrics of GAN #metric #evaluation #generative_model
  8. 200130 Local GAN #attention
  9. 200130 Noise Robust GAN #robustness
  10. 200130 Small-GAN
  11. 200130 Smoothness and Stability in GANs
  12. 200206 Unbalanced GANs #vae
  13. 200210 Unsupervised Discovery of Interpretable Directions in the GAN Latent #semantic_factor
  14. 200211 Improved Consistency Regularization for GANs #augmentation #consistency_regularization
  15. 200211 Smoothness and Stability in GANs #regularization
  16. 200212 Image-to-Image Translation with Text Guidance #multimodal #multimodal_generation #img2img
  17. 200212 Real or Not Real, that is the Question
  18. 200214 Top-k Training of GANs #regularization
  19. 200220 The Benefits of Pairwise Discriminators for Adversarial Training #regularization
  20. 200223 GANHopper #img2img
  21. 200224 When Relation Networks meet GANs #regularization
  22. 200225 Freeze the Discriminator #finetuning #transfer
  23. 200226 On Leveraging Pretrained GANs for Generation with Limited Data #finetuning #transfer
  24. 200227 Topology Distance #topology #score
  25. 200228 A U-Net Based Discriminator for Generative Adversarial Networks
  26. 200304 Creating High Resolution Images with a Latent Adversarial Generator #generative_model #super_resolution
  27. 200308 Perceptual Image Super-Resolution with Progressive Adversarial Network #super_resolution
  28. 200312 Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling #energy_based_model #sampling
  29. 200317 Blur, Noise, and Compression Robust Generative Adversarial Networks #noise
  30. 200318 OpenGAN #metric_learning
  31. 200325 Improved Techniques for Training Single-Image GANs #single_image
  32. 200326 Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space
  33. 200402 Controllable Orthogonalization in Training DNNs #regularization
  34. 200404 Feature Quantization Improves GAN Training #discrete_vae
  35. 200405 Discriminator Contrastive Divergence
  36. 200407 Inclusive GAN
  37. 200408 Attentive Normalization for Conditional Image Generation #attention
  38. 200504 Transforming and Projecting Images into Class-conditional Generative #generative_model
  39. 200518 Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization #audio_generation
  40. 200519 CIAGAN
  41. 200519 Regularization Methods for Generative Adversarial Networks #review #regularization
  42. 200604 Image Augmentations for GAN Training #augmentation
  43. 200611 Training Generative Adversarial Networks with Limited Data #augmentation
  44. 200618 Differentiable Augmentation for Data-Efficient GAN Training #augmentation
  45. 200618 Diverse Image Generation via Self-Conditioned GANs #generative_model
  46. 200630 PriorGAN
  47. 200708 InfoMax-GAN #regularization
  48. 200713 Closed-Form Factorization of Latent Semantics in GANs #semantic_factor
  49. 200729 Instance Selection for GANs
  50. 200729 VocGAN #vocoder
  51. 200730 Rewriting a Deep Generative Model
  52. 200804 Open-Edit #image_editing
  53. 200807 Improving the Speed and Quality of GAN by Adversarial Training #robustness
  54. 201028 Training Generative Adversarial Networks by Solving Ordinary #neural_ode
  55. 201109 Learning Semantic-aware Normalization for Generative Adversarial Networks #normalization
  56. 201109 Towards a Better Global Loss Landscape of GANs #training
  57. 201118 Style Intervention #semantic_factor
  58. 201124 Adversarial Generation of Continuous Images #implicit_representation
  59. 201125 How to train your conditional GAN #img2img #generative_model
  60. 201125 Omni-GAN #generative_model
  61. 201127 Image Generators with Conditionally-Independent Pixel Synthesis #implicit_representation
  62. 201201 Refining Deep Generative Models via Discriminator Gradient Flow #sampling
  63. 201201 pi-GAN #implicit_representation
  64. 201203 Self-labeled Conditional GANs #unsupervised_training
  65. 201204 A Note on Data Biases in Generative Models #bias #generative_model
  66. 201208 You Only Need Adversarial Supervision for Semantic Image Synthesis #img2img
  67. 210227 Ultra-Data-Efficient GAN Training #augmentation #few_shot
  68. 210317 Training GANs with Stronger Augmentations via Contrastive Discriminator #contrastive_learning #augmentation
  69. 210318 Drop the GAN #single_image #generative_model #patch
  70. 210330 Dual Contrastive Loss and Attention for GANs #contrastive_learning
  71. 210401 Partition-Guided GANs
  72. 210407 Regularizing Generative Adversarial Networks under Limited Data #regularization
  73. 210408 InfinityGAN
  74. 210413 DatasetGAN #few_shot
  75. 210413 Few-shot Image Generation via Cross-domain Correspondence #img2img #generative_model #few_shot
  76. 210414 Aligning Latent and Image Spaces to Connect the Unconnectable
  77. 210415 GANcraft #nerf
  78. 210422 On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation #antialiasing
  79. 210426 EigenGAN #semantic_factor
  80. 210608 Data-Efficient Instance Generation from Instance Discrimination #contrastive_learning
  81. 210614 Improved Transformer for High-Resolution GANs #transformer #efficient_training
  82. 210623 Alias-Free Generative Adversarial Networks #antialiasing
  83. 210910 Instance-Conditioned GAN
  84. 210927 WarpedGANSpace
  85. 211017 AE-StyleGAN #gan_inversion
  86. 211101 Projected GANs Converge Faster
  87. 211215 Efficient Geometry-aware 3D Generative Adversarial Networks #nerf
  88. 211216 GRAM #3d_generative_model #nerf
  89. 220201 StyleGAN-XL
  90. 220219 Truncated Diffusion Probabilistic Models #generative_model #ddpm
  91. 220224 Self-Distilled StyleGAN
  92. 220311 The Role of ImageNet Classes in Fréchet Inception Distance
  93. 220314 InsetGAN for Full-Body Image Generation #pose
  94. 220414 Any-resolution Training for High-resolution Image Synthesis
  95. 230123 StyleGAN-T #text2img
  96. 230309 Scaling up GANs for Text-to-Image Synthesis #text2img

gan inversion

  1. 200330 Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation #perceptual_loss
  2. 200331 In-Domain GAN Inversion for Real Image Editing
  3. 200703 Collaborative Learning for Faster StyleGAN Embedding
  4. 200803 Encoding in Style #stylegan
  5. 220223 Near Perfect GAN Inversion

generalization

  1. 200130 Fantastic Generalization Measures
  2. 200225 Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

generative model

  1. 190325 Implicit Generative and Generalization in Energy-Based Models #energy_based_model
  2. 200129 Controlling Generative Model
  3. 200129 Deep Automodulator
  4. 200129 Frechet Joint Distance
  5. 200129 Spot CNN generated image
  6. 200130 BIVA
  7. 200130 Glow #flow
  8. 200130 IGEBM #energy_based_model
  9. 200130 Neural Spline Flows #flow
  10. 200130 VQ-VAE-2 #autoregressive_model
  11. 200217 Augmented Normalizing Flows #flow
  12. 200313 Semantic Pyramid for Image Generation #perceptual_loss #image_editing
  13. 200616 Improved Techniques for Training Score-Based Generative Models #ncsn
  14. 201117 DeepNAG
  15. 201202 Improved Contrastive Divergence Training of Energy Based Models #energy_based_model
  16. 201204 Few-shot Image Generation with Elastic Weight Consolidation #few_shot #continual_learning
  17. 201209 Positional Encoding as Spatial Inductive Bias in GANs #positional_encoding
  18. 201224 Soft-IntroVAE #vae
  19. 210223 Zero-Shot Text-to-Image Generation #discrete_vae #autoregressive_model #multimodal
  20. 210318 Few-shot Semantic Image Synthesis Using StyleGAN Prior #stylegan #few_shot
  21. 210824 SimVLM #vision-language
  22. 211015 MaGNET #sampling
  23. 220208 MaskGIT #autoregressive_model #non-autoregressive #vq

graph

  1. 200129 Multi-Graph Transformer

hallucination

  1. 210413 The Curious Case of Hallucinations in Neural Machine Translation #mt

hypernetwork

  1. 200722 WeightNet #channel_attention

hyperparameter

  1. 200425 Learning to Guide Random Search
  2. 200521 HyperSTAR

identifiability

  1. 200701 On Linear Identifiability of Learned Representations

image editing

  1. 200515 Semantic Photo Manipulation with a Generative Image Prior
  2. 200702 Deep Single Image Manipulation #single_image #img2img
  3. 201123 HistoGAN
  4. 201127 Navigating the GAN Parameter Space for Semantic Image Editing #semantic_factor
  5. 210318 Using latent space regression to analyze and leverage compositionality
  6. 220531 IDE-3D #3d_generative_model
  7. 220802 An Image is Worth One Word
  8. 220802 Prompt-to-Prompt Image Editing with Cross Attention Control
  9. 230202 Dreamix #video
  10. 230213 3D-aware Blending with Generative NeRFs #3d_generative_model
  11. 230626 DragDiffusion
  12. 230626 Localized Text-to-Image Generation for Free via Cross Attention Control #text2img
  13. 230705 DragonDiffusion

image generation

  1. 200426 Disentangled Image Generation Through Structured Noise Injection

img2img

  1. 200130 FUNIT
  2. 200305 SketchyCOCO
  3. 200315 GMM-UNIT #multimodal_generation
  4. 200319 High-Resolution Daytime Translation Without Domain Labels
  5. 200330 Semi-supervised Learning for Few-shot Image-to-Image Translation #semi_supervised_learning #few_shot
  6. 200406 Rethinking Spatially-Adaptive Normalization #lightweight
  7. 200409 TuiGAN #few_shot #single_image
  8. 200419 TriGAN #domain_adaptation
  9. 200709 Improving Style-Content Disentanglement in Image-to-Image Translation #disentangle
  10. 200714 COCO-FUNIT
  11. 200715 Transformation Consistency Regularization- A Semi-Supervised Paradigm #augmentation #semi_supervised_learning
  12. 200723 TSIT
  13. 200724 The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation
  14. 201203 CoCosNet v2 #patch #pose
  15. 201205 Spatially-Adaptive Pixelwise Networks for Fast Image Translation #implicit_representation

implicit model

  1. 200615 Multiscale Deep Equilibrium Models

implicit representation

  1. 211026 NeRV
  2. 211122 Neural Fields in Visual Computing and Beyond
  3. 220117 Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  4. 220522 ReLU Fields
  5. 230202 Factor Fields

in context learning

  1. 220520 Prototypical Calibration for Few-shot Learning of Language Models
  2. 220522 Instruction Induction
  3. 230613 TART

instance segmentation

  1. 200129 BlendMask
  2. 200129 COCO 2018 Instance Segmentation #challenge
  3. 200129 Deep Snake
  4. 200130 PointRend
  5. 200311 Conditional Convolutions for Instance Segmentation
  6. 200313 PointINS #dynamic_conv
  7. 200722 Deep Variational Instance Segmentation
  8. 200730 LevelSet R-CNN
  9. 201119 DCT-Mask
  10. 201119 Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions #panoptic_segmentation #dynamic_conv
  11. 201126 The Devil is in the Boundary
  12. 201129 End-to-End Video Instance Segmentation with Transformers #end2end #detr #video
  13. 201203 BoxInst #dataset #weak_supervision
  14. 210503 ISTR #end2end
  15. 210505 QueryInst #end2end
  16. 210604 SOLQ
  17. 210713 Per-Pixel Classification is Not All You Need for Semantic Segmentation #panoptic_segmentation #semantic_segmentation #detr
  18. 221110 OneFormer #semantic_segmentation #panoptic_segmentation #detr

instruct

  1. 230131 The Flan Collection
  2. 230210 The Wisdom of Hindsight Makes Language Models Better Instruction Followers #reinforcement_learning
  3. 230406 Instruction Tuning with GPT-4
  4. 230704 Instruction Tuning Review

interpolation

  1. 200804 Autoencoder Image Interpolation by Shaping the Latent Space
  2. 211018 Learning in High Dimension Always Amounts to Extrapolation #extrapolation

knowledge base

  1. 200214 Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base

language generation

  1. 200712 Do You Have the Right Scissors
  2. 200729 Mirostat

language model

  1. 200128 Scaling Laws for LM
  2. 200205 K-Adapter #multitask #adapter
  3. 200206 Consistency of a Recurrent Language Model With Respect to Incomplete Decoding #decoding #hallucination #language_generation
  4. 200222 Training Question Answering Models From Synthetic Data #qa #bert
  5. 200225 MiniLM #distillation #lightweight
  6. 200406 Sparse Text Generation #language_generation #sampling
  7. 200427 Recall and Learn #finetuning #continual_learning
  8. 200505 Stolen Probability
  9. 200516 MicroNet for Efficient Language Modeling #lightweight
  10. 200518 Contextual Embeddings
  11. 201015 Fine-Tuning Pre-trained Language Model with Weak Supervision #transfer #weak_supervision
  12. 201023 Rethinking embedding coupling in pre-trained language models #regularization
  13. 201201 How Can We Know When Language Models Know #qa #calibration
  14. 201228 Universal Sentence Representation Learning with Conditional Masked Language Model #sentence_embedding #mlm
  15. 210216 Non-Autoregressive Text Generation with Pre-trained Language Models #non-autoregressive #text_generation
  16. 210318 GPT Understands, Too #finetuning #prompt
  17. 210407 Revisiting Simple Neural Probabilistic Language Models
  18. 210420 Carbon Emissions and Large Neural Network Training #nlp
  19. 210922 Recursively Summarizing Books with Human Feedback #summarization

layout

  1. 210601 Incorporating Visual Layout Structures for Scientific Text Classification
  2. 210902 Skim-Attention
  3. 220418 LayoutLMv3
  4. 220517 MATrIX -- Modality-Aware Transformer for Information eXtraction
  5. 220912 PreSTU
  6. 220918 ERNIE-mmLayout

lightweight

  1. 200624 Neural Architecture Design for GPU-Efficient Networks
  2. 201124 MicroNet
  3. 210507 Pareto-Optimal Quantized ResNet Is Mostly 4-bit #quantization
  4. 220409 Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs

line

  1. 210601 Towards Real-time and Light-weight Line Segment Detection

linear attention

  1. 230717 Retentive Network #recurrent

llm

  1. 220521 Scaling Laws and Interpretability of Learning from Repeated Data
  2. 220522 Memorization Without Overfitting
  3. 220524 Large Language Models are Zero-Shot Reasoners #prompt
  4. 220711 Exploring Length Generalization in Large Language Models
  5. 220711 Language Models (Mostly) Know What They Know
  6. 220926 Can Large Language Models Truly Understand Prompts
  7. 220929 Compositional Semantic Parsing with Large Language Models #semantic_parsing
  8. 221017 Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them #prompt #reasoning
  9. 221020 Transcending Scaling Laws with 0.1% Extra Compute #mlm
  10. 221103 Inverse scaling can become U-shaped #prompt
  11. 221109 BLOOM
  12. 221109 Efficiently Scaling Transformer Inference #efficiency
  13. 221118 PAL #prompt
  14. 221118 SmoothQuant #quantization
  15. 230124 A Watermark for Large Language Models
  16. 230126 DetectGPT
  17. 230131 Faithful Chain-of-Thought Reasoning #prompt
  18. 230131 Grounding Language Models to Images for Multimodal Generation #multimodal_generation #vision-language
  19. 230131 Large Language Models Can Be Easily Distracted by Irrelevant Context #in_context_learning
  20. 230206 Chain of Hindsight Aligns Language Models with Feedback #alignment
  21. 230209 Toolformer
  22. 230211 Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models #retrieval
  23. 230215 Learning Performance-Improving Code Edits #in_context_learning
  24. 230215 The Capacity for Moral Self-Correction in Large Language Models #instruct #ethics
  25. 230216 Pretraining Language Models with Human Preferences #instruct #alignment
  26. 230219 Semantic Uncertainty #uncertainty
  27. 230221 ChatGPT #instruct
  28. 230224 Check Your Facts and Try Again #retrieval
  29. 230306 PaLM-E #robotics #multimodal #3d
  30. 230307 Flamingo #multimodal
  31. 230307 Larger language models do in-context learning differently #in_context_learning
  32. 230307 The BigScience ROOTS Corpus #dataset
  33. 230313 High-throughput Generative Inference of Large Language Models with a Single GPU
  34. 230315 A Comprehensive Study on Post-Training Quantization for Large Language Models #quantization
  35. 230316 ART #in_context_learning #prompt
  36. 230317 GPTs are GPTs
  37. 230322 MEGA #multilingual
  38. 230322 RepoCoder #retrieval
  39. 230322 Sparks of Artificial General Intelligence
  40. 230407 Generative Agents
  41. 230410 Inference with Reference #efficiency
  42. 230411 RRHF #alignment
  43. 230416 Sabiá #multilingual
  44. 230417 A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model #instruct
  45. 230417 Low-code LLM #prompt
  46. 230418 UniMax #multilingual
  47. 230419 A Theory on Adam Instability in Large-Scale Machine Learning #optimizer
  48. 230421 Can GPT-4 Perform Neural Architecture Search #nas
  49. 230421 Inducing anxiety in large language models increases exploration and bias
  50. 230424 Why we need RLHF #alignment #rl
  51. 230428 Causal Reasoning and Large Language Models #causality
  52. 230428 Speak, Memory
  53. 230503 Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs #efficiency
  54. 230504 Can LLM Already Serve as A Database Interface
  55. 230509 Large Language Model Programs #prompt
  56. 230509 MoT #prompt
  57. 230511 Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
  58. 230511 INGENIOUS #dataset
  59. 230511 Not All Languages Are Created Equal in LLMs
  60. 230513 CodeT5+
  61. 230515 Symbol tuning improves in-context learning in language models #prompt
  62. 230516 Towards Expert-Level Medical Question Answering with Large Language Models
  63. 230517 DoReMi #dataset #multitask #pretraining
  64. 230517 Searching for Needles in a Haystack #nmt #multilingual
  65. 230519 Cross-Lingual Supervision improves Large Language Models Pre-training #nmt #multilingual
  66. 230521 A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models #nlp
  67. 230522 How Language Model Hallucinations Can Snowball #alignment
  68. 230522 To Repeat or Not To Repeat
  69. 230523 Aligning Large Language Models through Synthetic Feedback #alignment
  70. 230523 Goat
  71. 230523 QLoRA #quantization #alignment #finetuning
  72. 230525 Scaling Data-Constrained Language Models #scaling
  73. 230526 Large Language Models as Tool Makers #alignment
  74. 230614 WizardCoder
  75. 230615 Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
  76. 230615 Inverse Scaling
  77. 230616 Demystifying GPT Self-Repair for Code Generation
  78. 230616 Full Parameter Fine-tuning for Large Language Models with Limited Resources #finetuning
  79. 230619 BayLing #alignment
  80. 230620 Learning to Generate Better Than Your LLM #alignment
  81. 230620 Textbooks Are All You Need
  82. 230621 Deep Language Networks
  83. 230622 AudioPaLM #audio #speech
  84. 230623 Bring Your Own Data! Self-Supervised Evaluation for Large Language Models #evaluation
  85. 230623 GKD #distillation
  86. 230624 Beyond Scale #dataset
  87. 230628 Towards Language Models That Can See #multimodal #vision-language
  88. 230629 Benchmarking Large Language Model Capabilities for Conditional Generation #evaluation
  89. 230630 Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
  90. 230705 Reasoning or Reciting #evaluation
  91. 230706 Style Over Substance #evaluation
  92. 230713 In-context Autoencoder for Context Compression in a Large Language Model
  93. 230717 GEAR #alignment
  94. 230803 Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

lm

  1. 210524 StructuralLM #layout
  2. 210524 True Few-Shot Learning with Language Models #few_shot
  3. 210528 ByT5
  4. 210617 LoRA #adapter #finetuning
  5. 210623 Charformer #tokenizer
  6. 210714 Deduplicating Training Data Makes Language Models Better #corpus
  7. 210714 HTLM
  8. 210811 DEMix Layers #mixture_of_experts
  9. 210813 Curriculum Learning #curriculum
  10. 210816 On the Opportunities and Risks of Foundation Models
  11. 210902 Do Prompt-Based Models Really Understand the Meaning of their Prompts #prompt
  12. 210903 Finetuned Language Models Are Zero-Shot Learners #zero-shot
  13. 210908 A Recipe For Arbitrary Text Style Transfer with Large Language Models #prompt
  14. 211011 Unsupervised Neural Machine Translation with Generative Language Models Only #unsupervised_nmt
  15. 211015 Multitask Prompted Training Enables Zero-Shot Task Generalization #zero-shot
  16. 211016 Invariant Language Modeling #irm
  17. 211016 MarkupLM #layout
  18. 211016 Sharpness-Aware Minimization Improves Language Model Generalization #regularization
  19. 211020 Shaking the foundations #causality
  20. 211027 Training Verifiers to Solve Math Word Problems
  21. 211213 GLaM #moe
  22. 211220 Efficient Large Scale Language Modeling with Mixtures of Experts #mixture_of_experts
  23. 220210 Red Teaming Language Models with Language Models #safety
  24. 220213 A Contrastive Framework for Neural Text Generation #decoding
  25. 220215 General-purpose, long-context autoregressive modeling with Perceiver AR #efficient_attention #autoregressive_model
  26. 220314 Efficient Language Modeling with Sparse all-MLP #mlp
  27. 220329 Training Compute-Optimal Large Language Models
  28. 220413 METRO
  29. 220414 GPT-NeoX-20B
  30. 220502 OPT
  31. 220524 On the Role of Bidirectionality in Language Model Pre-Training #bert
  32. 220728 Efficient Training of Language Models to Fill in the Middle #mlm
  33. 220805 Branch-Train-Merge #product_of_experts #ensemble
  34. 220805 Few-shot Learning with Retrieval Augmented Language Model #retrieval #few_shot
  35. 221110 The CRINGE Loss #safety
  36. 230131 In-Context Retrieval-Augmented Language Models #retrieval
  37. 230503 CodeGen2
  38. 230526 MixCE
  39. 230612 Gradient Ascent Post-training Enhances Language Model Generalization

local attention

  1. 210323 Scaling Local Self-Attention for Parameter Efficient Visual Backbones

loss

  1. 200712 It Is Likely That Your Loss Should be a Likelihood

loss surface

  1. 210225 Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

matting

  1. 200401 Background Matting
  2. 201123 Is a Green Screen Really Necessary for Real-Time Portrait Matting

memory

  1. 200206 Product Kanerva Machines

meta learning

  1. 200221 Learning to Continually Learn #continual_learning
  2. 200312 Online Fast Adaptation and Knowledge Accumulation
  3. 200401 Editable Neural Networks
  4. 200706 Meta-Learning Symmetries by Reparameterization #group_equivariance

metric

  1. 211025 The Efficiency Misnomer
  2. 230307 Is ChatGPT a Good NLG Evaluator

metric learning

  1. 200319 A unifying mutual information view of metric learning

mixture of experts

  1. 220202 Unified Scaling Laws for Routed Language Models
  2. 230220 TA-MoE
  3. 230310 Towards MoE Deployment
  4. 230311 A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training
  5. 230324 Scaling Expert Language Models with Unsupervised Domain Discovery
  6. 230524 Mixture-of-Experts Meets Instruction Tuning

mixup

  1. 201220 ResizeMix
  2. 211228 LINDA #interpolation

mlm

  1. 200424 Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order #language_generation
  2. 210502 Larger-Scale Transformers for Multilingual Masked Language Modeling #multilingual #scale
  3. 220216 Should You Mask 15% in Masked Language Modeling
  4. 220715 Position Prediction as an Effective Pretraining Strategy #unsupervised_training
  5. 220929 Bidirectional Language Models Are Also Few-shot Learners #in_context_learning
  6. 221006 XDoc #layoutlm
  7. 221114 EVA #clip
  8. 230204 Representation Deficiency in Masked Language Modeling

mlops

  1. 230203 PyGlove

moe

  1. 230802 From Sparse to Soft Mixtures of Experts

multilingual

  1. 200207 A Multilingual View of Unsupervised Machine Translation #nmt
  2. 211015 Breaking Down Multilingual Machine Translation #nmt
  3. 220512 Lifting the Curse of Multilinguality by Pre-training Modular Transformers #adapter #mixture_of_experts
  4. 230219 Scaling Laws for Multilingual Neural Machine Translation #nmt #scaling
  5. 230406 On the Pareto Front of Multilingual Neural Machine Translation #multitask #scaling
  6. 230611 Language Versatilists vs. Specialists

multimodal

  1. 200401 Pixel-BERT
  2. 200513 INFOTABS
  3. 200514 Behind the Scene
  4. 201130 Multimodal Pretraining Unmasked
  5. 210928 VideoCLIP #video_transformer #retrieval
  6. 211103 An Empirical Study of Training End-to-End Vision-and-Language Transformers #vision-language
  7. 220512 A Generalist Agent #reinforcement_learning
  8. 220527 GIT
  9. 230110 Scaling Laws for Generative Mixed-Modal Language Models
  10. 230123 Zorro #video #audio
  11. 230201 mPLUG-2
  12. 230202 Multimodal Chain-of-Thought Reasoning in Language Models #vision-language
  13. 230304 Prismer #vision-language
  14. 230308 Visual ChatGPT #chatgpt
  15. 230507 X-LLM
  16. 230511 Musketeer (All for One, and One for All) #vision-language #multitask
  17. 230513 On the Hidden Mystery of OCR in Large Multimodal Models #vision-language
  18. 230529 PaLI-X #vision-language
  19. 230613 Image Captioners Are Scalable Vision Learners Too #vision-language
  20. 230626 Kosmos-2 #vision-language

multimodal generation

  1. 211122 L-Verse
  2. 211124 NÜWA

multitask

  1. 200508 Transforming task representations to perform novel tasks #continual_learning
  2. 200625 MTAdam
  3. 210825 Multi-Task Self-Training for Learning General Representations
  4. 220520 UViM
  5. 230207 Exploring the Benefits of Training Expert Language Models over Instruction Tuning #instruct
  6. 230705 Flacuna

nas

  1. 200324 BigNAS
  2. 200326 Are Labels Necessary for Neural Architecture Search #unsupervised_training
  3. 200406 Network Adjustment
  4. 200412 FBNetV2
  5. 200428 Angle-based Search Space Shrinking for Neural Architecture Search
  6. 200506 Local Search is State of the Art for Neural Architecture Search
  7. 200507 Noisy Differentiable Architecture Search
  8. 200602 FBNetV3 #hyperparameter #training #swa
  9. 200720 NSGANetV2
  10. 220831 Efficient Sparsely Activated Transformers #moe

nerf

  1. 201014 NeRF++
  2. 201125 Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
  3. 201127 D-NeRF
  4. 201203 Learned Initializations for Optimizing Coordinate-Based Neural Representations #implicit_representation
  5. 201203 pixelNeRF
  6. 201215 Object-Centric Neural Scene Rendering
  7. 210225 IBRNet
  8. 210318 FastNeRF
  9. 210318 GNeRF
  10. 210318 MVSNeRF
  11. 210318 NeMI
  12. 210324 Mip-NeRF
  13. 210325 KiloNeRF
  14. 210325 PlenOctrees for Real-time Rendering of Neural Radiance Fields
  15. 210706 Depth-supervised NeRF
  16. 210809 NeuralMVS
  17. 211019 CIPS-3D #stylegan
  18. 211129 Deblur-NeRF
  19. 211129 HDR-NeRF
  20. 211129 Urban Radiance Fields
  21. 211210 CityNeRF
  22. 221010 NerfAcc
  23. 230204 AV-NeRF
  24. 230208 Nerfstudio
  25. 230413 Zip-NeRF #antialiasing
  26. 230503 3D Gaussian Splatting for Real-Time Radiance Field Rendering #neural_rendering

neural computer

  1. 200720 Distributed Associative Memory Network with Memory Refreshing Loss
  2. 211130 Show Your Work

neural ode

  1. 200207 How to train your neural ODE
  2. 200520 Neural Controlled Differential Equations
  3. 200708 Learning Differential Equations that are Easy to Solve

neural rendering

  1. 200226 Learning to Shadow Hand-drawn Sketches
  2. 200427 Neural Hair Rendering
  3. 200506 CONFIG
  4. 201116 Stylized Neural Painting
  5. 201119 Creative Sketch Generation
  6. 201130 Animating Pictures with Eulerian Motion Fields #single_image
  7. 210319 Paint by Word
  8. 210512 Enhancing Photorealism Enhancement
  9. 211013 ADOP
  10. 220728 Neural Strands

nlp

  1. 200518 (Re)construing Meaning in NLP
  2. 200715 Towards Debiasing Sentence Representations #bias
  3. 220826 What Do NLP Researchers Believe

nmt

  1. 200427 Lexically Constrained Neural Machine Translation with Levenshtein Transformer
  2. 200710 Learn to Use Future Information in Simultaneous Translation #simultaneous_translation
  3. 201224 Why Neural Machine Translation Prefers Empty Outputs #hallucination
  4. 230120 Is ChatGPT A Good Translator #chatgpt
  5. 230228 Large Language Models Are State-of-the-Art Evaluators of Translation Quality #metric

non autoregressive

  1. 200403 Aligned Cross Entropy for Non-Autoregressive Machine Translation
  2. 200415 Non-Autoregressive Machine Translation with Latent Alignments #nmt #ctc
  3. 200422 A Study of Non-autoregressive Model for Sequence Generation
  4. 201022 Parallel Tacotron #vae
  5. 201025 Improved Mask-CTC for Non-Autoregressive End-to-End ASR #ctc
  6. 201125 FBWave #vocoder #lightweight
  7. 201207 EfficientTTS #tts
  8. 211213 Step-unrolled Denoising Autoencoders for Text Generation
  9. 220520 Lossless Acceleration for Seq2seq Generation with Aggressive Decoding #efficiency
  10. 220909 Improved Masked Image Generation with Token-Critic #mlm
  11. 230301 StraIT #image_generation #vq
  12. 230516 SoundStorm #audio_generation

norm free

  1. 200310 ReZero is All You Need #initialization

normalization

  1. 200122 Group Norm, Weight Standardization
  2. 200122 Moving Average Batch Normalization
  3. 200122 StyleGAN 2 #GAN
  4. 200130 Rethinking Normalization
  5. 200130 Weight Standardization #weight
  6. 200224 Batch Normalization Biases Residual Blocks Towards the Identity Function #optimization #norm_free #initialization
  7. 200306 TaskNorm #meta_learning
  8. 200406 Evolving Normalization-Activation Layers #nas #activation
  9. 200427 A Batch Normalized Inference Network Keeps the KL Vanishing Away
  10. 201128 Batch Normalization with Enhanced Linear Transformation
  11. 211026 Revisiting Batch Normalization
  12. 230516 Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

object detection

  1. 191118 Anchor-Free
  2. 191118 CenterMask #instance_segmentation #backbone #1stage
  3. 191121 EfficientDet
  4. 200103 BlendMask #instance_segmentation #1stage
  5. 200122 SABL
  6. 200129 AP Loss #loss
  7. 200129 Backbone Reallocation for Detection #backbone #nas
  8. 200129 Dense RepPoints
  9. 200129 DetNAS #nas #backbone
  10. 200129 IOU-aware single stage detector #1stage
  11. 200130 ATSS #anchor #retinanet #fcos
  12. 200130 AutoAugment #augmentation #search
  13. 200130 EfficientDet #fpn
  14. 200130 Keypoint Triplet #keypoint
  15. 200130 Learning from Noisy Anchors
  16. 200130 Multiple Anchor Learning #anchor
  17. 200130 Objects as Points #keypoint
  18. 200130 Soft Anchor-Point #anchor
  19. 200211 Object Detection as a Positive-Unlabeled Problem #positive_unlabled #dataset
  20. 200212 Solving Missing-Annotation Object Detection with Background #dataset #noise
  21. 200218 Universal-RCNN #multi_dataset #graph
  22. 200316 Frustratingly Simple Few-Shot Object Detection #few_shot
  23. 200317 Revisiting the Sibling Head in Object Detector
  24. 200319 Revisiting the Sibling Head in Object Detector #review
  25. 200320 CentripetalNet #keypoint
  26. 200413 Dynamic R-CNN
  27. 200423 YOLOv4
  28. 200511 Scope Head for Accurate Localization in Object Detection
  29. 200526 End-to-End Object Detection with Transformers #end2end #matching
  30. 200603 DetectoRS
  31. 200611 Rethinking Pre-training and Self-training #semi_supervised_learning #transfer
  32. 200706 LabelEnc #distillation
  33. 200707 AutoAssign #anchor_free
  34. 200714 AQD #quantization
  35. 200715 Probabilistic Anchor Assignment with IoU Prediction for Object Detection #anchor #1stage
  36. 200716 RepPoints V2 #1stage #anchor_free
  37. 200723 PP-YOLO #tuning
  38. 200723 The Devil is in Classification #longtail
  39. 200727 Corner Proposal Network for Anchor-free, Two-stage Object Detection #anchor_free #2stage
  40. 201116 Scaled-YOLOv4
  41. 201118 End-to-End Object Detection with Adaptive Clustering Transformer #detr #end2end #efficiency
  42. 201121 Rethinking Transformer-based Set Prediction for Object Detection #detr #end2end #efficiency
  43. 201124 Sparse R-CNN
  44. 201128 Class-agnostic Object Detection
  45. 201207 End-to-End Object Detection with Fully Convolutional Network #end2end
  46. 201223 SWA Object Detection #swa
  47. 201227 Towards A Category-extended Object Detector without Relabeling or Conflicts #continual_learning
  48. 210225 Simple multi-dataset detection #multi_dataset
  49. 210316 You Only Look One-level Feature
  50. 210325 USB #dataset
  51. 210417 TransVG #visual_grounding
  52. 210420 PP-YOLOv2 #yolo
  53. 210426 MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding #detr #visual_grounding
  54. 210601 You Only Look at One Sequence #vit
  55. 210615 Dynamic Head #attention
  56. 210718 YOLOX #yolo
  57. 210728 SimROD #domain_adaptation #self_supervised
  58. 210922 Pix2seq #detr #autoregressive_model
  59. 210929 Localizing Objects with Self-Supervised Transformers and no Labels #self_supervised #self_supervised_discovery #salient_object_detection
  60. 211101 PP-PicoDet #lightweight
  61. 211122 Benchmarking Detection Transfer Learning with Vision Transformers #unsupervised_training #vit
  62. 211123 Dynamic DETR
  63. 211129 Sparse DETR #detr
  64. 220107 Detecting Twenty-thousand Classes using Image-level Supervision #weak_supervision
  65. 220330 Exploring Plain Vision Transformer Backbones for Object Detection #vit #instance_segmentation
  66. 220615 A Unified Sequence Interface for Vision Tasks #multitask #instance_segmentation #keypoint

ocr

  1. 191231 LayoutLM
  2. 200217 Text Perceptron
  3. 210415 Rethinking Text Line Recognition Models
  4. 220107 Data-Efficient Information Extraction from Form-Like Documents #information_extraction
  5. 220328 Towards End-to-End Unified Scene Text Detection and Layout Analysis
  6. 220416 Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

open set recognition

  1. 211012 Open-Set Recognition

optimization

  1. 200221 The Break-Even Point on Optimization Trajectories of Deep Neural Networks #loss #training
  2. 200224 The Early Phase of Neural Network Training
  3. 200227 Using a thousand optimization tasks to learn hyperparameter search strategies #optimizer #hyperparameter
  4. 200228 A Self-Tuning Actor-Critic Algorithm #reinforcement_learning #hyperparameter #meta_learning
  5. 200316 Weak and Strong Gradient Directions
  6. 200403 Gradient Centralization #training
  7. 200508 An Investigation of Why Overparameterization Exacerbates Spurious #training
  8. 200519 One Size Fits All

optimizer

  1. 200130 LAMB #large_batch
  2. 211006 8-bit Optimizers via Block-wise Quantization
  3. 221117 VeLO
  4. 230118 Learning-Rate-Free Learning by D-Adaptation
  5. 230213 Symbolic Discovery of Optimization Algorithms #search
  6. 230523 Sophia

oriented object detection

  1. 200129 Modulated Loss
  2. 200129 Oriented Objects as Middle Lines

out of distribution

  1. 200509 Generalizing Outside the Training Set
  2. 200519 Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting

panoptic segmentation

  1. 200129 Bridging the train/infer gap in Panoptic Segmentation
  2. 200130 Panoptic-DeepLab
  3. 200218 Towards Bounding-Box Free Panoptic Segmentation #box_free
  4. 200404 Pixel Consensus Voting for Panoptic Segmentation
  5. 200421 Panoptic-based Image Synthesis #neural_rendering
  6. 201123 Scaling Wide Residual Networks for Panoptic Segmentation #scale
  7. 201201 Fully Convolutional Networks for Panoptic Segmentation #dynamic_conv
  8. 201202 Single-shot Path Integrated Panoptic Segmentation #dynamic_conv
  9. 210910 Panoptic Narrative Grounding #visual_grounding

perceptual loss

  1. 200206 Image Fine-grained Inpainting #inpainting
  2. 200515 Enhancing Perceptual Loss with Adversarial Feature Matching for Super-Resolution
  3. 200626 A Loss Function for Generative Neural Networks Based on Watson's
  4. 201223 Focal Frequency Loss for Image Reconstruction and Synthesis #loss

point cloud

  1. 220325 Point2Seq

pooling

  1. 200325 What Deep CNNs Benefit from Global Covariance Pooling
  2. 200330 Strip Pooling

pose

  1. 200729 Unselfie #inpainting
  2. 210913 Pose with Style

positional encoding

  1. 200628 Rethinking Positional Encoding in Language Pre-training
  2. 210408 Modulated Periodic Activations for Generalizable Local Functional #periodic_activation #implicit_representation
  3. 210506 ACORN #implicit_representation
  4. 210706 Rethinking Positional Encoding
  5. 230531 The Impact of Positional Encoding on Length Generalization in Transformers
  6. 230627 Extending Context Window of Large Language Models via Positional Interpolation
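
The positional-interpolation paper above extends the context window by squeezing positions back into the trained range rather than extrapolating. A toy sketch of the idea on classic sinusoidal encodings (the paper itself applies the interpolation to RoPE; the function names here are made up for illustration):

```python
import math

def sinusoidal_pe(pos, d_model):
    """Classic sinusoidal positional encoding for a single position."""
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]

def interpolated_pe(pos, d_model, train_len, target_len):
    """Positional interpolation: rescale positions so the extended
    context maps back into the range seen during training."""
    scale = train_len / target_len  # < 1 when extending the context
    return sinusoidal_pe(pos * scale, d_model)
```

With `train_len=1024` and `target_len=2048`, position 2048 is encoded exactly like trained position 1024, so the model never sees an out-of-range angle.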

practice

  1. 210630 Using AntiPatterns to avoid MLOps Mistakes

pretraining

  1. 190620 XLNet #language_model
  2. 190729 RoBERTa #language_model
  3. 200128 mBART #machine_translation #nlp
  4. 200129 ImageBERT #multimodal
  5. 200129 LM Pretraining #nlp
  6. 200129 oLMpics #language_model #nlp
  7. 200130 ViLBERT #multimodal
  8. 200210 Pre-training Tasks for Embedding-based Large-scale Retrieval #retrieval
  9. 200217 Incorporating BERT into Neural Machine Translation #language_model #bert #nmt
  10. 200219 CodeBERT #bert
  11. 200228 UniLMv2 #language_model
  12. 200317 Calibration of Pre-trained Transformers #calibration
  13. 200405 Unsupervised Domain Clusters in Pretrained Language Models #domain
  14. 200412 Pre-training Text Representations as Meta Learning #meta_learning #finetuning
  15. 200413 Pretrained Transformers Improve Out-of-Distribution Robustness #out_of_distribution
  16. 200419 Are we pretraining it right #multimodal
  17. 200420 Adversarial Training for Large Neural Language Models #adversarial_training #language_model #finetuning
  18. 200420 MPNet #language_model
  19. 200423 Don't Stop Pretraining #domain
  20. 200427 LightPAFF #distillation #finetuning
  21. 200520 Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models #contrastive_learning #sentence_embedding
  22. 200610 MC-BERT
  23. 200615 To Pretrain or Not to Pretrain #nlp #finetuning
  24. 200626 Pre-training via Paraphrasing #retrieval
  25. 200703 Language-agnostic BERT Sentence Embedding #embedding #multilingual
  26. 200713 An Empirical Study on Robustness to Spurious Correlations using #nlp #multitask
  27. 200715 InfoXLM #nlp #cross_lingual
  28. 200804 Taking Notes on the Fly Helps BERT Pre-training #nlp
  29. 201020 Pushing the Limits of Semi-Supervised Learning for Automatic Speech #semi_supervised_learning #asr
  30. 201021 Self-training and Pre-training are Complementary for Speech Recognition #self_supervised #asr
  31. 201022 mT5 #language_model #multilingual
  32. 201109 When Do You Need Billions of Words of Pretraining Data #language_model
  33. 201117 UP-DETR #detr #end2end #object_detection
  34. 201127 Progressively Stacking 2.0 #efficiency
  35. 201201 Pre-Trained Image Processing Transformer #contrastive_learning #vision_transformer #restoration
  36. 201201 StructFormer #parse #attention #mlm
  37. 201227 Syntax-Enhanced Pre-trained Model #language_model #syntax
  38. 210225 SparseBERT #attention #sparse_attention #bert
  39. 210318 All NLP Tasks Are Generation Tasks #language_model
  40. 210324 Can Vision Transformers Learn without Natural Images #vision_transformer
  41. 210402 Robust wav2vec 2.0 #asr
  42. 210407 Pushing the Limits of Non-Autoregressive Speech Recognition #non-autoregressive #asr #ctc
  43. 210413 Masked Language Modeling and the Distributional Hypothesis #language_model #mlm
  44. 210417 mT6 #language_model
  45. 210418 Data-Efficient Language-Supervised Zero-Shot Learning with #multimodal
  46. 210422 ImageNet-21K Pretraining for the Masses #backbone
  47. 210606 On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation #finetuning #adapter
  48. 210606 Rethinking Training from Scratch for Object Detection #object_detection
  49. 210608 DETReg #detr
  50. 210614 SAS
  51. 210615 BEiT #vit #bert
  52. 210907 How much pretraining data do language models need to learn syntax #bert
  53. 210910 ReasonBERT #bert #reasoning #qa
  54. 210913 STraTA #finetuning #semi_supervised_learning #few_shot
  55. 210914 Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition #asr
  56. 210914 Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding #finetuning #semi_supervised_learning #few_shot
  57. 210927 BigSSL #asr #semi_supervised_learning #unsupervised_training
  58. 211005 Exploring the Limits of Large Scale Pre-training #classificiation #scaling
  59. 211018 Unsupervised Finetuning #unsupervised_training #finetuning
  60. 211026 WavLM #speech
  61. 211103 VLMo #mixture_of_experts #vision-language
  62. 211111 Masked Autoencoders Are Scalable Vision Learners #vit
  63. 211122 ExT5 #multitask
  64. 211122 Florence #vision-language #transfer
  65. 211201 Revisiting the Transferability of Supervised Pretraining #transfer
  66. 211216 Masked Feature Prediction for Self-Supervised Visual Pre-Training #self_supervised
  67. 211220 Are Large-scale Datasets Necessary for Self-Supervised Pre-training #self_supervised #transfer
  68. 220429 Vision-Language Pre-Training for Boosting Scene Text Detectors
  69. 220914 PaLI #vision-language
  70. 230808 Continual Pre-Training of Large Language Models

probabilistic model

  1. 200413 Einsum Networks
  2. 200419 Roundtrip

prompt

  1. 220118 ZeroPrompt #zero-shot
  2. 220916 Text and Patterns
  3. 230207 Hard Prompts Made Easy #text2img
  4. 230517 Chain-of-Symbol Prompting Elicits Planning in Large Language Models #in_context_learning
  5. 230517 Tree of Thoughts #in_context_learning

pruning

  1. 200130 Rethinking Pruning
  2. 200218 Picking Winning Tickets Before Training by Preserving Gradient Flow #lottery_ticket
  3. 200224 HRank #rank
  4. 200305 Comparing Rewinding and Fine-tuning in Neural Network Pruning
  5. 200424 Convolution-Weight-Distribution Assumption
  6. 200514 Bayesian Bits #quantization #variational_inference
  7. 200515 Movement Pruning
  8. 200518 Joint Multi-Dimension Pruning
  9. 200706 Lossless CNN Channel Pruning via Decoupling Remembering and Forgetting
  10. 200710 To Filter Prune, or to Layer Prune, That Is The Question
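
Most entries above build on the basic magnitude-pruning baseline (movement pruning, lottery tickets, etc. modify *which* weights get removed). A minimal sketch of global magnitude pruning, with hypothetical function names:

```python
def magnitude_prune(weights, sparsity):
    """Global magnitude pruning: zero out the `sparsity` fraction of
    weights with the smallest absolute value. Ties at the threshold
    may prune slightly more than requested."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # threshold = largest magnitude among the weights to be removed
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Movement pruning (entry 7) instead ranks weights by how much training moves them *away* from zero, which works better when finetuning a pretrained model.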

qa

  1. 200222 Unsupervised Question Decomposition for Question Answering

quantization

  1. 220815 LLM.int8()
  2. 230216 Shared Microexponents
  3. 230425 Stable and low-precision training for large-scale vision-language models #optimizer
  4. 230601 AWQ
  5. 230719 ZeroQuant-FP
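
Several of the above (LLM.int8() in particular) start from symmetric absmax quantization, applied per row or block so one outlier doesn't destroy the whole tensor's precision. A toy per-slice sketch, assuming plain Python lists:

```python
def quantize_absmax(values):
    """Symmetric absmax quantization to int8: scale so the largest
    magnitude in this slice maps to 127."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]
```

LLM.int8() additionally routes outlier feature dimensions through a separate fp16 path, which this sketch omits.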

reasoning

  1. 200129 Neural Arithmetic Units
  2. 200409 Injecting Numerical Reasoning Skills into Language Models

recommender

  1. 230510 Do LLMs Understand User Preferences

regularization

  1. 200130 DropAttention #dropout
  2. 200219 Revisiting Training Strategies and Generalization Performance in Deep #metric_learning
  3. 200225 On Feature Normalization and Data Augmentation #normalization #mixup
  4. 200228 The Implicit and Explicit Regularization Effects of Dropout #dropout
  5. 200331 Regularizing Class-wise Predictions via Self-knowledge Distillation #distillation #consistency_regularization
  6. 200409 Orthogonal Over-Parameterized Training
  7. 200424 Dropout as an Implicit Gating Mechanism For Continual Learning
  8. 200427 Scheduled DropHead
  9. 200513 Implicit Regularization in Deep Learning May Not Be Explainable by Norms #training #optimization
  10. 200707 RIFLE #finetuning
  11. 200707 Remix #imbalanced
  12. 200721 Improving compute efficacy frontiers with SliceOut #efficient_training
  13. 201122 Stable Weight Decay Regularization
  14. 220527 Sharpness-Aware Training for Free
  15. 230302 Dropout Reduces Underfitting #dropout
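
Several entries above revisit dropout; for reference, the standard inverted-dropout formulation they all start from (a toy sketch, not any paper's exact variant):

```python
import random

def dropout(xs, p, training=True):
    """Inverted dropout: drop each unit with probability p and scale
    the survivors by 1/(1-p) so the expected activation is unchanged
    and no rescaling is needed at inference time."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if random.random() < keep else 0.0 for x in xs]
```

"Dropout Reduces Underfitting" (entry 15) uses this same operator but only early in training, where the noise acts as a gradient-variance reducer rather than a regularizer.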

reinforcement learning

  1. 191120 Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
  2. 200130 Mastering Atari, Go, Chess, Shogi
  3. 200626 Critic Regularized Regression
  4. 210929 Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization
  5. 211030 Mastering Atari Games with Limited Data

rendering

  1. 200130 Textured Neural Avatars

representation

  1. 200220 Neural Bayes #bayesian #clustering
  2. 200412 Gradients as Features for Deep Representation Learning
  3. 201223 Noisy Labels Can Induce Good Representations #noise

resampling

  1. 200512 Invertible Image Rescaling

restoration

  1. 200402 Learning to See Through Obstructions
  2. 200404 Deblurring by Realistic Blurring
  3. 200406 Self-Supervised Scene De-occlusion
  4. 201123 Cross-Camera Convolutional Color Constancy
  5. 201123 Dissecting Image Crops

retrieval

  1. 210715 Internet-Augmented Dialogue Generation #dialog
  2. 220124 Text and Code Embeddings by Contrastive Pre-Training

review

  1. 200130 Filter Response Normalization
  2. 200227 A Primer in BERTology #bert
  3. 200306 What is the State of Neural Network Pruning #pruning
  4. 200318 A Metric Learning Reality Check #metric_learning
  5. 200324 A Systematic Evaluation
  6. 200325 Rethinking Few-Shot Image Classification #meta_learning
  7. 200408 State of the Art on Neural Rendering #neural_rendering
  8. 200409 EvoNorm
  9. 200428 Showing Your Work Doesn't Always Work
  10. 200619 Augmentation for GANs
  11. 200627 Denoising Diffusion Probabilistic Models Implementation
  12. 200717 Semantic factor of GANs
  13. 200725 Neighbor Embedding
  14. 200821 Virtual Try On
  15. 201016 Representation Learning via Invariant Causal Mechanisms
  16. 201021 BYOL works even without batch statistics
  17. 201108 Long Range Arena #attention #efficient_attention
  18. 201112 Learning Semantic-aware Normalization for Generative Adversarial Networks
  19. 201112 When Do You Need Billions of Words of Pretraining Data

rl

  1. 230807 AlphaStar Unplugged

robustness

  1. 200211 Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial #adversarial_training
  2. 200304 A Closer Look at Accuracy vs. Robustness #adversarial_training
  3. 200810 Informative Dropout for Robust Representation Learning
  4. 220607 Can CNNs Be More Robust Than Transformers

saliency

  1. 200406 There and Back Again

salient object detection

  1. 200518 U^2-Net

scale

  1. 200712 Learning to Learn Parameterized Classification Networks for Scalable #hypernetwork
  2. 201130 Towards Better Accuracy-efficiency Trade-offs

score

  1. 200319 GIQA
  2. 200426 Evaluation Metrics for Conditional Image Generation

self supervised

  1. 200213 Automatically Discovering and Learning New Visual Categories with Ranking Statistics #weak_supervision
  2. 200218 MAST #tracking
  3. 200224 Self-Adaptive Training #noise #dataset
  4. 200408 Improving BERT with Self-Supervised Attention #bert #distillation
  5. 200722 CrossTransformers #few_shot
  6. 201015 Representation Learning via Invariant Causal Mechanisms #causality
  7. 201117 Neural Semi-supervised Learning for Text Classification Under #nlp
  8. 201125 Can Temporal Information Help with Contrastive Self-Supervised Learning #video #augmentation
  9. 201224 Self-supervised Pre-training with Hard Examples Improves Visual #mixup
  10. 210726 Continental-Scale Building Detection from High Resolution Satellite Imagery
  11. 210827 Injecting Text in Self-Supervised Speech Pretraining #asr
  12. 210927 Compressive Visual Representations
  13. 211027 Neural Analysis and Synthesis #audio_synthesis
  14. 220124 data2vec
  15. 220216 Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
  16. 220520 Uniform Masking
  17. 220526 Green Hierarchical Vision Transformer for Masked Image Modeling
  18. 220526 MixMIM
  19. 220526 Revealing the Dark Secrets of Masked Image Modeling #representation
  20. 220715 Is a Caption Worth a Thousand Images #clip
  21. 220803 Masked Vision and Language Modeling for Multi-modal Representation Learning #mlm

self supervised discovery

  1. 200403 Self-Supervised Viewpoint Learning From Image Collections #viewpoint
  2. 201127 Unsupervised part representation by Flow Capsules
  3. 210429 MarioNette

semantic factor

  1. 200307 StyleGAN2 Distillation for Feed-forward Image Manipulation #stylegan
  2. 200308 PULSE #stylegan
  3. 200406 GANSpace
  4. 201222 Time-Travel Rephotography #restoration #stylegan

semantic segmentation

  1. 200323 Learning Dynamic Routing for Semantic Segmentation
  2. 200516 Single-Stage Semantic Segmentation from Image Labels
  3. 200826 EfficientFCN
  4. 210512 Segmenter
  5. 220918 SegNeXt

semi supervised learning

  1. 200306 Semi-Supervised StyleGAN for Disentanglement Learning #stylegan #mixup
  2. 200323 Meta Pseudo Labels #meta_learning
  3. 200627 Laplacian Regularized Few-Shot Learning #few_shot
  4. 200724 Deep Co-Training with Task Decomposition for Semi-Supervised Domain #domain_adaptation
  5. 201116 On the Marginal Benefit of Active Learning #active_learning #unsupervised_training
  6. 201118 FROST
  7. 220811 Semi-supervised Vision Transformers at Scale
  8. 220829 Open-Set Semi-Supervised Object Detection #open_set_recognition
  9. 220918 The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning

seq2seq

  1. 230502 Unlimiformer

sgld

  1. 200706 Kernel Stein Generative Modeling #svgd

singing voice synthesis

  1. 211008 KaraSinger

single image

  1. 200405 Structural-analogy from a Single Image Pair

speech

  1. 200129 Speech Recognition
  2. 200129 WaveFlow #conditional_generative_model
  3. 230511 CoMoSpeech #audio_synthesis

state space model

  1. 211031 Efficiently Modeling Long Sequences with Structured State Spaces
  2. 221017 What Makes Convolutional Models Great on Long Sequence Modeling
  3. 230213 Simple Hardware-Efficient Long Convolutions for Sequence Modeling
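
The common trick in these papers is that a linear state space recurrence can be unrolled into one long convolution. A scalar-state sketch (real models like S4 use matrix-valued A, B, C and FFT-based convolution; names here are illustrative):

```python
def ssm_kernel(A, B, C, length):
    """Unroll x_k = A*x_{k-1} + B*u_k, y_k = C*x_k into the
    convolution kernel K_i = C * A^i * B."""
    return [C * (A ** i) * B for i in range(length)]

def causal_conv(kernel, u):
    """Convolutional view: y_k = sum_i K_i * u_{k-i}."""
    return [sum(kernel[i] * u[k - i] for i in range(k + 1))
            for k in range(len(u))]

def ssm_recurrent(A, B, C, u):
    """Same computation done step by step with the recurrence."""
    x, ys = 0.0, []
    for uk in u:
        x = A * x + B * uk
        ys.append(C * x)
    return ys
```

The two views are exactly equivalent: the recurrence is cheap at inference, while the convolution parallelizes over the sequence during training.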

structure learning

  1. 200518 Large-scale empirical validation of Bayesian Network structure learning

style transfer

  1. 200318 A Content Transformation Block For Image Style Transfer
  2. 200324 Deformable Style Transfer
  3. 200710 Geometric Style Transfer

stylegan

  1. 210318 Labels4Free #unsupervised_segmentation

super resolution

  1. 200129 ESRGAN+
  2. 200323 Deep Unfolding Network for Image Super-Resolution

table

  1. 210906 Parsing Table Structures in the Wild
  2. 220809 TSRFormer

text generation

  1. 200130 Unlikelihood Training
  2. 200605 CoCon

text2img

  1. 221125 3DDesigner #3d_generative_model
  2. 221125 SpaText
  3. 230502 Pick-a-Pic

tokenizer

  1. 211006 How BPE Affects Memorization in Transformers
  2. 230421 Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
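
Both entries study BPE behavior; the core training loop is just "repeatedly merge the most frequent adjacent pair." A toy sketch (real tokenizers train over a word-frequency table and record the merge order):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(tokens, pair):
    """Replace each occurrence of `pair` with its concatenation."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def bpe(tokens, num_merges):
    """Toy BPE trainer: greedily apply the best merge num_merges times."""
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens
```

The arithmetic paper above hinges on exactly this greediness: frequent digit sequences fuse into single tokens, which obscures place-value structure unless numbers are decomposed first.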

topic model

  1. 200426 Neural Topic Modeling with Bidirectional Adversarial Training

topology

  1. 200413 Topology of deep neural networks #theory

tracking

  1. 200402 Tracking Objects as Points #keypoint
  2. 200402 Tracking by Instance Detection #meta_learning
  3. 200403 FairMOT
  4. 200506 PeTra
  5. 201215 Detecting Invisible People
  6. 211013 ByteTrack

training

  1. 200702 Beyond Signal Propagation

transducer

  1. 200519 A New Training Pipeline for an Improved Neural Transducer

transfer

  1. 200130 BiT ResNet #resnet
  2. 200512 Neural Architecture Transfer #nas
  3. 200711 Adversarially-Trained Deep Nets Transfer Better #adversarial_training
  4. 200716 Do Adversarially Robust ImageNet Models Transfer Better #robust
  5. 200721 Adversarial Training Reduces Information and Improves Transferability #adversarial_training
  6. 201122 Ranking Neural Checkpoints
  7. 211012 Rethinking supervised pre-training for better downstream transferring #classificiation #metric_learning

transformer

  1. 200129 Are Transformers universal approximators
  2. 200129 Product Key Memory #attention
  3. 200129 Reformer #attention
  4. 200130 RoBERTa #pretraining #language_model #nlp
  5. 200130 Sparse Transformer #generative_model
  6. 200130 Structured Pruning for LM #pruning
  7. 200130 T5 #pretraining #nlp #seq2seq
  8. 200207 Transformer Transducer #asr #transducer
  9. 200211 On Layer Normalization in the Transformer Architecture #normalization
  10. 200212 GLU Variants Improve Transformer #activation
  11. 200214 Transformer on a Diet #efficient_attention
  12. 200214 Transformers as Soft Reasoners over Language #language
  13. 200215 Fine-Tuning Pretrained Language Models #bert #finetuning
  14. 200221 Addressing Some Limitations of Transformers with Feedback Memory #recurrent
  15. 200305 Talking-Heads Attention #attention
  16. 200424 Lite Transformer with Long-Short Range Attention #lightweight
  17. 200515 Finding Experts in Transformer Models
  18. 200515 JDI-T #tts
  19. 200516 Conformer #asr
  20. 200518 Weak-Attention Suppression For Transformer Based Speech Recognition #asr
  21. 200605 Funnel-Transformer #efficient_attention
  22. 200707 Do Transformers Need Deep Long-Range Memory #lm #attention
  23. 200709 Fast Transformers with Clustered Attention #attention
  24. 200715 AdapterHub #nlp #finetuning
  25. 200727 Big Bird #attention
  26. 200802 DeLighT #nlp
  27. 201217 Taming Transformers for High-Resolution Image Synthesis #discrete_vae #generative_model #autoregressive_model
  28. 201221 RealFormer #attention
  29. 201227 SG-Net #syntax #attention
  30. 210223 Do Transformer Modifications Transfer Across Implementations and
  31. 210225 Evolving Attention with Residual Convolutions #attention
  32. 210318 HiT #video #retrieval
  33. 210318 Looking Beyond Two Frames #tracking
  34. 210318 TFPose #pose
  35. 210318 TransCenter #tracking
  36. 210318 Transformer Tracking #tracking
  37. 210407 Seeing Out of tHe bOx #multimodal #vision-language
  38. 210409 Efficient Large-Scale Language Model Training on GPU Clusters #distributed_training
  39. 210409 Not All Attention Is All You Need
  40. 210410 UniDrop #regularization
  41. 210417 Demystifying the Better Performance of Position Encoding Variants for #positional_encoding
  42. 210420 RoFormer #positional_encoding
  43. 210423 M3DeTR #3d
  44. 210509 FNet #efficient_attention #fourier
  45. 210510 Are Pre-trained Convolutions Better than Pre-trained Transformers #pretraining #nlp #convolution
  46. 210613 Thinking Like Transformers
  47. 210617 Multi-head or Single-head
  48. 210730 Perceiver IO
  49. 210809 Making Transformers Solve Compositional Tasks
  50. 210812 Mobile-Former #backbone
  51. 210830 A Battle of Network Structures #cnn #mlp #backbone
  52. 210830 Shatter #bert
  53. 210908 Panoptic SegFormer #panoptic_segmentation #detr
  54. 210909 Bag of Tricks for Optimizing Transformer Efficiency #nmt #lightweight
  55. 210917 Primer #lm #nas
  56. 210922 Scale Efficiently
  57. 211018 NormFormer
  58. 211026 Hierarchical Transformers Are More Efficient Language Models #lm #efficient_attention
  59. 211122 MetaFormer is Actually What You Need for Vision #vit
  60. 211124 Sparse is Enough in Scaling Transformers #sparsity #efficiency
  61. 220221 Transformer Quality in Linear Time #efficient_attention #linear_attention #local_attention
  62. 220301 DeepNet #normalization
  63. 220330 Transformer Language Models without Positional Encodings Still Learn Positional Information #lm #positional_encoding
  64. 220924 In-context Learning and Induction Heads #in_context_learning
  65. 221004 MOAT #backbone
  66. 221220 A Length-Extrapolatable Transformer #positional_encoding
  67. 230209 In-Context Learning with Many Demonstration Examples #efficient_attention
  68. 230311 Stabilizing Transformer Training by Preventing Attention Entropy Collapse #stability
  69. 230419 Scaling Transformer to 1M tokens and beyond with RMT
  70. 230428 ResiDual #normalization
  71. 230504 BranchNorm #normalization
  72. 230504 On the Expressivity Role of LayerNorm in Transformers' Attention #attention #normalization
  73. 230507 Vcc #efficient_attention
  74. 230512 MEGABYTE #tokenizer
  75. 230512 TinyStories #lm
  76. 230522 GQA
  77. 230530 Grokking of Hierarchical Structure in Vanilla Transformers
  78. 230612 Augmenting Language Models with Long-Term Memory
  79. 230622 Quantizable Transformers
  80. 230627 Length Generalization in Arithmetic Transformers
  81. 230706 Lost in the Middle #lm
  82. 230707 Teaching Arithmetic to Small Transformers
  83. 230720 L-Eval #benchmark
  84. 230727 Scaling TransNormer to 175 Billion Parameters #efficient_attention
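
GQA (entry 76) shrinks the KV cache by letting groups of query heads share one K/V head. A minimal single-token sketch in plain Python, with illustrative function names:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]

def grouped_query_attention(queries, kv_per_head, n_kv_heads):
    """GQA head sharing: query head i uses K/V head i // group_size,
    where group_size = n_query_heads // n_kv_heads."""
    group = len(queries) // n_kv_heads
    return [attend(q, *kv_per_head[i // group])
            for i, q in enumerate(queries)]
```

With `n_kv_heads == 1` this degenerates to multi-query attention; with `n_kv_heads == len(queries)` it is ordinary multi-head attention.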

tropical geometry

  1. 200220 On the Decision Boundaries of Neural Networks

tts

  1. 200512 Flowtron #flow
  2. 210617 WaveGrad 2

uncertainty

  1. 210727 A Tale Of Two Long Tails

unsupervised img2img

  1. 200310 Unpaired Image-to-Image Translation using Adversarial Consistency Loss
  2. 200611 Rethinking the Truly Unsupervised Image-to-Image Translation
  3. 201201 Unpaired Image-to-Image Translation via Latent Energy Transport

unsupervised nmt

  1. 200422 When and Why is Unsupervised Neural Machine Translation Useless

vae

  1. 200420 Bringing Old Photos Back to Life #restoration
  2. 200707 NVAE
  3. 201119 Dual Contradistinctive Generative Autoencoder
  4. 201120 Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them

video

  1. 210325 An Image is Worth 16x16 Words, What is a Video Worth

video transformer

  1. 210423 VidTr

vision

  1. 200305 Optimizing JPEG Quantization for Classification Networks
  2. 201127 Field of Junctions

vision language

  1. 201212 MiniVLM
  2. 201222 Seeing past words
  3. 210407 Multimodal Fusion Refiner Networks
  4. 210727 Is Object Detection Necessary for Human-Object Interaction Recognition #human-object-interaction
  5. 220221 Vision-Language Pre-Training with Triple Contrastive Learning
  6. 220504 CoCa
  7. 220612 GLIPv2
  8. 220615 Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
  9. 220617 Bridge-Tower
  10. 220617 Unified-IO #multitask
  11. 220810 Patching open-vocabulary models by interpolating weights #clip #multitask #domain
  12. 220822 Image as a Foreign Language #mlm
  13. 230209 Re-ViLM
  14. 230313 Scaling Vision-Language Models with Sparse Mixture of Experts #mixture_of_experts

vision transformer

  1. 201127 General Multi-label Image Classification with Transformers
  2. 201223 A Survey on Visual Transformer
  3. 201223 Training data-efficient image transformers & distillation through #distillation
  4. 210223 Pyramid Vision Transformer
  5. 210318 CrossViT
  6. 210318 CvT
  7. 210318 Multi-Scale Vision Longformer
  8. 210319 ConViT
  9. 210319 Scalable Visual Transformers with Hierarchical Pooling
  10. 210324 Vision Transformers for Dense Prediction #fpn
  11. 210325 Swin Transformer #local_attention
  12. 210331 Going deeper with Image Transformers
  13. 210402 LeViT
  14. 210421 Token Labeling
  15. 210422 Multiscale Vision Transformers
  16. 210422 So-ViT
  17. 210426 Improve Vision Transformers Training by Suppressing Over-smoothing
  18. 210426 Visformer
  19. 210427 ConTNet
  20. 210428 Twins #local_attention #positional_encoding
  21. 210509 Conformer
  22. 210515 Are Convolutional Neural Networks or Transformers more like human vision #cnn #inductive_bias
  23. 210517 Rethinking the Design Principles of Robust Vision Transformer #robustness

visual grounding

  1. 210401 Towards General Purpose Vision Systems
  2. 210510 Visual Grounding with Transformers

vit

  1. 210521 Intriguing Properties of Vision Transformers #robustness
  2. 210526 Aggregating Nested Transformers #local_attention
  3. 210529 Less is More
  4. 210603 DynamicViT #sparse_attention
  5. 210603 When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations #regularization
  6. 210604 RegionViT #local_attention
  7. 210607 Refiner #attention
  8. 210607 Shuffle Transformer
  9. 210608 Scaling Vision Transformers #scale
  10. 210609 CoAtNet
  11. 210614 Delving Deep into the Generalization of Vision Transformers under Distribution Shifts #robustness
  12. 210615 Revisiting the Calibration of Modern Neural Networks #mlp #calibration
  13. 210617 XCiT #efficient_attention
  14. 210624 Exploring Corruption Robustness #robustness #mlp
  15. 210624 VOLO #efficient_attention
  16. 210624 Video Swin Transformer #local_attention #video #video_transformer
  17. 210701 CSWin Transformer #efficient_attention #local_attention
  18. 210701 Focal Self-attention for Local-Global Interactions in Vision Transformers #local_attention
  19. 210705 What Makes for Hierarchical Vision Transformer #attention #mlp #local_attention
  20. 210713 Visual Parser #local_attention
  21. 210731 CrossFormer
  22. 210811 ConvNets vs. Transformers #robustness #transfer
  23. 210819 Do Vision Transformers See Like Convolutional Neural Networks #resnet
  24. 210908 Scaled ReLU Matters for Training Vision Transformers #cnn
  25. 211118 Swin Transformer V2
  26. 211202 Improved Multiscale Vision Transformers for Classification and Detection
  27. 211210 Deep ViT Features as Dense Visual Descriptors #self_supervised #semantic_segmentation
  28. 211217 A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation #multiscale
  29. 220214 How Do Vision Transformers Work #cnn
  30. 220414 DeiT III
  31. 220722 An Impartial Take to the CNN vs Transformer Robustness Contest #robustness #cnn
  32. 220812 BEiT v2 #self_supervised #mlm
  33. 221110 Demystify Transformers & Convolutions in Modern Image Deep Networks #cnn
  34. 230202 Dual PatchNorm #normalization
  35. 230712 Patch n' Pack

vocoder

  1. 200512 FeatherWave
  2. 201118 Universal MelGAN

vq

  1. 230311 Regularized Vector Quantization for Tokenized Image Synthesis

vqa

  1. 220914 MUST-VQA

weak supervision

  1. 201126 SelfText Beyond Polygon #ocr

yolo

  1. 230113 YOLOv6 v3.0

uncategorized

  1. 09
  2. 200211 fastai
  3. 210224 Zero-Shot Text-to-Image Generation
  4. 210603 The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
  5. 210606 Referring Transformer
  6. 210607 ViTAE
  7. 210614 Non Gaussian Denoising Diffusion Models
  8. 210909 PIMNet
  9. 211026 Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
  10. 211028 Colossal-AI
  11. 211215 Value Retrieval with Arbitrary Queries for Form-like Documents
  12. 221125 Solving math word problems with process- and outcome-based feedback
  13. 221204 Languages You Know Influence Those You Learn
  14. 221215 Constitutional AI
  15. 220114 DeepSpeed-MoE
  16. 220203 AlphaCode, Formal Math
  17. 220204 InstructGPT
  18. 220316 Memorizing Transformers
  19. 220323 Pathways
  20. 220329 Few Could Be Better Than All
  21. 220405 Text Spotting Transformers
  22. 220416 Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
  23. 220510 UL2
  24. 220610 A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction
  25. 220612 Self-critiquing models for assisting human evaluators
  26. 220614 RDU
  27. 220630 DeepSpeed Inference
  28. 220712 Inner Monologue
  29. 220720 NUWA-Infinity
  30. 220722 Multiface
  31. 220725 CelebV-HQ
  32. 220725 Neural Generation Meets Real People
  33. 220725 Towards Complex Document Understanding By Discrete Reasoning
  34. 220819 FP8 Quantization
  35. 220823 CLOWER
  36. 220912 FP8 Formats for Deep Learning
  37. 220923 Diffusion
  38. 220928 Improving alignment of dialogue agents via targeted human judgements
  39. 220928 The Change You Want to See
  40. 221219 MatCha
  41. 230203 Measuring The Impact Of Programming Language Distribution
  42. 230206 SmoothQuant
  43. 230207 Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages
  44. 230207 FP8
  45. 230208 Google Configuration System
  46. 230209 Efficient Attention via Control Variates
  47. 230211 Thoughts on Generative AI
  48. 230213 Lossy Compression
  49. 230214 Adding Instructions during Pretraining
  50. 230214 Score-based Diffusion Models in Function Space
  51. 230216 Aligning Language Models with Preferences through f-divergence Minimization
  52. 230220 DSP
  53. 230221 Anthropic
  54. 230222 AlpaServe
  55. 230222 FlexGen
  56. 230223 Colossal AI ChatGPT
  57. 230223 On the Generalization Ability of Retrieval-Enhanced Transformers
  58. 230224 World Models
  59. 230228 SHP
  60. 230306
  61. 230311 Resurrecting Recurrent Neural Networks for Long Sequences
  62. 230312 ChatGPT Asks, BLIP-2 Answers
  63. 230314 ViperGPT
  64. 230315 GPT-4
  65. 230320 Reflexion
  66. 230323 The Quantization Model of Neural Scaling
  67. 230327 EVA-CLIP
  68. 230327 unarXive 2022
  69. 230328 Improving Code Generation by Training with Natural Language Feedback
  70. 230331 Autoregressive Model
  71. 230331 Choose Your Weapon
  72. 230406 Quantization
  73. 230407 RLHF
  74. 230414 OpenAssistant Conversations -- Democratizing Large Language Model Alignment
  75. 230416 Open Assistant
  76. 230417 Tool Learning with Foundation Models
  77. 230418 HCI
  78. 230420 Stable LM
  79. 230428 Are Emergent Abilities of Large Language Models a Mirage
  80. 230502 RedPajama
  81. 230504 ZipIt! Merging Models from Different Tasks without Training
  82. 230511 An Inverse Scaling Law for CLIP Training
  83. 230511 InstructBLIP
  84. 230511 Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
  85. 230511 Simple Token-Level Confidence Improves Caption Correctness
  86. 230516 SpecInfer
  87. 230518 Evidence of Meaning in Language Models Trained on Programs
  88. 230518 Google Running
  89. 230522 Training Diffusion Models with Reinforcement Learning
  90. 230523 ZeroSCROLLS
  91. 230524 Model evaluation for extreme risks
  92. 230525 The False Promise of Imitating Proprietary LLMs
  93. 230527 Fine-Tuning Language Models with Just Forward Passes
  94. 230531 Let's Verify Step by Step
  95. 230601 Hiera
  96. 230601 SnapFusion
  97. 230602 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
  98. 230608 SequenceMatch
  99. 230611 LAMM
  100. 230616 Scaling Open-Vocabulary Object Detection
  101. 230616 ZeRO++
  102. 230619 RepoFusion
  103. 230621 Constant Memory Attention Block
  104. 230621 Limits for Learning with Language Models
  105. 230626 Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression
  106. 230626 Understanding In-Context Learning via Supportive Pretraining Data
  107. 230627 IDOL
  108. 230629 An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
  109. 230629 Generate Anything Anywhere in Any Scene
  110. 230629 Generative AI for Programming Education
  111. 230629 LLaVAR
  112. 230701 Let Me Teach You
  113. 230701 NTK Aware Scaled RoPE
  114. 230706 Superalignment
  115. 230706 Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts
  116. 230708 DDPO
  117. 230710 About Anthropic
  118. 230710 BeaverTails
  119. 230710 FreeDrag
  120. 230710 Large Language Models as General Pattern Machines
  121. 230710 VampNet
  122. 230711 GPT-4 FLOPS
  123. 230711 Objaverse-XL
  124. 230711 Self-consistency for open-ended generations
  125. 230712 Claude 2
  126. 230712 Instruction Mining
  127. 230713 Hassabis
  128. 230714 Code Interpreter
  129. 230718 Flash Attention 2
  130. 230718 How is ChatGPT's behavior changing over time
  131. 230719 Llama 2
  132. 230724 RLCD
  133. 230725 Retentive Network
  134. 230728 Exploring Format Consistency for Instruction Tuning
  135. 230728 The Hydra Effect
  136. 230729 Configuration System
  137. 230803 H100 Supply and Demand
  138. 230803 Multimodal Neurons in Pretrained Text-Only Transformers
  139. 230804 Retroformer
  140. 230807 Intelligent Assistant Language Understanding On Device
  141. 230808 Gentopia
  142. 230809 StableCode
  143. 230810 ReRoPE
  144. 210714 Deduplicating Training Data Makes Language Models Better
  145. 211122 ExT5
  146. 230523 Aligning Large Language Models through Synthetic Feedback