transformers-papers

Papers club from the AI team at D-ID - this time, transformers from attention to vision. Lectures are in Hebrew.

Our paper-reading club - all lectures are in Hebrew.

| Lecture | Paper / Resource | Year | Why is it interesting? | Assignee | Recording | Presentation |
| --- | --- | --- | --- | --- | --- | --- |
| Transformers are worth your attention | Attention Is All You Need | 2017 | The paper that started it all: an introduction to the basic concept and a comparison to previous methods such as RNNs. The transformer here has both encoder and decoder layers, creating a seq2seq model (see the attention sketch below the table). | @matan-feldman | zoom (K%32MLKi) | slides |
| Transformer tricks - Positional Encoding, Layer Norm, Residual Connections. In code! | The Annotated Transformer | 2017 | Going into depth on the various tricks used to make transformers work; implementing a transformer without them would lead to poor results (see the positional-encoding sketch below the table). | self-work | x | x |
| Visualizing Attention | Visualizing Attention in Transformer-Based Language Representation Models; On the Relationship between Self-Attention and Convolutional Layers | 2019 | Attention is useful for explainability too: we can see what the network is using for the task. In this lecture we examine visualizations of this in NLP and vision. | self-work | x | x |
| BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018 | This model from Google uses only encoders and achieved state of the art on many NLP tasks. | @leong-deid | zoom (LW$8fQ6f) | slides |
| GPT | Language Models are Few-Shot Learners | 2020 | This model from OpenAI uses only decoders and achieved state-of-the-art text generation. Its authors initially withheld it because they said it was too dangerous. It is now the backbone of GitHub Copilot. | self-work | x | x |
| Wav2Vec-U | wav2vec-U: Unsupervised Speech Recognition | 2021 | This unsupervised model from Facebook is able to learn language representations; we use the supervised version in our A2K input. | @matan-feldman | zoom (p.qE+Q59) | slides |
| DETR for object detection & segmentation | End-to-End Object Detection with Transformers | 2020 | Taking transformers even further into other CV tasks: the authors from Facebook AI combine a CNN with transformers to reduce some of the human priors needed when designing object detection and segmentation models. | @talbenh | zoom (17K%NSf3) | slides |
| ViT | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | 2020 | Treats the image as a sentence of 16x16-patch "words". With supervision from large-scale image datasets, the model achieves SoTA on classification tasks with significantly less compute (see the patch-embedding sketch below the table). | @alon-mengi | zoom (ve1VHEM=) | slides |
| CLIP | Learning Transferable Visual Models From Natural Language Supervision | 2021 | An OpenAI model that learns two encoders, one for images and one for text, and via contrastive learning achieves SoTA results on image classification while dramatically increasing robustness over previous methods - using internet-scraped data instead of expensive annotated datasets (see the contrastive-loss sketch below the table). | @amitay-nachmani | zoom (^a1!1BJf) | slides |
| Perceiver | Perceiver IO: A General Architecture for Structured Inputs & Outputs | 2021 | Perceiver models use cross-attention and a learned latent array to work on many modalities by reducing the self-attention complexity. The authors demonstrate that the model matches baseline results on many tasks (see the cross-attention sketch below the table). | @orgoro | zoom (Ba9DQ&Ef) | slides |
| DALL·E 2 & Imagen | Hierarchical Text-Conditional Image Generation with CLIP Latents | 2022 | DALL·E 2 is an AI system from OpenAI that can create realistic images and art from a natural-language description. The model uses CLIP embeddings and diffusion models to generate images from text. Google also released a competing model, Imagen, that argues for superior quality. | @talbenh | zoom () | slides |
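
A minimal sketch of the scaled dot-product attention at the core of "Attention Is All You Need", to accompany the first lecture. The function name, shapes, and toy inputs are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5        # (batch, heads, seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                  # attention distribution over the keys
    return weights @ v, weights

# toy usage: batch of 2, 4 heads, sequence of 5 tokens, head dim 8
q = k = v = torch.randn(2, 4, 5, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # (2, 4, 5, 8) and (2, 4, 5, 5)
```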
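A rough sketch of two of the "tricks" from The Annotated Transformer lecture: sinusoidal positional encoding and the residual-plus-LayerNorm wrapper. Hyperparameters and class names are illustrative (and d_model is assumed even), not the repository's own implementation.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_positional_encoding(max_len, d_model):
    # one row per position, added to the token embeddings before the first layer
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(max_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # (max_len, d_model)

class SublayerConnection(nn.Module):
    """Residual connection followed by layer norm (the 'Add & Norm' block)."""
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))
```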
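A minimal sketch of how ViT turns an image into a "sentence" of 16x16 patch tokens. Using a strided Conv2d as the patch embedding is one common implementation choice; the sizes and the zero-initialized [CLS] placeholder are illustrative only.

```python
import torch
import torch.nn as nn

patch, dim = 16, 768
to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # one "word" per 16x16 patch

img = torch.randn(1, 3, 224, 224)
tokens = to_patches(img).flatten(2).transpose(1, 2)  # (1, 196, 768): 14x14 = 196 patch tokens
cls = torch.zeros(1, 1, dim)                         # stands in for the learnable [CLS] token
seq = torch.cat([cls, tokens], dim=1)                # (1, 197, 768), fed to a standard encoder
print(seq.shape)
```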
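A compact sketch of CLIP-style contrastive training: image and text embeddings from the two encoders are matched along the batch diagonal. The encoders themselves are omitted, and the function name, dimensions, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (batch, dim) outputs of the image and text encoders
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))            # the i-th image matches the i-th caption
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)     # text -> image direction
    return (loss_i + loss_t) / 2

loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```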
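A rough sketch of the Perceiver idea: a small learned latent array cross-attends to a long input, so the expensive self-attention runs only over the latents and its cost no longer grows with the input length. This uses stock PyTorch attention modules with illustrative sizes, not the Perceiver IO code.

```python
import torch
import torch.nn as nn

d_model, n_latents = 256, 64
latents = nn.Parameter(torch.randn(n_latents, d_model))           # learned latent array
cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
self_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

inputs = torch.randn(1, 10_000, d_model)           # e.g. flattened pixels or audio samples
q = latents.unsqueeze(0)                           # (1, 64, 256)
x, _ = cross_attn(q, inputs, inputs)               # cost ~ O(n_latents * input_len)
x, _ = self_attn(x, x, x)                          # cost ~ O(n_latents^2), independent of input size
print(x.shape)                                     # (1, 64, 256)
```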