/Deep-Learning-Paper

These are papers related to NLP, CV, and Deep Learning that I have read and reviewed 😉 You can check the paper links and my reviews 😊


NLP, CV, Deep Learning Paper Reading Table

I have read these papers on NLP and Deep Learning, which range from basic to advanced. 😊 In addition, you can check my Korean paper reviews by clicking the links in the tables. 😉

You can find more paper reviews, code implementations, and mathematical explanations on my blog: https://cartinoe5930.tistory.com

My Insight 🧐

I have written several articles that explain some Deep Learning techniques in detail. These articles can be found in the table below.

Title Blog link
How has scaling law developed in NLP? 🤔 https://cartinoe5930.tistory.com/entry/How-has-scaling-law-developed-in-NLP-%F0%9F%A4%94-NLP%EC%97%90%EC%84%9C-scaling-law%EB%8A%94-%EC%96%B4%EB%96%BB%EA%B2%8C-%EB%B0%9C%EC%A0%84%EB%90%98%EC%97%88%EC%9D%84%EA%B9%8C
Closed-source🔒? Open-source🔓? What is that?? 🤨🤔 https://cartinoe5930.tistory.com/entry/The-hopes-of-researchers-Open-source-%F0%9F%A4%97-%EC%97%B0%EA%B5%AC%EC%9E%90%EB%93%A4%EC%9D%98-%ED%9D%AC%EB%A7%9D-Open-source-%F0%9F%A4%97
Context window of LM, should it be long? Should it be short? 📏🤨 https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8
What is the most optimal way to evaluate LM? 😎 https://cartinoe5930.tistory.com/entry/LM%EC%9D%84-%EA%B0%80%EC%9E%A5-%EC%B5%9C%EC%A0%81%EC%9C%BC%EB%A1%9C-%ED%8F%89%EA%B0%80%ED%95%A0-%EC%88%98-%EC%9E%88%EB%8A%94-%EB%B0%A9%EB%B2%95%EC%9D%80-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C-%F0%9F%98%8E
The performance of ChatGPT is getting worse?!?!? 😲😲 https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-%EC%84%B1%EB%8A%A5%EC%9D%B4-%EC%95%88-%EC%A2%8B%EC%95%84%EC%A7%80%EA%B3%A0-%EC%9E%88%EB%8B%A4%EA%B5%AC-%F0%9F%98%B2%F0%9F%98%B2
You can fine-tune too! with PEFT 🤗 https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97
Let's think step by step like humans! 🧠🤔 https://cartinoe5930.tistory.com/entry/%ED%95%9C-%EB%8B%A8%EA%B3%84-%ED%95%9C-%EB%8B%A8%EA%B3%84%EC%94%A9-%EC%9D%B8%EA%B0%84%EC%B2%98%EB%9F%BC-%EC%83%9D%EA%B0%81%ED%95%B4%EB%B3%B4%EC%9E%90-%F0%9F%A7%A0%F0%9F%A4%94
Development process of fine-tuning method!! From fine-tuning to RLHF 🦖➡️🧑 https://cartinoe5930.tistory.com/entry/Fine-tuning-method%EC%9D%98-%EC%A7%84%ED%99%94-%EA%B3%BC%EC%A0%95-%F0%9F%A6%96%E2%9E%A1%EF%B8%8F%F0%9F%A7%91
It's time to fine-tune ChatGPT!! ⏰ https://cartinoe5930.tistory.com/entry/%EC%9D%B4%EC%A0%9C%EB%8A%94-ChatGPT%EB%A5%BC-fine-tuning-%ED%95%A0-%EC%8B%9C%EA%B0%84-%E2%8F%B0
Noise makes LLM better! - NEFTune 😉 https://cartinoe5930.tistory.com/entry/Noise-makes-LLM-better-NEFTune-%F0%9F%98%89

Natural Language Processing

Word Embedding & Neural Networks

Paper Title Paper or reference site Link Paper Review
Embedding Matrix https://wikidocs.net/book/2155 https://cartinoe5930.tistory.com/entry/Embedding-Matrix-%ED%95%99%EC%8A%B5
LSTM: Long Short-Term Memory https://colah.github.io/posts/2015-08-Understanding-LSTMs/ https://cartinoe5930.tistory.com/entry/%EC%95%8C%EA%B8%B0-%EC%89%BD%EA%B2%8C-LSTM-networks-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0
GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation https://arxiv.org/abs/1406.1078 https://cartinoe5930.tistory.com/entry/GRU-Empirical-Evaluation-of-Gated-Recurrent-Neural-Networks-on-Sequence-Modeling-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
LSTM vs. GRU: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling https://arxiv.org/abs/1412.3555 https://cartinoe5930.tistory.com/entry/LSTM-vs-GRU-%EB%AD%90%EA%B0%80-%EB%8D%94-%EB%82%98%EC%9D%84%EA%B9%8C-Empirical-Evaluation-of-Gated-Recurrent-Neural-Networks-on-Sequence-Modeling-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0

Language Models 🤖

Basic 📖

Paper Title Paper or reference site Link Paper Review
Transformer: Attention Is All You Need https://arxiv.org/abs/1706.03762 https://cartinoe5930.tistory.com/entry/Transformer-Attention-Is-All-You-Need-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ELMo: Deep contextualized word representations https://arxiv.org/abs/1802.05365 https://cartinoe5930.tistory.com/entry/Pre-trained-Language-Modeling-paper-reading1-ELMo-Deep-contextualized-word-representations
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 https://cartinoe5930.tistory.com/entry/Pre-trained-Language-Modeling-paper-reading2-BERT-Pre-training-of-Deep-Bidirectional-Transformers-for-Language-Understanding
GPT-1: Improving Language Understanding by Generative Pre-Training https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf https://cartinoe5930.tistory.com/entry/Pre-trained-Language-Modeling-paper-reading3-GPT-1-Improving-Language-Understanding-by-Generative-Pre-Training
GPT-2: Language Models are Unsupervised Multitask Learners https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf https://cartinoe5930.tistory.com/entry/GPT-2-Language-Models-are-Unsupervised-Multitask-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
GPT-3: Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 https://cartinoe5930.tistory.com/entry/GPT-3-Language-Models-are-Few-Shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context https://arxiv.org/abs/1901.02860 https://cartinoe5930.tistory.com/entry/Transformer-XL-Attentive-Language-Models-Beyond-a-Fixed-Length-Context-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Sparse Transformers: Generating Long Sequences with Sparse Transformers https://arxiv.org/abs/1904.10509 https://cartinoe5930.tistory.com/entry/Sparse-Transformers-Generating-Long-Sequence-with-Sparse-Transformers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
XLNet: Generalized Autoregressive Pretraining for Language Understanding https://arxiv.org/abs/1906.08237 https://cartinoe5930.tistory.com/entry/XLNet-Generalized-Autoregressive-Pretraining-for-Language-Understanding-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
SpanBERT: Improving Pre-training by Representing and Predicting Spans https://arxiv.org/abs/1907.10529 https://cartinoe5930.tistory.com/entry/SpanBERT-Improving-Pre-training-by-Representing-and-Predicting-Spans-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
RoBERTa: A Robustly Optimized BERT Pre-training Approach https://arxiv.org/abs/1907.11692 https://cartinoe5930.tistory.com/entry/RoBERTa-A-Robustly-Optimized-BERT-Pretraining-Approach-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks https://arxiv.org/abs/1908.10084 https://cartinoe5930.tistory.com/entry/Sentence-BERT-Sentence-Embeddings-using-Siamese-BERT-Networks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations https://arxiv.org/abs/1909.11942 https://cartinoe5930.tistory.com/entry/ALBERT-A-Lite-BERT-for-Self-supervised-Learning-of-Language-Representations-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension https://arxiv.org/abs/1910.13461 https://cartinoe5930.tistory.com/entry/BART-Denoising-Sequence-to-Sequence-Pre-training-for-Natural-Language-Generation-Translation-and-Comprehension-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Pre-LN Transformer: On Layer Normalization in the Transformer Architecture https://arxiv.org/abs/2002.04745 https://cartinoe5930.tistory.com/entry/Pre-LN-Transformer-On-Layer-Normalization-in-the-Transformer-Architecture-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ELECTRA: Pre-training Text Encoders as Discriminators rather than Generators https://arxiv.org/abs/2003.10555 https://cartinoe5930.tistory.com/entry/ELECTRA-Pre-training-Text-Encoders-as-Discriminators-rather-than-Generators
Longformer: The Long-Document Transformer https://arxiv.org/abs/2004.05150 https://cartinoe5930.tistory.com/entry/Longformer-The-Long-Document-Transformer-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
BigBird: Transformers for Longer Sequences https://arxiv.org/abs/2007.14062 https://cartinoe5930.tistory.com/entry/BigBird-Transformers-for-Longer-Sequences-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
WebGPT: Browser-assisted question-answering with human feedback https://arxiv.org/abs/2112.09332 https://cartinoe5930.tistory.com/entry/WebGPT-Browser-assisted-question-answering-with-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
OPT: Open Pre-trained Transformer Language Models https://arxiv.org/abs/2205.01068 https://cartinoe5930.tistory.com/entry/OPT-Open-Pre-trained-Transformer-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 No plan!

Efficient Models 💸

Paper Title Paper or reference site Link Paper Review
TinyBERT: Distilling BERT for Natural Language Understanding https://arxiv.org/abs/1909.10351 https://cartinoe5930.tistory.com/entry/TinyBERT-Distilling-BERT-for-Natural-Language-Understanding-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
DistilBERT: a distilled version of BERT https://arxiv.org/abs/1910.01108 https://cartinoe5930.tistory.com/entry/DistilBERT-a-distilled-version-of-BERT-smaller-faster-cheaper-and-lighter-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
It's Not Just Size That Matters: Small Language Models are Also Few-Shot Learners (an application of PET) https://arxiv.org/abs/2009.07118 https://cartinoe5930.tistory.com/entry/Its-Not-Just-Size-That-Matters-Small-Language-Models-Are-Also-Few-Shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0

Open-source Language Models (Scaling Law) 🤗

Paper Title Paper or reference site Link Paper Review
Chinchilla: Training Compute-Optimal Large Language Models https://arxiv.org/abs/2203.15556 https://cartinoe5930.tistory.com/entry/%EC%A7%80%EA%B8%88-%EA%B9%8C%EC%A7%80%EC%9D%98-LM-Scaling-Law%EC%97%90%EB%8A%94-%EB%AC%B8%EC%A0%9C%EC%A0%90%EC%9D%B4-%EC%9E%88%EB%8B%A4-%F0%9F%98%B6%E2%80%8D%F0%9F%8C%AB%EF%B8%8F-Chinchilla-Training-Compute-Optimal-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling https://arxiv.org/abs/2304.01373 No plan!
LIMA: Less Is More for Alignment https://arxiv.org/abs/2305.11206 https://cartinoe5930.tistory.com/entry/LIMA-Less-Is-More-for-Alignment-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/abs/2302.13971 https://cartinoe5930.tistory.com/entry/LLaMA-Open-and-Efficient-Foundation-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
WizardLM: Empowering Large Language Models to Follow Complex Instructions https://arxiv.org/abs/2304.12244 https://cartinoe5930.tistory.com/entry/Open-domain-instruction%EC%9D%98-%ED%9A%A8%EA%B3%BC-%F0%9F%AA%84-WizardLM-Empowering-Large-Language-Models-to-Follow-Complex-Instructions-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
WizardCoder: Empowering Code Large Language Models with Evol-Instruct https://arxiv.org/abs/2306.08568 https://huggingface.co/WizardLM/WizardCoder-15B-V1.0
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct https://arxiv.org/abs/2308.09583 https://huggingface.co/WizardLM/WizardMath-70B-V1.0
Alpaca: A Strong, Replicable Instruction-Following Model https://crfm.stanford.edu/2023/03/13/alpaca.html https://cartinoe5930.tistory.com/entry/Alpaca-A-Strong-Replicable-Instruction-Following-Model-%EB%A6%AC%EB%B7%B0
Vicuna: An Open-Source Chatbot Impressing GPT-4 https://lmsys.org/blog/2023-03-30-vicuna/ https://cartinoe5930.tistory.com/entry/Vicuna-An-Open-Source-Chatbot-Impressing-GPT-4-%EB%A6%AC%EB%B7%B0
Koala: A Dialogue Model for Academic Research https://bair.berkeley.edu/blog/2023/04/03/koala/ https://cartinoe5930.tistory.com/entry/%EC%A4%91%EC%9A%94%ED%95%9C-%EA%B1%B4-%EA%BA%BE%EC%9D%B4%EC%A7%80-%EC%95%8A%EB%8A%94-high-quality-data-Koala%F0%9F%90%A8-A-Dialogue-Model-for-Academic-Researc
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data https://arxiv.org/abs/2304.01196 https://cartinoe5930.tistory.com/entry/%F0%9F%90%B2Baize-An-Open-Source-Chat-Model-with-Parameter-Efficient-Tuning-on-Self-Chat-Data-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Scaling Data-Constrained Language Models https://arxiv.org/abs/2305.16264 https://www.youtube.com/watch?v=TK0-sitkCMw&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDUuMTYyNjQ%3D
Falcon & RefinedWeb https://arxiv.org/abs/2306.01116 https://cartinoe5930.tistory.com/entry/Open-LLM-Leaderboard%EB%A5%BC-%ED%9C%A9%EC%93%B4-Falcon%F0%9F%A6%85-LLM-Falcon-RefinedWeb
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 https://arxiv.org/pdf/2306.02707 https://cartinoe5930.tistory.com/entry/%F0%9F%90%ACOrca-Progressive-Learning-from-Complex-Explanation-Traces-of-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
phi-1: Textbooks Are All You Need https://arxiv.org/abs/2306.11644 https://cartinoe5930.tistory.com/entry/%ED%95%84%EC%9A%94%ED%95%9C-%EA%B1%B4-%EC%98%A4%EC%A7%81-%EA%B5%90%EA%B3%BC%EC%84%9C-%EC%88%98%EC%A4%80%EC%9D%98-%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%BF%90-%F0%9F%93%96-phi-1-Textbooks-Are-All-You-Need-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
AlpaGasus: Training a Better Alpaca with Fewer Data https://arxiv.org/abs/2307.08701 Will be uploaded later!
Llama 2: Open Foundation and Fine-Tuned Chat Models https://arxiv.org/abs/2307.09288 https://cartinoe5930.tistory.com/entry/The-hopes-of-researchers-Open-source-%F0%9F%A4%97-%EC%97%B0%EA%B5%AC%EC%9E%90%EB%93%A4%EC%9D%98-%ED%9D%AC%EB%A7%9D-Open-source-%F0%9F%A4%97
Platypus: Quick, Cheap, and Powerful Refinement of LLMs https://arxiv.org/abs/2308.07317 Will be uploaded later!
Code Llama: Open Foundation Models for Code https://arxiv.org/abs/2308.12950 No plan
FLM-101B: An Open LLM and How to Train It with $100K Budget https://arxiv.org/pdf/2309.03852 No plan!
Textbooks are All You Need II: phi-1.5 technical report https://arxiv.org/abs/2309.05463 https://huggingface.co/microsoft/phi-1_5
OpenChat: Advancing Open-Source Language Models with Mixed-Quality Data https://arxiv.org/abs/2309.11235 https://github.com/imoneoi/openchat
Mistral 7B https://arxiv.org/abs/2310.06825 https://mistral.ai/news/announcing-mistral-7b/
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models https://arxiv.org/abs/2310.08491 https://huggingface.co/papers/2310.08491#652a8e7f30355beba68c1be6
Zephyr: Direct Distillation of LM Alignment https://arxiv.org/abs/2310.16944 https://www.youtube.com/watch?v=TkZBg3mKsIo
Orca2: Teaching Small Language Models How to Reason https://arxiv.org/abs/2311.11045 https://www.microsoft.com/en-us/research/blog/orca-2-teaching-small-language-models-how-to-reason/
The Falcon Series of Open Language Models https://arxiv.org/abs/2311.16867 No plan!
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling https://arxiv.org/abs/2312.15166 No plan!

Large Language Models (LLMs) 💣

Paper Title Paper or reference site Link Paper Review
LaMDA: Language Models for Dialog Applications blog: https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html, paper: https://arxiv.org/abs/2201.08239 https://cartinoe5930.tistory.com/entry/%EA%B5%AC%EA%B8%80%EC%9D%98-%EC%B5%9C%EA%B0%95-%EC%B1%97%EB%B4%87-LaMDA%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B3%B4%EC%9E%90-Language-Models-for-Dialog-Applications-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
PaLM: Scaling Language Modeling with Pathways blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html, paper: https://arxiv.org/abs/2204.02311 1: https://cartinoe5930.tistory.com/entry/LaMDA%EC%9D%98-%EB%92%A4%EB%A5%BC-%EC%9E%87%EB%8A%94-Pathways%EB%A5%BC-%ED%99%9C%EC%9A%A9%ED%95%9C-%EC%B4%88%EA%B1%B0%EB%8C%80-%EC%96%B8%EC%96%B4-%EB%AA%A8%EB%8D%B8-PaLM-%EB%A6%AC%EB%B7%B0, 2: https://cartinoe5930.tistory.com/entry/LaMDA%EC%9D%98-%EB%92%A4%EB%A5%BC-%EC%9E%87%EB%8A%94-Pathways%EB%A5%BC-%EC%82%AC%EC%9A%A9%ED%95%9C-%EC%B4%88%EA%B1%B0%EB%8C%80-%EC%96%B8%EC%96%B4-%EB%AA%A8%EB%8D%B8-PaLM-%EB%A6%AC%EB%B7%B02
GPT-4 Technical Report blog: https://openai.com/research/gpt-4, paper: https://arxiv.org/abs/2303.08774 https://cartinoe5930.tistory.com/entry/GPT-4-Techinal-Report-Review
Gemini: A Family of Highly Capable Multimodal Models https://arxiv.org/abs/2312.11805 No plan!
AlphaCode 2 Technical Report https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf No plan!

Fine-tuning

Instruction-tuning 🧑‍🏫

Paper Title Paper or reference site Link Paper Review
FLAN: Fine-tuned Language Models are Zero-shot Learners https://arxiv.org/abs/2109.01652 https://cartinoe5930.tistory.com/entry/FLAN-Fine-tuned-Language-Models-are-Zero-shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
T0: Multitask Prompted Training Enables Zero-shot Task Generalization https://arxiv.org/abs/2110.08207 https://cartinoe5930.tistory.com/entry/T0-Multitask-Prompted-Training-Enables-Zero-shot-Task-Generalization-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Super-Natural Instructions: Generalization via Declarative Instructions on 1600+ NLP Tasks https://arxiv.org/abs/2204.07705 https://cartinoe5930.tistory.com/entry/Super-Natural-Instructions-Generalization-via-Declarative-Instructions-on-1600-NLP-Tasks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor https://arxiv.org/abs/2212.09689 Will be uploaded later!
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-shot Learners https://arxiv.org/abs/2210.02969 https://cartinoe5930.tistory.com/entry/Guess-the-Instruction-Flipped-Learning-Makes-Language-Models-Stronger-Zero-shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Scaling Instruction-Finetuned Language Models https://arxiv.org/abs/2210.11416 https://cartinoe5930.tistory.com/entry/Scaling-Instruction-Finetuned-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Exploring the Benefits of Training Expert Language Models over Instruction Tuning https://arxiv.org/abs/2302.03202 https://cartinoe5930.tistory.com/entry/Exploring-the-Benefits-of-Training-Expert-Language-Models-over-Instruction-Tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ICIL: In-Context Instruction Learning https://arxiv.org/abs/2302.14691 https://cartinoe5930.tistory.com/entry/ICIL-In-Context-Instruction-Learning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Instruction tuning with GPT-4 https://arxiv.org/abs/2304.03277 https://cartinoe5930.tistory.com/entry/Instruction-Tuning-with-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
FIP: Fixed Input Parameterization for Efficient Prompting https://aclanthology.org/2023.findings-acl.533.pdf Will be uploaded later!
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning https://arxiv.org/abs/2307.02053 Will be uploaded later!
Maybe Only 0.5% Data Is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning https://arxiv.org/abs/2305.09246 Will be uploaded later!
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning https://arxiv.org/abs/2307.03692 Will be uploaded later!

Reinforcement Learning from Human Feedback (RLHF) 👥

Paper Title Paper or reference site Link Paper Review
RLHF(Reinforcement Learning from Human Feedback) https://huggingface.co/blog/rlhf https://cartinoe5930.tistory.com/entry/%EC%82%AC%EB%9E%8C%EC%9D%98-%ED%94%BC%EB%93%9C%EB%B0%B1%EC%9D%84-%ED%86%B5%ED%95%9C-%EA%B0%95%ED%99%94%ED%95%99%EC%8A%B5-Reinforcement-Learning-from-Human-Feedback-RLHF
Red Teaming Language Models with Language Models https://arxiv.org/abs/2202.03286 https://cartinoe5930.tistory.com/entry/Red-Teaming-Language-Models-with-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
InstructGPT: Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155 https://cartinoe5930.tistory.com/entry/InstructGPT-Training-language-models-to-follow-instructions-with-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Training a helpful and harmless assistant with reinforcement learning from human feedback https://arxiv.org/abs/2204.05862 https://cartinoe5930.tistory.com/entry/Training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback https://arxiv.org/abs/2305.14387 Will be uploaded later!
ALMoST: Aligning Large Language Models through Synthetic Feedback https://arxiv.org/abs/2305.13735 https://cartinoe5930.tistory.com/entry/Aligning-Large-Language-Models-through-Synthetic-Feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback https://arxiv.org/abs/2307.15217 Will be uploaded later!
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback https://arxiv.org/abs/2309.00267 No plan!
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF https://arxiv.org/abs/2310.05344 No plan!
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM https://arxiv.org/abs/2311.09528 No plan!

Efficient-tuning ✨

Paper Title Paper or reference site Link Paper Review
Adapter: Parameter-Efficient Transfer Learning for NLP https://arxiv.org/abs/1902.00751 https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97
Prefix-Tuning: Optimizing Continuous Prompts for Generation https://arxiv.org/abs/2101.00190 https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97
LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685 https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97
Towards a Unified View of Parameter-Efficient Transfer Learning https://arxiv.org/abs/2110.04366 Will be uploaded later!
UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning https://arxiv.org/abs/2110.07577 Will be uploaded later!
(IA)^3: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning https://arxiv.org/abs/2205.05638 Will be uploaded later!
QLoRA: Efficient Fine-tuning of Quantized LLMs https://arxiv.org/abs/2305.14314 Will be uploaded later!
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates https://arxiv.org/abs/2307.05695 Will be uploaded later!
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition https://arxiv.org/abs/2307.13269 Will be uploaded later!

Dataset 💫

Paper Title Paper or reference site Link Paper Review
Instruction Mining: High-quality Instruction Data Selection for Large Language Models https://arxiv.org/abs/2307.06290 No plan!
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization https://arxiv.org/abs/2212.10465 No plan!
MoDS: Model-oriented Data Selection for Instruction Tuning https://arxiv.org/abs/2311.15653 No plan!
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models https://arxiv.org/abs/2312.06585 No plan!
Magicoder: Source Code Is All You Need https://arxiv.org/abs/2312.02120 No plan!
WaveCoder: Widespread and Versatile Enhanced Instruction Tuning with Refined Data Generation https://arxiv.org/abs/2312.14187 No plan!
What Makes Good Data for Alignment: A Comprehensive Study of Automatic Data Selection in Instruction Tuning https://arxiv.org/abs/2312.15685 No plan!

Prompt Engineering 👨‍🔧

Paper Title Paper or reference site Link Paper Review
What is 'Prompt Engineering'? See my blog! https://cartinoe5930.tistory.com/entry/Prompt-Engineering%EC%9D%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C
CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models blog: https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html, paper: https://arxiv.org/abs/2201.11903 https://cartinoe5930.tistory.com/entry/LM%EC%9D%B4-%EC%82%AC%EB%9E%8C%EA%B3%BC-%EC%9C%A0%EC%82%AC%ED%95%9C-%EC%83%9D%EA%B0%81-%ED%94%84%EB%A1%9C%EC%84%B8%EC%8A%A4%EB%A5%BC-%EA%B0%80%EC%A7%80%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-Chain-of-Thought-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Zero-shot CoT: Large Language Models Are Zero-shot Reasoners https://arxiv.org/abs/2205.11916 https://cartinoe5930.tistory.com/entry/Large-Language-Models-are-Zero-Shot-Reasoners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Language Models are Multilingual Chain-of-Thought Reasoners https://arxiv.org/abs/2210.03057 Will be uploaded later!
Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models https://arxiv.org/abs/2210.03493 Will be uploaded later!
CoT KD: Teaching Small Language Models to Reason https://arxiv.org/abs/2212.08410 Will be uploaded later!
ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models https://arxiv.org/abs/2305.10601 https://cartinoe5930.tistory.com/entry/Tree-of-Thoughts-Deliberate-Problem-Solving-with-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning https://arxiv.org/abs/2305.14045 https://cartinoe5930.tistory.com/entry/CoT-Collection-Improving-Zero-shot-and-Few-shot-Learning-of-Language-Models-via-Chain-of-Thought-Fine-tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Let's verify step-by-step https://arxiv.org/abs/2305.20050 https://cartinoe5930.tistory.com/entry/Lets-verify-step-by-step-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Measuring Faithfulness in Chain-of-Thought Reasoning https://arxiv.org/abs/2307.13702 Will be uploaded later!
SoT: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding https://arxiv.org/abs/2307.15337 Will be uploaded later!
Graph of Thoughts: Solving Elaborate Problems with Large Language Models https://arxiv.org/abs/2308.09687 Will be uploaded later!
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting https://arxiv.org/abs/2309.04269 No plan!
Chain-of-Verification Reduces Hallucination in Large Language Models https://arxiv.org/abs/2309.11495 https://www.youtube.com/watch?v=l0zFjwRegog&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMTE0OTU%3D
Contrastive Chain-of-Thought Prompting https://arxiv.org/abs/2311.09277 No plan!
Thread of Thought Unraveling Chaotic Contexts https://arxiv.org/abs/2311.08734 No plan!
System 2 Attention (Is Something You Might Need Too) https://arxiv.org/abs/2311.11829 No plan!
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator https://arxiv.org/abs/2312.04474 No plan!

Model Efficiency 🧰

Paper Title Paper or reference site Link Paper Review
FlashAttention: Fast and Memory-Efficient Exact Attention https://arxiv.org/abs/2205.14135 https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad
Exponentially Faster Language Modeling https://arxiv.org/abs/2311.10770 No plan!
LLM in a flash: Efficient Large Language Model Inference with Limited Memory https://arxiv.org/abs/2312.11514 No plan!

Method

Paper Title Paper or reference site Link Paper Review
Data Augmentations in NLP blogs: https://neptune.ai/blog/data-augmentation-nlp, https://amitness.com/2020/05/data-augmentation-for-nlp/?fbclid=IwAR11MkccCti-2cD93RYftNPHb7Wxdj7AlZG7NNG4EhPaBkmiJkcBPtdl1eo https://cartinoe5930.tistory.com/entry/Data-Augmentation-methods-in-NLP
PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference https://arxiv.org/abs/2001.07676 https://cartinoe5930.tistory.com/entry/PET-Exploiting-Cloze-Questions-for-Few-Shot-Text-Classification-and-Natural-Language-Inference-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Pathways https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ https://cartinoe5930.tistory.com/entry/%EB%A7%8C%EC%95%BD-%EB%AA%A8%EB%8D%B8%EC%9D%B4-%EC%97%AC%EB%9F%AC-%EA%B0%90%EA%B0%81%EC%9D%84-%EB%8A%90%EB%82%84-%EC%88%98-%EC%9E%88%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-Pathways-%EB%A6%AC%EB%B7%B0
LMSI: Large Language Models Can Self-Improve https://arxiv.org/abs/2210.11610 https://cartinoe5930.tistory.com/entry/LMSI-Large-Language-Models-can-Self-Improve-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Self-Instruct: Aligning Language Models with Self-Generated Instructions https://arxiv.org/abs/2212.10560 https://cartinoe5930.tistory.com/entry/Self-Instruct-Aligning-Language-Model-with-Self-Generated-Instructions-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Reflexion: Language Agents with Verbal Reinforcement Learning https://arxiv.org/abs/2303.11366 https://cartinoe5930.tistory.com/entry/Reflexion-Language-Agents-with-Verbal-Reinforcement-Learning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Self-Refine: Iterative Refinement with Self-Feedback https://arxiv.org/abs/2303.17651 https://cartinoe5930.tistory.com/entry/Self-Refine-Iterative-Refinement-with-Self-Feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
REFINER: Reasoning Feedback on Intermediate Representations https://arxiv.org/abs/2304.01904 No plan!
SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation https://kaistai.github.io/SelFee/ https://cartinoe5930.tistory.com/entry/SelFee-Iterative-Self-Revising-LLM-Expowered-by-Self-Feedback-Generation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints https://arxiv.org/abs/2305.13245 https://aliissa99.medium.com/-a596e4d86f79
Shepherd: A Critic for Language Model Generation https://arxiv.org/abs/2308.04592 Will be uploaded later!
Self-Alignment with Instruction Backtranslation https://arxiv.org/pdf/2308.06259 Will be uploaded later!
SCREWS: A Modular Framework for Reasoning with Revisions https://arxiv.org/pdf/2309.13075 No plan!
NEFTune: Noisy Embeddings Improve Instruction Finetuning https://arxiv.org/abs/2310.05914 https://cartinoe5930.tistory.com/entry/Noise-makes-LLM-better-NEFTune-%F0%9F%98%89
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch https://arxiv.org/abs/2311.03099 No plan!
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment https://arxiv.org/abs/2312.09979 No plan!

Retrieval Augmented Generation (RAG) 📚

Paper Title Paper or reference site Link Paper Review
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks https://arxiv.org/abs/2005.11401 No plan!
Self-RAG: Learning to Retrieve, Generate, And Critique Through Self-Reflection https://arxiv.org/abs/2310.11511 No plan!
InstructRetro: Instruction Tuning Post Retrieval-Augmented Pretraining https://arxiv.org/abs/2310.07713 No plan!
Retrieval-Augmented Generation for Large Language Models: A Survey https://arxiv.org/abs/2312.10997 No plan!

Benchmarks 🏆 & Evaluation Metric ⚔️

Paper Title Paper or reference site Link Paper Review
BIG-Bench Hard: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them https://arxiv.org/abs/2210.09261 Will be uploaded later!
Large Language Models are not Fair Evaluators https://arxiv.org/abs/2305.17926 Will be uploaded later!
MT-Bench: Judging LLM-as-a-judge with MT-Bench https://arxiv.org/abs/2306.05685 Will be uploaded later!
InstructEval: Towards Holistic Evaluation of Instruction-Tuned Large Language Models https://arxiv.org/abs/2306.04757 Will be uploaded later!
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets https://arxiv.org/abs/2307.10928 Will be uploaded later!
GAIA: A Benchmark for General AI Assistants https://arxiv.org/abs/2311.12983 No plan!

Context of LLM 📜

Paper Title Paper or reference site Link Paper Review
A Length-Extrapolatable Transformer https://arxiv.org/abs/2212.10554 No plan!
Extending Context Window of Large Language Models via Positional Interpolation https://arxiv.org/abs/2306.15595 https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8
LongNet: Scaling Transformers to 1,000,000,000 Tokens https://arxiv.org/abs/2307.02486 https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8
Lost in the Middle: How Language Models Use Long Contexts https://arxiv.org/abs/2307.03172 https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8
YaRN: Efficient Context Window Extension of Large Language Models https://arxiv.org/abs/2309.00071 No plan!

Analysis 🔬

Paper Title Paper or reference site Link Paper Review
Why can GPT learn in-context? https://arxiv.org/abs/2212.10559 https://cartinoe5930.tistory.com/entry/Why-can-GPT-learn-in-context-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Sparks of Artificial General Intelligence: Early experiments with GPT-4 paper: https://arxiv.org/abs/2303.12712, youtube: https://www.youtube.com/watch?v=Mqg3aTGNxZ0 https://cartinoe5930.tistory.com/entry/Sparks-of-Artificial-General-Intelligence-Early-experiments-with-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
The False Promise of Imitating Proprietary LLMs https://arxiv.org/abs/2305.15717 https://cartinoe5930.tistory.com/entry/%EA%B8%B0%EC%A1%B4-imitation-model%EC%9D%80-%EC%9E%98%EB%AA%BB-%ED%95%99%EC%8A%B5%EB%90%98%EA%B3%A0-%EC%9E%88%EB%8B%A4-%F0%9F%AB%A2-The-False-Promise-of-Imitating-Proprietary-L
TULU: How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources https://arxiv.org/abs/2306.04751 Will be uploaded later!
How Is ChatGPT's Behavior Changing over Time? https://arxiv.org/abs/2307.09009 https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-%EC%84%B1%EB%8A%A5%EC%9D%B4-%EC%95%88-%EC%A2%8B%EC%95%84%EC%A7%80%EA%B3%A0-%EC%9E%88%EB%8B%A4%EA%B5%AC-%F0%9F%98%B2%F0%9F%98%B2
Large Language Models Cannot Self-Correct Reasoning Yet https://arxiv.org/abs/2310.01798
How Far Are Large Language Models from Agents with Theory-of-Mind https://arxiv.org/pdf/2310.03051 No plan!
Can LLMs Follow Simple Rules https://arxiv.org/abs/2311.04235 https://www.youtube.com/watch?v=CY6o43037OY
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 https://arxiv.org/abs/2311.10702 No plan!
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up https://arxiv.org/abs/2311.15653 No plan!
An In-depth Look at Gemini's Language Abilities https://arxiv.org/abs/2312.11444 No plan!

Interesting 🫣

Paper Title Paper or reference site Link Paper Review
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature https://arxiv.org/abs/2301.11305 https://cartinoe5930.tistory.com/entry/%EC%9D%B4-%EA%B8%80%EC%9D%B4-LM%EC%9D%B4-%EB%A7%8C%EB%93%A4%EC%96%B4%EB%82%B8-%EA%B8%80%EC%9D%BC%EA%B9%8C-%EB%8F%84%EC%99%80%EC%A4%98-DetectGPT-DetectGPT-Zero-Shot-Machine-Generated-Text-Detection-using-Probability-Curvature-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback https://arxiv.org/abs/2302.12813 https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-hallucination-%EC%96%B4%EB%96%BB%EA%B2%8C-%ED%95%B4%EA%B2%B0%ED%95%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-Check-Your-Facts-and-Try-Again-Improving-Large-Language-Models-with-External-Knowledge-and-Automated-Feedback
RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text https://arxiv.org/abs/2305.13304 https://cartinoe5930.tistory.com/entry/ChatGPT%EC%97%90-%EB%B0%98%EB%B3%B5-%EB%A9%94%EC%BB%A4%EB%8B%88%EC%A6%98LSTM%EC%9D%84-%EC%82%AC%EC%9A%A9%ED%95%9C%EB%8B%A4%EB%A9%B4-RecurrentGPT-Interactive-Generation-of-Arbitrarily-Long-Text-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Large Language Models as Tool Makers https://arxiv.org/abs/2305.17126 https://cartinoe5930.tistory.com/entry/LM%EC%9D%B4-%EB%8F%84%EA%B5%AC%EB%A5%BC-%EC%82%AC%EC%9A%A9%ED%95%98%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-%F0%9F%94%AC-Large-Language-Models-as-Tool-Makers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion https://arxiv.org/abs/2306.02561 No plan!
Knowledge Distillation of Large Language Models https://arxiv.org/abs/2306.08543 https://cartinoe5930.tistory.com/entry/KD%EC%97%90-%EC%82%B4%EC%A7%9D%EC%9D%98-%EB%B3%80%ED%99%94%EB%A5%BC-%EC%A4%98%EB%B3%B4%EC%9E%90-%F0%9F%98%9C-Knowledge-Distillation-of-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models https://arxiv.org/abs/2308.01825 Will be uploaded later!
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs https://arxiv.org/abs/2307.16789 Will be uploaded later!
SelfCheck: Using LLMs to Zero-shot Check Their Own Step-by-Step Reasoning https://arxiv.org/abs/2308.00436 Will be uploaded later!
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification https://arxiv.org/abs/2308.07921 Will be uploaded later!
Large Language Models as Optimizers https://arxiv.org/abs/2309.03409 No plan!
FIAT: Fusing Learning Paradigms with Instruction-Accelerated Tuning https://arxiv.org/abs/2309.04663 https://www.youtube.com/watch?v=EZsZEcRDte0&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMDQ2NjM%3D
Contrastive Decoding Improves Reasoning in Large Language Models https://arxiv.org/abs/2309.09117 https://www.youtube.com/watch?v=nMR56TkwC1Q&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMDkxMTc%3D
Think before you speak: Training Language Models with Pause Tokens https://arxiv.org/abs/2310.02226 https://www.youtube.com/watch?v=MtJ1jacr_yI
Large Language Models Can Learn Rules https://arxiv.org/abs/2310.07064 No plan!
In-context Pretraining: Language Modeling Beyond Document Boundaries https://arxiv.org/abs/2310.10638 https://www.youtube.com/watch?v=GI-0lAaILrU
Learning From Mistakes Makes LLM Better Reasoner https://arxiv.org/abs/2310.20689 No plan!
Language Models can be Logical Solvers https://arxiv.org/abs/2311.06158 No plan!
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming https://arxiv.org/abs/2311.07689 No plan!
Fine-tuning Language Models for Factuality https://arxiv.org/abs/2311.08401 No plan!
Positional Description Matters for Transformers Arithmetic https://arxiv.org/abs/2311.14737 No plan!
Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision https://arxiv.org/abs/2312.09390 https://openai.com/research/weak-to-strong-generalization
TinyGSM: achieving higher than 80 percentage on GSM8k with small language models https://arxiv.org/abs/2312.09241 No plan!

Korean LM 🇰🇷

Paper Title Paper or reference site Link Paper Review
Morpheme-aware Subword Tokenizer: An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks https://arxiv.org/abs/2010.02534 Will be uploaded later!
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers https://arxiv.org/abs/2109.04650 Will be uploaded later!

Computer Vision

Paper Title Paper or reference site Link Paper Review
History of CNN LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, ResNeXt, Xception, MobileNet, DenseNet, EfficientNet, ConvNeXt https://cartinoe5930.tistory.com/entry/CNN-network%EC%9D%98-%EC%97%AD%EC%82%AC
ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale https://arxiv.org/abs/2010.11929 https://cartinoe5930.tistory.com/entry/ViT-An-Image-Worth-16-x-16-Words-Transformers-for-Image-Recognition-at-Scale
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows https://arxiv.org/abs/2103.14030 https://cartinoe5930.tistory.com/entry/Swin-Transformer-Hierarchical-Vision-Transformer-using-Shifted-Windows-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
CLIP: Learning Transferable Visual Models From Natural Language Supervision https://arxiv.org/abs/2103.00020 https://cartinoe5930.tistory.com/entry/CLIP-Learning-Transferable-Visual-Models-From-Natural-Language-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0

Multi-modal Models

Paper Title Paper or reference site Link Paper Review
Let's learn about VLM(Visual-Language Model) https://huggingface.co/blog/vision_language_pretraining#supporting-vision-language-models-in-%F0%9F%A4%97-transformers https://cartinoe5930.tistory.com/entry/VLMVision-Language-Model%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B3%B4%EC%9E%90
VisualBERT: A Simple and Performant Baseline for Vision and Language https://arxiv.org/abs/1908.03557 https://cartinoe5930.tistory.com/entry/VisualBERT-A-Simple-and-Performant-Baseline-for-Vision-and-Language-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ViLBERT: Pre-training Task-Agnostic Visiolinguistic Representations for Visual-and-Language Tasks https://arxiv.org/abs/1908.02265 https://cartinoe5930.tistory.com/entry/ViLBERT-Pretraining-Task-Agnostic-Visiolinguistic-Representations-for-Visual-and-Language-Tasks
LXMERT: Learning Cross-Modality Encoder Representations from Transformers https://arxiv.org/abs/1908.07490 https://cartinoe5930.tistory.com/entry/LXMERT-Learning-Cross-Modality-Encoder-Representations-from-Transformers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
VL-BERT: Pre-training of Generic Visual-Linguistic Representations https://arxiv.org/abs/1908.08530 https://cartinoe5930.tistory.com/entry/VL-BERT-Pre-training-of-Generic-Visual-Linguistic-Representations-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
VLP: Unified Vision-Language Pre-Training for Image Captioning and VQA https://arxiv.org/abs/1909.11059 https://cartinoe5930.tistory.com/entry/VLP-Unified-Vision-Language-Pre-Traning-for-Image-Captioning-and-VQA-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks https://arxiv.org/abs/2004.06165 https://cartinoe5930.tistory.com/entry/Oscar-Object-Semantics-Aligned-Pre-training-for-Vision-Language-Tasks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
VinVL: Revisiting Visual Representations in Vision-Language Models https://arxiv.org/abs/2101.00529 https://cartinoe5930.tistory.com/entry/VinVL-Revisiting-Visual-Representations-in-Vision-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision https://arxiv.org/abs/2102.03334 https://cartinoe5930.tistory.com/entry/ViLT-Vision-and-Language-Transformer-Without-Convolution-or-Region-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision https://arxiv.org/abs/2102.05918 https://cartinoe5930.tistory.com/entry/ALIGN-Scaling-up-Visual-and-Vision-Language-Representation-with-Noisy-Text-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
ALBEF: Vision and Language Representation Learning with Momentum Distillation https://arxiv.org/abs/2107.07651 https://cartinoe5930.tistory.com/entry/ALBEF-Vision-and-Language-Representation-Learning-with-Momentum-Distillation-%EB%85%BC%EB%AC%B8
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision https://arxiv.org/abs/2108.10904 https://cartinoe5930.tistory.com/entry/SimVLM-Simple-Visual-Language-Model-Pre-training-with-Weak-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
VLMo: Unified Vision-Language Pre-training with Mixture-of-Modality-Experts https://arxiv.org/abs/2111.02358 https://cartinoe5930.tistory.com/entry/VLMo-Unified-Vision-Language-Pre-training-with-Mixture-of-Modality-Experts-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
LiT🔥: Zero-Shot Transfer with Locked-image text Tuning https://arxiv.org/abs/2111.07991 https://cartinoe5930.tistory.com/entry/LiT%F0%9F%94%A5-Zero-Shot-Transfer-with-Locked-image-text-Tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
FLAVA: A Foundational Language And Vision Alignment Model https://arxiv.org/abs/2112.04482 https://cartinoe5930.tistory.com/entry/FLAVA-A-Foundational-Language-And-Vision-Alignment-Model-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation https://arxiv.org/abs/2201.12086 https://cartinoe5930.tistory.com/entry/BLIP-Bootstrapping-Language-Image-Pre-training-fro-Unified-Vision-Language-Understanding-and-Generation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0

Deep Learning Concept

Paper or Posting Title reference site Link Review
Knowledge Distillation: Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531 https://cartinoe5930.tistory.com/entry/Distilling-the-Knowledge-in-a-Neural-Network-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0
What is Zero-shot, One-shot, Few-shot Learning? see my blog! https://cartinoe5930.tistory.com/entry/Zero-shot-One-shot-Few-shot-Learning%EC%9D%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C