SLMs-paper-list

Papers about Small Language Models (SLMs)

Paper list

Why SLMs may work

Pre-training loss, not model size or data size, is the key factor for downstream task performance and emergent abilities:

Finetuning

SLMs are well suited to finetuning for specific tasks and domains; a minimal setup is sketched below.
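A minimal sketch of what such finetuning can look like, using LoRA via the transformers and peft libraries. The model name and target module names here are assumptions for illustration, not taken from any particular paper:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Model name is an assumption; any small causal LM with q_proj/v_proj modules works similarly.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

# LoRA freezes the base weights and trains small low-rank adapters,
# which makes task- or domain-specific finetuning of an SLM cheap.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names depend on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the weights are trainable

# From here, a standard Trainer / SFT loop on the domain dataset finishes the job.
```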

Agent

Specific domain

Knowledge Injection

Injecting domain knowledge is crucial. Should it be done with RAG or with finetuning (FT)? That remains an open question:
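A minimal sketch of the RAG side of that trade-off, with a toy bag-of-words retriever standing in for a real embedding model: retrieved chunks are prepended to the prompt, so the knowledge lives outside the model's weights, whereas finetuning bakes it in.

```python
import math
from collections import Counter

# Toy knowledge base; in a real system these would be chunks of domain documents.
DOCS = [
    "The warranty for model X200 covers battery replacement for two years.",
    "Firmware 3.1 added scheduled charging to the X200.",
    "The X200 battery is rated for 500 charge cycles.",
]

def bow(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model (e.g. a sentence encoder).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question: str, k: int = 2) -> str:
    # Retrieve the top-k relevant chunks and prepend them to the prompt,
    # so the (small) model reads the knowledge instead of storing it in its weights.
    q = bow(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]
    context = "\n".join(ranked)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How many charge cycles does the X200 battery support?"))
```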

SLMs, not LLMs

Different Architecture

Different architectures for SLMs:

Different Training Strategy

Knowledge distillation works well for training SLMs:

  • Gemma 2, which uses knowledge distillation in both its pre-training (PT) and instruction-tuning (IT) stages.
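A minimal sketch of the usual soft-target distillation loss in PyTorch (the generic recipe, not Gemma 2's exact setup): the student is trained to match the teacher's temperature-softened token distribution.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soft-target KD: KL divergence between the teacher's and student's
    # temperature-softened distributions over the vocabulary.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

In practice this term is typically mixed with the standard next-token cross-entropy loss on the student.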

Training on more data for longer also helps, though the gains scale roughly logarithmically with the amount of data:

Reasoning Machine

More inference-time compute benefits LLMs, so a natural question follows: can cheaper, faster SLMs, given extra inference-time compute, replace LLMs for complex reasoning?
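One common way to spend inference-time compute on a small model is self-consistency: sample several reasoning paths and take a majority vote over the final answers. A minimal sketch, where sample_answer is a hypothetical stand-in for the actual SLM call:

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one sampled chain-of-thought completion from an SLM;
    # in practice this calls the model with temperature > 0 and extracts the final answer.
    return rng.choice(["42", "42", "41"])  # toy answer distribution for illustration

def self_consistency(question: str, n_samples: int = 16, seed: int = 0) -> str:
    # Majority vote over n sampled answers: trades extra inference-time compute
    # on a small model against a single pass through a larger one.
    rng = random.Random(seed)
    answers = [sample_answer(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```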