LLM-Pipeline

Recently, we have seen great advances in natural language processing, especially in pre-trained large language models (LLMs). I believe their applications excite every one of us, me included, so I want to walk through the overall process of how LLMs are applied and build up our own technology stack along the way.


Data Cleaning

Training

Deployment

Quantization

It is common to run into the awkward scenario where we don't have enough GPU memory to deploy an LLM, which means we cannot even run inference on it in our own domain. So I first introduce some quantization techniques to relieve the GPU memory pressure, as follows (with a runnable sketch after the list):

  • Quantization
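As a concrete starting point, here is a minimal sketch of loading a model with 4-bit NF4 weights through Hugging Face transformers and the bitsandbytes backend. The model id is only a placeholder, and the exact arguments assume a recent transformers release with bitsandbytes installed.

```python
# Minimal sketch: load a causal LM with 4-bit NF4 quantization.
# Assumes: transformers + bitsandbytes installed; model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute your model

# NF4 4-bit weights with bf16 compute roughly quarters the weight memory
# relative to fp16, at a modest cost in accuracy.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across the available GPUs
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```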

Perhaps you might be interested in Dan Alistarh, one of the authors of the GPTQ technique. GPTQ has ...... For more details about his work, you can refer to Dan Alistarh
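For GPTQ specifically, transformers exposes a GPTQConfig that quantizes the weights one-shot while loading (backed by optimum/auto-gptq). This is only a hedged sketch: the model id and the "c4" calibration dataset are assumptions you should adjust to your own setup.

```python
# Sketch: one-shot GPTQ quantization via transformers' GPTQConfig.
# Assumes: optimum + auto-gptq installed; model id and dataset are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-1.3b"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(model_id)
gptq_config = GPTQConfig(
    bits=4,         # 4-bit weights
    dataset="c4",   # calibration data used to minimize quantization error
    tokenizer=tokenizer,
)

# Weights are quantized layer by layer as the model loads.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
model.save_pretrained("opt-1.3b-gptq-4bit")  # reusable quantized checkpoint
```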

Inference

Evaluation

First, I recommend checking the public leaderboards that track the performance of existing LLMs.
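Beyond leaderboards, it helps to know what one common automatic metric looks like under the hood. Below is a minimal, self-contained perplexity sketch; the model (gpt2) and the sample sentence are placeholders chosen only for illustration.

```python
# Sketch: compute perplexity of a causal LM on a piece of text.
# Assumes: transformers + torch installed; model and text are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # small placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "Large language models are evaluated with automatic metrics."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels=input_ids, transformers shifts the targets internally and
    # returns the mean cross-entropy over the predicted tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity = {torch.exp(loss).item():.2f}")
```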

Some frameworks and papers to explore:

  • Megatron-LM
  • DeepSpeed
  • vLLM (see the inference sketch after this list)
  • k8s
  • Docker
  • FlashAttention 1 and 2
  • RLHF
  • training custom LLMs
  • QLoRA
  • Fine-tuning
  • Landmark attention
  • MLSys
  • CUDA operators (kernels)
  • ByteDance AML, Alibaba PAI
  • GPTCache
  • inference engines
  • training frameworks
  • machine learning platforms
  • CoT (chain-of-thought)
  • FLAN
  • Orca
  • Platypus
  • PEFT
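To make the inference side of this list concrete, here is a short vLLM usage sketch for high-throughput batched generation. The model id and prompts are placeholders; vLLM serves most Hugging Face causal LMs.

```python
# Sketch: batched inference with vLLM.
# Assumes: vllm installed; model id and prompts are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
prompts = [
    "The key idea behind quantization is",
    "PagedAttention improves GPU memory use by",
]

# generate() batches the prompts and schedules them with PagedAttention,
# which is what gives vLLM its throughput advantage.
for out in llm.generate(prompts, sampling):
    print(out.prompt, "->", out.outputs[0].text)
```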

Course Videos