Recently, natural language processing has advanced rapidly, especially through pre-trained large language models (LLMs). I believe the applications of LLMs excite every one of you, as they do me. So in this post I want to walk through how LLMs are applied in practice and build up a technology stack of our own.
It is common to run into the awkward scenario where we don't have enough GPU memory to deploy an LLM; without deployment, we cannot bring LLMs into our own domain for inference. So I first introduce some quantization techniques and serving tools that relieve GPU pressure, as follows:
- Quantization
Perhaps you might be interested in Dan Alistarh, one of the authors of GPTQ. GPTQ has ...... For more details, you can refer to Dan Alistarh's work.
- Transformers library https://huggingface.co/blog/llama2#using-transformers
- vLLM
- TGI
- Text generation web UI
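Before reaching for a full toolkit like GPTQ, the core idea behind quantization is easy to see in a few lines: map float weights onto a small integer grid and keep a scale factor to map back. Below is a minimal NumPy sketch of symmetric int8 round-to-nearest quantization; it is illustrative only, and real methods such as GPTQ are considerably more sophisticated:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Quantize a random "weight matrix" and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()
print(f"int8 storage is 4x smaller than fp32; max abs error = {max_err:.4f}")
```

Each int8 weight takes 1 byte instead of 4, which is exactly the kind of saving that lets a model fit on a smaller GPU, at the cost of a bounded rounding error (at most half the scale per weight).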
First, I recommend some straightforward leaderboards that track the performance of existing LLMs, as follows:
- English
  - open-llm-leaderboard
  - alpaca
- Chinese
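For intuition on how Alpaca-style leaderboards score models: many of them reduce to a win rate over pairwise comparisons against a reference model. The helper below is a hypothetical illustration (not the actual leaderboard code; the function name and the tie-counts-as-half convention are my own assumptions):

```python
from collections import Counter

def win_rate(judgments):
    """judgments: list of 'win'/'loss'/'tie' outcomes for a candidate
    model judged against a reference model; ties count as half a win."""
    counts = Counter(judgments)
    n = len(judgments)
    return (counts["win"] + 0.5 * counts["tie"]) / n

# 2 wins + half credit for 1 tie over 4 comparisons -> 0.625
print(win_rate(["win", "win", "tie", "loss"]))
```

A single number like this is convenient for ranking, but it hides which kinds of prompts a model wins or loses on, so leaderboard positions should be read with that caveat.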
Some frameworks and papers to explore:
- Megatron
- DeepSpeed
- vLLM
- k8s
- Docker
- FlashAttention 1 and 2
- RLHF
- training custom LLMs
- QLoRA
- Fine-tuning
- Landmark attention
- mysys
- CUDA kernels
- ByteDance AML, Alibaba PAI
- GPTCache
- Operators
- CUDA
- Inference engines
- Training frameworks
- Machine learning platforms
- CoT (chain of thought)
- FLAN
- Orca
- Platypus
- PEFT
- ds
- RLHF
- RHAHL
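Several entries above (PEFT, QLoRA, fine-tuning) build on low-rank adapters. The essence of LoRA is to freeze the pretrained weight W and learn a small low-rank update BA on top of it. Here is a minimal NumPy sketch (shapes and the zero-initialization of B are my reading of the standard LoRA setup, not code from any of the libraries listed):

```python
import numpy as np

rng = np.random.default_rng(42)
d, r = 64, 4  # hidden size and low rank, with r << d

W = rng.normal(size=(d, d)).astype(np.float32)              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d)).astype(np.float32)  # trainable, small random init
B = np.zeros((d, r), dtype=np.float32)                      # trainable, zero init

def lora_forward(x):
    # y = x W^T + x (B A)^T : frozen base path plus low-rank update
    return x @ W.T + x @ (B @ A).T

x = rng.normal(size=(2, d)).astype(np.float32)
# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), x @ W.T)
print("trainable params:", A.size + B.size, "vs full matrix:", W.size)
```

Only A and B are updated during fine-tuning, so the number of trainable parameters drops from d*d to 2*r*d; this is what makes QLoRA-style fine-tuning feasible on a single GPU.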
- A Full List on TinyML http://tinyml.seas.harvard.edu/courses/
- MIT 6.5940 https://hanlab.mit.edu/courses/2023-fall-65940
- ESE3600 https://tinyml.seas.upenn.edu/
- A Full List on CS https://github.com/Developer-Y/cs-video-courses