Learning how to go all-in on LLMs.
I plan to write a blog for beginners to get started with MLsys and LLMsys. LLMs have become one of the most critical and hottest topics and battlefields, so learning how to 'elegantly' live with them is very important. The blog will cover several milestones from recent years that I think could form the backbone of the future.
TODO:
- Distributed Training
- DP, MP, TP, PP (data, model, tensor, and pipeline parallelism; a minimal data-parallel sketch follows this list)
- Exercise on DeepSpeed
- Exercise on Megatron-LM
- Exercise on Megatron-DeepSpeed
- Optimizing CUDA kernels
- Scaling up
- LLM serving and inference
- vLLM (done)
- FlashAttention
- PowerInfer
- FasterTransformer
- TensorRT-LLM
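
As a warm-up for the DP item above, here is a minimal data-parallel training sketch using PyTorch DistributedDataParallel. It is only an illustration under stated assumptions: the toy linear model, random data, and the `torchrun` launch are placeholders, not files in this repo.

```python
# Minimal DP/DDP sketch. Assumed launch: torchrun --nproc_per_node=<N> ddp_sketch.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a full replica of the model; DDP all-reduces gradients.
    model = DDP(nn.Linear(1024, 1024).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        # In real training, each rank would read a different shard of the data.
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()  # gradient all-reduce happens during backward
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```
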
In serving/vllm_eva, I created two scripts to test calling vLLM locally and remotely via its API (TODO).
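
For reference, a minimal sketch of what the two calls could look like; the model name, port, and prompt are placeholder assumptions, and the actual scripts live in serving/vllm_eva.

Local (offline) call through vLLM's Python API:

```python
from vllm import LLM, SamplingParams

# Load a small placeholder model in-process (model choice is an assumption).
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Remote call over the OpenAI-compatible HTTP API, assuming a server was started with something like `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m`:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # default port; adjust to your server
    json={
        "model": "facebook/opt-125m",
        "prompt": "Explain tensor parallelism in one sentence.",
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])
```
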
Related article: https://zhuanlan.zhihu.com/p/639228219