Learning how to go all-in on LLMs.
I plan to write a blog for beginners to get started with MLsys and LLMsys. LLMs have become one of the most critical and hottest topics and battlefields, so learning how to 'elegantly' live with them is very important. The blog will cover several milestones from recent years that I think could form the backbone of the future.
TODO:
- Distributed Training
- DP, MP, TP, PP (data, model, tensor, and pipeline parallelism; a minimal data-parallel sketch follows this list)
- Exercise on DeepSpeed
- Exercise on Megatron-LM
- Exercise on Megatron-DeepSpeed
- Optimizing CUDA kernels
- Scaling up
- LLM serving and inference
- vLLM (done)
- FlashAttention
- PowerInfer
- FasterTransformer
- TensorRT-LLM
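
As a warm-up for the DP item above, here is a minimal data-parallel training sketch using PyTorch DistributedDataParallel. It is only an illustration under stated assumptions: the toy linear model, random data, and the `torchrun` launch are placeholders, not files in this repo.

```python
# Minimal DP/DDP sketch. Assumed launch: torchrun --nproc_per_node=<N> ddp_sketch.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a full replica of the model; DDP all-reduces gradients.
    model = DDP(nn.Linear(1024, 1024).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        # In real training, each rank would read a different shard of the data.
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()  # gradient all-reduce happens during backward
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```
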
In serving/vllm_eva, I created two scripts to test calling vLLM locally and remotely via its API (TODO).
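
For reference, a minimal sketch of what the two calls could look like; the model name, port, and prompt are placeholder assumptions, and the actual scripts live in serving/vllm_eva.

Local (offline) call through vLLM's Python API:

```python
from vllm import LLM, SamplingParams

# Load a small placeholder model in-process (model choice is an assumption).
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Remote call over the OpenAI-compatible HTTP API, assuming a server was started with something like `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m`:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # default port; adjust to your server
    json={
        "model": "facebook/opt-125m",
        "prompt": "Explain tensor parallelism in one sentence.",
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])
```
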
Related article: https://zhuanlan.zhihu.com/p/639228219