Awesome-LLM

Leaderboards

Distributed Training Tools/Frameworks

  • Megatron-LM: Ongoing research on training transformer models at scale.
  • DeepSpeed: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective (see the sketch after this list).
  • RedCoast (Redco): A lightweight tool to automate distributed training and inference.
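To give a sense of what adopting one of these frameworks involves, here is a minimal sketch of wrapping a PyTorch model with DeepSpeed's engine. The toy model and config values are illustrative assumptions, not recommended settings, and the script is meant to be launched with the `deepspeed` launcher.

```python
# Minimal sketch: wrapping a PyTorch model with DeepSpeed for distributed
# training. Toy model and config values are illustrative assumptions.
# Run with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus=2 train_sketch.py
import torch
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # shard optimizer state + gradients
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# the engine handles device placement, fp16 casting, and ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    x = torch.randn(8, 1024, device=model_engine.device, dtype=torch.half)
    loss = model_engine(x).float().pow(2).mean()  # toy objective
    model_engine.backward(loss)  # scales the loss and syncs gradients
    model_engine.step()          # optimizer step + gradient zeroing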

Deploy Tools/Frameworks

RAG Frameworks

Reduce Input/Output Tokens

  • Chunking of input documents (see the sketch after this list)
  • Compression of input tokens, e.g. the LLMLingua series
  • Summarization of input tokens
  • Avoiding few-shot examples
  • Limiting the length of the output and its formatting
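As a concrete illustration of the first bullet, below is a minimal sketch of fixed-size chunking with overlap. `chunk_document`, `max_tokens`, and `overlap` are hypothetical names for illustration, and whitespace splitting is a stand-in for real tokenization with the serving model's tokenizer.

```python
# Minimal sketch of fixed-size chunking with overlap, one way to bound the
# number of input tokens sent per request. Whitespace splitting stands in
# for a real tokenizer; the max_tokens/overlap defaults are illustrative.
from typing import List

def chunk_document(text: str, max_tokens: int = 512, overlap: int = 64) -> List[str]:
    """Split `text` into chunks of at most `max_tokens` words, where
    consecutive chunks share `overlap` words of context."""
    assert 0 <= overlap < max_tokens
    words = text.split()
    chunks: List[str] = []
    for start in range(0, len(words), max_tokens - overlap):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already covers the tail
    return chunks

if __name__ == "__main__":
    doc = " ".join(f"word{i}" for i in range(1200))
    for i, chunk in enumerate(chunk_document(doc)):
        print(f"chunk {i}: {len(chunk.split())} words")
```

Limiting output length, by contrast, is usually a single request parameter on the serving side (e.g. `max_tokens` in OpenAI-compatible APIs), and LLMLingua-style compression replaces the naive word split above with a learned prompt compressor.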

Model Routing

Quantization

Caching

Tensor Parallelism

Heterogeneous Parallel Inference

Other Tools

  • LLM AutoEval: Automatically evaluate your LLMs using RunPod.
  • LazyMergekit: Easily merge models using MergeKit in one click.
  • AutoQuant: Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click.
  • Model Family Tree: Visualize the family tree of merged models.
  • ZeroSpace: Automatically create a Gradio chat interface using a free ZeroGPU.
  • ExLlamaV2 Colab: Quantize and run EXL2 models and upload them to the HF Hub.
  • LMQL: A Python-based programming language for LLM programming with declarative elements.

Other Papers

  • Sarathi-Serve: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve.