OptimalScale/LMFlow

[Roadmap] LMFlow Roadmap


This document lists the features on LMFlow's roadmap. We welcome discussion of, and contributions to, specific features in the related Issues/PRs. 🤗

Main Features

  • Data
    • DPO dataset format #867
    • Conversation template in DPO #883
    • Jinja template
    • Tools in conversation dataset #884 #892
    • Packing with block diagonal attention (see the sketch after this list)
  • Model
    • Backend
      • 🏗️ Accelerate support
    • Tokenization
      • Tokenization update, using hf method
  • Pipeline
    • Train/Finetune/Align
      • DPO (multi-gpu) #867
      • Iterative DPO #867 #883
      • PPO
      • LISA (multi-gpu, qwen2, chatglm) #899
      • Batch size and learning rate recommendation (arxiv)
      • No trainer version pipelines, allowing users to customize/modify based on their needs
      • Sparse training for moe models #879
    • Inference
      • vllm inference #860 #863
      • Reward model scoring #867
      • Multiple instances inference (vllm, rm, others) #883
      • Inference checkpointing and resume from checkpoints
      • Inference acceleration with EAGLE
      • Inferencer for chat/instruction models, and chatbot.py upgrade #917
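
For the packing item above, here is a minimal sketch of a block-diagonal attention mask; `block_diagonal_mask` is an illustrative helper, not LMFlow's implementation, and assumes packed sequences are laid out back-to-back in one row.

```python
import torch

def block_diagonal_mask(seq_lens):
    """Boolean (total, total) mask; True = attention allowed.

    Tokens may only attend within their own packed sequence, so packed
    sequences do not leak attention into each other.
    """
    total = sum(seq_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    offset = 0
    for n in seq_lens:
        mask[offset:offset + n, offset:offset + n] = True
        offset += n
    return mask

# Two sequences of lengths 3 and 2 packed into a single row of length 5.
print(block_diagonal_mask([3, 2]).int())
```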

Usability

  • Make some packages/functions (gradio, vllm, ray, etc.) optional, add conditional import. #905
  • Inference method auto-downgrading (vLLM > DeepSpeed, etc.), and make the vllm package optional (see the first sketch after this list)
  • Merging similar model methods into hf_model_mixin
  • Set torch_dtype='bfloat16' when bf16 is specified, etc. (bf16 lives in FinetunerArguments but torch_dtype lives in ModelArguments, so this cannot be handled in __post_init__(); see the second sketch after this list.)
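
For the optional-import and auto-downgrading items, a minimal sketch (assumed backend names and downgrade order, not LMFlow's actual API) could look like:

```python
import importlib.util

def pick_inference_backend(preferred: str = "vllm") -> str:
    """Return the first installed backend, downgrading vllm > deepspeed > transformers."""
    for name in (preferred, "deepspeed", "transformers"):
        if importlib.util.find_spec(name) is not None:
            return name
    raise RuntimeError("No supported inference backend is installed.")

backend = pick_inference_backend()
if backend == "vllm":
    from vllm import LLM  # heavy import happens only when vllm is actually available
```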
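For the torch_dtype/bf16 item, since the two flags live in different dataclasses, one option is to reconcile them after parsing instead of in __post_init__(); the dataclasses below are simplified stand-ins for the real ModelArguments/FinetunerArguments:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArguments:          # simplified stand-in for the real ModelArguments
    torch_dtype: Optional[str] = None

@dataclass
class FinetunerArguments:      # simplified stand-in for the real FinetunerArguments
    bf16: bool = False

def reconcile_dtype(model_args, finetuner_args):
    # Neither dataclass's __post_init__ can see the other, so do it after parsing.
    if finetuner_args.bf16 and model_args.torch_dtype is None:
        model_args.torch_dtype = "bfloat16"
    return model_args

print(reconcile_dtype(ModelArguments(), FinetunerArguments(bf16=True)).torch_dtype)  # bfloat16
```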

Bug fixes

  • model.generate() with DeepSpeed ZeRO-3 (dsz3) #861
  • merge_lora: support merging LoRA weights specified by absolute paths
  • load_dataset long data fix #878
  • src/lmflow/utils/common.py create_copied_dataclass compatibility when Python version >= 3.10 (kw_only issue; see the sketch after this list) #903 #905
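
A minimal reproduction of the kw_only pitfall is sketched below (copy_dataclass_fields is an illustrative helper, not the code in src/lmflow/utils/common.py): on Python >= 3.10 each Field carries a kw_only attribute that a field copy must forward, and that attribute does not exist on older versions.

```python
import dataclasses
import sys

def copy_dataclass_fields(cls, suffix="Copy"):
    """Re-create a dataclass with the same fields, preserving kw_only on 3.10+."""
    specs = []
    for f in dataclasses.fields(cls):
        kwargs = {}
        if f.default is not dataclasses.MISSING:
            kwargs["default"] = f.default
        if sys.version_info >= (3, 10):
            kwargs["kw_only"] = f.kw_only  # Field.kw_only only exists on Python >= 3.10
        specs.append((f.name, f.type, dataclasses.field(**kwargs)))
    return dataclasses.make_dataclass(cls.__name__ + suffix, specs)
```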

Issues left over from history

  • use_accelerator -> use_accelerate typo fix (with Accelerate support PR)
  • model_args.use_lora leads to truncation of the sequence, mentioned in #867
  • Make ports, addresses, and all other settings in distributed training tidy and clear (with Accelerate support PR)

Documentation

  • Approximate GPU memory requirements w.r.t. model size & pipeline
  • Dev handbook, indicating styles, test list, etc.

Note on multiple-instance inference:
In vLLM inference, the number of attention heads must be divisible by the vLLM tensor-parallel size. For a 14-head LLM, the options for tp are 1 and 2 (7 causes another division issue, but I forget exactly which one).
Say we have 8 GPUs; to utilize all of them, multi-instance vLLM inference is necessary (tp=1 -> 8 instances, tp=2 -> 4 instances); see the sketch below.
The same applies to reward model (rm) inference and any other inference pipelines.
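
As a quick illustration of the arithmetic above (plan_instances is just an illustrative helper; the numbers are the example from this note):

```python
def plan_instances(num_attn_heads, num_gpus):
    """List tensor-parallel sizes that divide both the head count and the GPU count."""
    return [
        {"tensor_parallel_size": tp, "instances": num_gpus // tp}
        for tp in range(1, num_gpus + 1)
        if num_attn_heads % tp == 0 and num_gpus % tp == 0
    ]

# A 14-head model on 8 GPUs: tp=1 -> 8 instances, tp=2 -> 4 instances.
# (tp=7 divides 14 but not 8; it also hits the separate vLLM issue mentioned above.)
print(plan_instances(num_attn_heads=14, num_gpus=8))
```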

Update: Iterative DPO is now supported (#883).