[Roadmap] LMFlow Roadmap
wheresmyhair commented
This document lists the features on LMFlow's roadmap. We welcome any discussion of, or contributions to, the specific features in the related Issues/PRs. 🤗
Main Features
- Data
- Model
  - Backend
    - 🏗️ Accelerate support
  - Tokenization
    - Tokenization update, using the HF method
- Backend
- Pipeline
  - Train/Finetune/Align
  - Inference
Usability
- Make some packages/functions (gradio, vllm, ray, etc.) optional, add conditional imports. #905 (see the conditional-import sketch after this list)
- Inference method auto-downgrading (vllm > ds, etc.), and make the `vllm` package optional
- Merge similar model methods into `hf_model_mixin`
- Set `torch_dtype='bfloat16'` when `bf16` is specified, etc. (`bf16` is in `FinetunerArguments` but `torch_dtype` is in `ModelArguments`, so this cannot be handled in `__post_init__()`; see the sketch after this list)
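A minimal sketch of the conditional-import pattern for the optional `vllm` dependency, with a fallback backend as a stand-in for the auto-downgrading idea; the helper name and flag are assumptions for illustration, not LMFlow's actual API:

```python
# Conditional import: the heavy optional dependency may be absent.
try:
    import vllm  # noqa: F401
    HAS_VLLM = True
except ImportError:
    HAS_VLLM = False


def choose_inference_backend(prefer_vllm: bool = True) -> str:
    """Hypothetical helper: downgrade to a DeepSpeed/HF backend when vLLM
    is not installed, instead of failing at import time."""
    if prefer_vllm and HAS_VLLM:
        return "vllm"
    return "deepspeed"
```

And a minimal sketch, using simplified stand-ins for `ModelArguments` / `FinetunerArguments`, of why the `bf16` -> `torch_dtype` mapping has to happen after both argument dataclasses exist rather than inside `__post_init__()` of either one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelArguments:  # simplified stand-in
    torch_dtype: Optional[str] = None


@dataclass
class FinetunerArguments:  # simplified stand-in; bf16 is a training argument
    bf16: bool = False


def reconcile_dtype(model_args: ModelArguments,
                    finetuner_args: FinetunerArguments) -> ModelArguments:
    """Cross-dataclass reconciliation: neither __post_init__ can see the
    other dataclass, so the mapping is applied after parsing."""
    if finetuner_args.bf16 and model_args.torch_dtype is None:
        model_args.torch_dtype = "bfloat16"
    return model_args
```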
Bug fixes
- `model.generate()` with dsz3 #861
- `merge_lora`: LoRA merging with an absolute path
- `load_dataset` long data fix #878
- `create_copied_dataclass` in src/lmflow/utils/common.py: compatibility when Python version >= 3.10 (`kw_only` issue) #903 #905 (see the sketch after this list)
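A minimal sketch of the Python >= 3.10 `kw_only` pitfall; `copy_dataclass_with_prefix` is a hypothetical stand-in for illustration, not the actual `create_copied_dataclass` implementation:

```python
import sys
from dataclasses import dataclass, field, fields, make_dataclass


def copy_dataclass_with_prefix(cls, new_name: str, prefix: str):
    """Copy a dataclass, renaming every field with a prefix.

    On Python >= 3.10, dataclasses.Field has a kw_only attribute and
    field() accepts a kw_only argument; neither exists on older versions,
    so it has to be forwarded conditionally.
    """
    new_fields = []
    for f in fields(cls):
        kwargs = {
            "default": f.default,
            "default_factory": f.default_factory,
            "metadata": f.metadata,
        }
        if sys.version_info >= (3, 10):
            # Only forward kw_only where field() actually accepts it.
            kwargs["kw_only"] = f.kw_only
        new_fields.append((prefix + f.name, f.type, field(**kwargs)))
    return make_dataclass(new_name, new_fields)


@dataclass
class ModelArguments:  # simplified stand-in
    model_name_or_path: str = "gpt2"


# e.g. derive a prefixed copy for a second model's arguments
RewardModelArguments = copy_dataclass_with_prefix(
    ModelArguments, "RewardModelArguments", "reward_"
)
```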
Issues left over from history
- `use_accelerator` -> `use_accelerate` typo fix (with the Accelerate support PR)
- `model_args.use_lora` leads to truncation of the sequence, mentioned in #867
- Make ports, addresses, and all other settings in distributed training tidy and clear (with the Accelerate support PR)
Documentation
- Approximate GPU memory requirements w.r.t. model size & pipeline
- Dev handbook, indicating styles, test list, etc.
wheresmyhair commented
Note on multi-instance inference:
In vLLM inference, the number of attention heads must be divisible by the vLLM tensor parallel size. If we have an LLM with 14 attention heads, the options for tensor parallel size are 1 and 2 (7 causes another division issue, but I forget exactly what it is).
Say we have 8 GPUs; then, to utilize all of these devices, multi-instance vLLM inference is necessary (tp=1 -> 8 instances, tp=2 -> 4 instances).
The same applies to reward model inference and any other inference pipelines.
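A minimal sketch of the instance planning described above; `plan_vllm_instances` is a hypothetical helper (not part of LMFlow) and only checks simple divisibility, not every constraint vLLM enforces internally:

```python
def plan_vllm_instances(num_attention_heads: int, num_gpus: int) -> tuple[int, int]:
    """Pick the largest tensor parallel size that divides both the number of
    attention heads and the number of GPUs, then spread the remaining GPUs
    across independent inference instances."""
    for tp in range(min(num_attention_heads, num_gpus), 0, -1):
        if num_attention_heads % tp == 0 and num_gpus % tp == 0:
            return tp, num_gpus // tp
    return 1, num_gpus


# Example from the comment: a 14-head model on 8 GPUs.
# 14 % 2 == 0 and 8 % 2 == 0, so tp=2 with 4 instances.
print(plan_vllm_instances(14, 8))  # -> (2, 4)
```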
wheresmyhair commented
Iterative DPO is now supported: #883