alibaba/Megatron-LLaMA

Small bug in the HF weight conversion code

yuanzhoulvpi2017 opened this issue · 0 comments

At this point in the code, the check if config.num_hidden_layers % args.target_tensor_model_parallel_size != 0: is wrong. It should not use args.target_tensor_model_parallel_size; it should use args.target_pipeline_model_parallel_size. The line right below already divides by target_pipeline_model_parallel_size, so the divisibility check should test the same value.

    if config.num_hidden_layers % args.target_tensor_model_parallel_size != 0:
        raise ValueError(
            f"Number of layers ({config.num_hidden_layers}) must be divisible by number of tensor parallelism"
            f" ({args.target_tensor_model_parallel_size})"
        )
    num_layers = config.num_hidden_layers // args.target_pipeline_model_parallel_size

    layer_re = re.compile(r"model.layers\.(\d+)\.([a-z0-9_.]+)\.([a-z]+)")
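A minimal sketch of the suggested fix, keeping the same variable names as the snippet above; the error message wording is my own adjustment, not the repository's:

    if config.num_hidden_layers % args.target_pipeline_model_parallel_size != 0:
        raise ValueError(
            f"Number of layers ({config.num_hidden_layers}) must be divisible by the pipeline"
            f" model parallel size ({args.target_pipeline_model_parallel_size})"
        )
    num_layers = config.num_hidden_layers // args.target_pipeline_model_parallel_size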

https://github.com/alibaba/Megatron-LLaMA/blob/main/tools/checkpoint_conversion/llama_checkpoint_conversion.py#L675C47-L675C47