DeepLink-org/deeplink.framework

[DIPU] mock_cuda设为true后,torch.device导致guard检查失败造成重新编译的问题。

Reinerzhou opened this issue · 0 comments

背景

编译模式下,在燧原和华为机器上运行llama_finetune。

问题描述

在mock_cuda为true的情况下,若在模型运行前调用torch.set_default_device,分析timeline发现每次都会触发dynamo的compile,查看日志调查发现:

torch._dynamo.convert_frame.__recompiles: [DEBUG] ('Recompiling function _is_fp16_bf16_tensor in /home/cse/zhousl/accelerate/src/accelerate/utils/operations.py', "triggered by the following guard failure: utils_device.CURRENT_DEVICE == device(type='cuda', index=0)")

是utils_device.CURRENT_DEVICE == device(type='cuda', index=0) 检查失败导致每次都会重新编译。

分析发现:

  • 调用torch.set_default_device时,会触发torch中的这段代码对CURRENT_DEVICE进行设置,这时候用到的torch.device为torch原本的api。
    https://github.com/DeepLink-org/pytorch/blob/eb31e39bb07f00cf1c917a3bc83867981b2c04cd/torch/utils/_device.py#L57C1-L65C35

  • 而在运行过程中对utils_device.CURRENT_DEVICE == device(type='cuda', index=0) 做检查时,调用的是dipu mock后的torch.device api,mock前后的torch.device调用结果并不一致。

    class _DIPUDevice(metaclass=_MetaDeviceType):
    @staticmethod
    def __replacedipu(arg):
    if (__dipu__ in arg):
    arg = arg.replace(__dipu__, __dipu_device_type__)
    if (mockcuda and "cuda" in arg):
    arg = arg.replace("cuda", __dipu_device_type__)
    return arg
    def __new__(cls, *args, **kwargs):
    if len(args) == 1 and isinstance(args[0], int) and mockcuda:
    # modify default int device type only when "mock cuda".
    dev_name = __dipu_device_type__ + ":" + str(args[0])
    _device = _MetaDeviceType._torch_device(dev_name)
    return _device
    # handle device as str
    if len(args) >= 1 and isinstance(args[0], str):
    argList = list(args)
    argList[0] = cls.__replacedipu(args[0])
    args = tuple(argList)
    # handle parameter type: str, not support int type but str and device
    deviceValue = kwargs.get("type", None)
    if deviceValue != None and isinstance(deviceValue, str):
    kwargs["type"] = cls.__replacedipu(deviceValue)
    _device = _MetaDeviceType._torch_device(*args, **kwargs)
    return _device
    # always patch: device class is immutable, cannot directly patch __new__ method on python layer.
    torch.device = _DIPUDevice

  • 简单来说就是torch_dipu中缺少torch中从torch.set_default_device到CURRENT_DEVICE 的完整逻辑链路,且torch.device mock前后在做guards检查时并不一致。

  • 可通过如下代码复现:
    'import torch
    t1 = torch.device(type='cuda', index=0)
    import torch_dipu
    t2 = torch.device(type='cuda', index=0)
    print(t1 == t2) # 结果为False'

  • 目前可通过不调用torch.set_default_device,或者在import torch之前先import torch_dipu(这样可以在设置CURRENT_DEVICE前就让dipu mock掉torch.device api)来避免guards检查失败导致重编这个问题。