DefTruth/torchlm

Can't use cuda in pipnet

try-agaaain opened this issue · 15 comments

I want to use CUDA with pipnet, so I run the following code:

import torchlm
from torchlm.tools import faceboxesv2
from torchlm.models import pipnet
import cv2
image_path = '../rgb/image0/1.png'
image = cv2.imread(image_path)
torchlm.runtime.bind(faceboxesv2())
torchlm.runtime.bind(
    pipnet(backbone="resnet18", pretrained=True,
        num_nb=10, num_lms=98, net_stride=32, input_size=256,
        meanface_type="wflw", checkpoint=None, map_location="cuda")
)
torchlm.runtime.forward(image)

Then I get an error that says:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

I know this is because my image data is still on the CPU instead of the GPU, and I need to load it onto the GPU. So I added a line of code like the following:

import torch
image = cv2.imread(image_path)
image = torch.tensor(image).cuda()

But now I get another error:

  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\tools\_faceboxesv2.py", line 305, in apply_detecting
    image_scale = cv2.resize(
cv2.error: OpenCV(4.5.5) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
>  - src is not a numpy array, neither a scalar
>  - Expected Ptr<cv::UMat> for argument 'src'

It means I need to pass an ndarray instead of a torch tensor. But if I pass an ndarray, its data will stay on the CPU, and I will get the first error again.

What should I do? Has anyone else run into this error?

Try also binding faceboxesv2 with the param device="cuda".

torchlm.runtime.bind(faceboxesv2(device="cuda"))
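
Putting the suggestion together with the original snippet, the bindings would then look like this (same code as above, only the faceboxesv2 call changes):

import torchlm
from torchlm.tools import faceboxesv2
from torchlm.models import pipnet

# bind the face detector on CUDA as well
torchlm.runtime.bind(faceboxesv2(device="cuda"))
# the landmark model is already mapped to CUDA via map_location
torchlm.runtime.bind(
    pipnet(backbone="resnet18", pretrained=True,
        num_nb=10, num_lms=98, net_stride=32, input_size=256,
        meanface_type="wflw", checkpoint=None, map_location="cuda")
)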

No luck, the problem is unchanged.

Can you show me the details of the error?

The details of the first error are as follows:

C:\Home\Development\Anaconda\envs\DeepLearning\python.exe D:/ProgrammingCode/PythonPractice/Draft/main.py
Traceback (most recent call last):
  File "D:\ProgrammingCode\PythonPractice\Draft\main.py", line 13, in <module>
    torchlm.runtime.forward(image)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 120, in forward
    return RuntimeWrapper.forward(
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 79, in forward
    lms_pred = cls.landmarks_base.apply_detecting(crop, **kwargs)  # (m,2)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\_impls.py", line 145, in apply_detecting
    return _detecting_impl(net=self, image=image)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\_impls.py", line 292, in _detecting_impl
    outputs_cls, outputs_x, outputs_y, outputs_nb_x, outputs_nb_y = net.forward(image)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\pipnet.py", line 265, in forward
    return self._forward_impl(x)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\pipnet.py", line 248, in _forward_impl
    x = self.conv1(x)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

And the details of the second error are as follows:

C:\Home\Development\Anaconda\envs\DeepLearning\python.exe D:/ProgrammingCode/PythonPractice/Draft/main.py
Traceback (most recent call last):
  File "D:\ProgrammingCode\PythonPractice\Draft\main.py", line 15, in <module>
    torchlm.runtime.forward(image)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 120, in forward
    return RuntimeWrapper.forward(
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 50, in forward
    bboxes = cls.face_base.apply_detecting(image, **kwargs)  # (n,5) x1,y1,x2,y2,score
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\tools\_faceboxesv2.py", line 305, in apply_detecting
    image_scale = cv2.resize(
cv2.error: OpenCV(4.5.5) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
>  - src is not a numpy array, neither a scalar
>  - Expected Ptr<cv::UMat> for argument 'src'

No need for image = torch.tensor(image).cuda(), just comment out this line:

# image = torch.tensor(image).cuda()

After I comment out this line, I get the first error again.

I will check it later.

It should be a problem with pipnet.

The error is reported on the last line: torchlm.runtime.forward(image)

The second error occurs because torchlm performs a cv2.resize internally, and that operation requires a numpy array rather than a tensor. Although both are arrays, they differ in some respects: for example, an ndarray can easily be reversed with a[::-1], but a tensor does not support that operation. I'm not sure about the more detailed differences.
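
As a quick illustration of that difference (a minimal sketch; torch.flip is the tensor-side equivalent of reversing):

import numpy as np
import torch

a = np.arange(5)
print(a[::-1])                  # ndarray supports negative-step slicing: [4 3 2 1 0]

t = torch.arange(5)
# t[::-1] raises a ValueError, since tensors do not support negative-step slicing
print(torch.flip(t, dims=[0]))  # tensor([4, 3, 2, 1, 0])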

If I want to use the GPU to speed up pipnet, is there another way in torchlm to achieve this?

It's because one piece of logic in pipnet inference doesn't check the device. Adding a device check in _detecting_impl in pipnet/_impls.py should fix it:

    net.eval()
    device = next(net.parameters()).device  # add this line

    height, width, _ = image.shape
    image: np.ndarray = cv2.resize(image, (net.input_size, net.input_size))  # 256, 256
    image: Tensor = torch.from_numpy(_normalize(img=image)).contiguous().unsqueeze(0)  # (1,3,256,256)
    outputs_cls, outputs_x, outputs_y, outputs_nb_x, outputs_nb_y = net.forward(image.to(device))  # move to the model's device before inference
    # (1,68,8,8)
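
The underlying pattern is plain PyTorch device handling; here is a self-contained sketch of the same idea (the tiny Conv2d below is only a hypothetical stand-in for the PIPNet backbone):

import torch
import torch.nn as nn

net = nn.Conv2d(3, 8, kernel_size=3)     # hypothetical stand-in model
if torch.cuda.is_available():
    net = net.cuda()

x = torch.randn(1, 3, 256, 256)          # preprocessed image, still on the CPU
device = next(net.parameters()).device   # whichever device the weights live on
out = net(x.to(device))                  # move the input there before the forward pass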

You can try installing a newer version:

pip install "torchlm>=0.1.6.10"  # or install the latest pypi version `pip install torchlm`
pip install "torchlm>=0.1.6.10" -i https://pypi.org/simple/  # or install from a specific pypi mirror with '-i'

It runs normally now, but I noticed something strange: I tested with a hundred images using GPU acceleration, yet the time taken was about the same as with the CPU, and GPU utilization in Task Manager stayed at 0.

So I debugged it and found that device = next(net.parameters()).device returns cpu rather than the cuda I set externally. Maybe device = next(net.parameters()).device needs to be changed to something else.

It was my torch installation that was wrong. After installing the CUDA build of PyTorch, everything works normally.
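
For anyone hitting the same symptom (GPU utilization stuck at 0, parameters reporting cpu), a quick way to check whether the installed PyTorch build actually has CUDA support:

import torch

print(torch.__version__)          # CPU-only wheels usually carry a "+cpu" suffix
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # must be True for .cuda() / map_location="cuda" to work
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))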