Can't use cuda in pipnet
try-agaaain opened this issue · 15 comments
I want to use CUDA in pipnet, so I ran the following code:
import torchlm
from torchlm.tools import faceboxesv2
from torchlm.models import pipnet
import cv2
image_path = '../rgb/image0/1.png'
image = cv2.imread(image_path)
torchlm.runtime.bind(faceboxesv2())
torchlm.runtime.bind(
pipnet(backbone="resnet18", pretrained=True,
num_nb=10, num_lms=98, net_stride=32, input_size=256,
meanface_type="wflw", checkpoint=None, map_location="cuda")
)
torchlm.runtime.forward(image)
Then I get an error that says:
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
I know this happens because my image data is still on the CPU instead of the GPU, so I need to move it to the GPU. I added a line like this:
image = cv2.imread(image_path)
image = torch.tensor(image).cuda()
But now I get another error:
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\tools\_faceboxesv2.py", line 305, in apply_detecting
image_scale = cv2.resize(
cv2.error: OpenCV(4.5.5) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
> - src is not a numpy array, neither a scalar
> - Expected Ptr<cv::UMat> for argument 'src'
This means I need to pass a NumPy ndarray instead of a torch tensor. But if I pass an ndarray, its data stays on the CPU, and I get the first error again.
What should I do? Has anyone else hit the same error?
Try also binding faceboxesv2 with the parameter device="cuda":
torchlm.runtime.bind(faceboxesv2(device="cuda"))
No luck, the problem has not changed.
Can you show me the details of the error?
The details of the first error are as follows:
C:\Home\Development\Anaconda\envs\DeepLearning\python.exe D:/ProgrammingCode/PythonPractice/Draft/main.py
Traceback (most recent call last):
File "D:\ProgrammingCode\PythonPractice\Draft\main.py", line 13, in <module>
torchlm.runtime.forward(image)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 120, in forward
return RuntimeWrapper.forward(
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 79, in forward
lms_pred = cls.landmarks_base.apply_detecting(crop, **kwargs) # (m,2)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\_impls.py", line 145, in apply_detecting
return _detecting_impl(net=self, image=image)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\_impls.py", line 292, in _detecting_impl
outputs_cls, outputs_x, outputs_y, outputs_nb_x, outputs_nb_y = net.forward(image)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\pipnet.py", line 265, in forward
return self._forward_impl(x)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\models\pipnet\pipnet.py", line 248, in _forward_impl
x = self.conv1(x)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
And the details of the second error are as follows:
C:\Home\Development\Anaconda\envs\DeepLearning\python.exe D:/ProgrammingCode/PythonPractice/Draft/main.py
Traceback (most recent call last):
File "D:\ProgrammingCode\PythonPractice\Draft\main.py", line 15, in <module>
torchlm.runtime.forward(image)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 120, in forward
return RuntimeWrapper.forward(
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\runtime\_wrappers.py", line 50, in forward
bboxes = cls.face_base.apply_detecting(image, **kwargs) # (n,5) x1,y1,x2,y2,score
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Home\Development\Anaconda\envs\DeepLearning\lib\site-packages\torchlm\tools\_faceboxesv2.py", line 305, in apply_detecting
image_scale = cv2.resize(
cv2.error: OpenCV(4.5.5) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
> - src is not a numpy array, neither a scalar
> - Expected Ptr<cv::UMat> for argument 'src'
There is no need for image = torch.tensor(image).cuda(); just comment out this line:
# image = torch.tensor(image).cuda()
After I comment out this line, I get the first error again.
I will check it later.
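It is probably a pipnet problem.
The error is raised on the last line: torchlm.runtime.forward(image)
The second error occurs because torchlm calls cv2.resize internally, and that operation requires a NumPy array, not a tensor. Although both are arrays, they differ in some respects: for example, an ndarray can easily be reversed with a[::-1], but a tensor does not support that operation. I am not sure about the more specific differences.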
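To illustrate the ndarray/tensor difference mentioned above, here is a minimal sketch. It is not taken from torchlm; the array contents are made up for the example. It shows that NumPy accepts a negative slice step, that indexing a tensor the same way is expected to raise an exception (a ValueError in the PyTorch versions I am aware of), and that torch.flip gives the equivalent result.

```python
import numpy as np
import torch

# Made-up 2x3 array, used only for illustration.
a = np.arange(6).reshape(2, 3)

# NumPy: a negative step reverses the rows.
reversed_np = a[::-1]

# PyTorch: negative slice steps are not supported; torch.flip is the equivalent.
t = torch.from_numpy(a)
try:
    t[::-1]
    negative_step_error = None
except Exception as exc:  # ValueError in the versions I know of
    negative_step_error = type(exc).__name__

reversed_t = torch.flip(t, dims=[0])

print("negative step on tensor raised:", negative_step_error)
print(np.array_equal(reversed_t.numpy(), reversed_np))
```

Going the other way, a CPU tensor can be handed to OpenCV after calling .numpy() on it, and a CUDA tensor needs .cpu() first.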
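If I want to use the GPU to speed up pipnet, is there another way to do that in torchlm?
This happens because one piece of the pipnet inference logic does not check the device. Adding a device check in _detecting_impl in pipnet/_impls.py should fix it: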
net.eval()
device = next(net.parameters()).device # add this line
height, width, _ = image.shape
image: np.ndarray = cv2.resize(image, (net.input_size, net.input_size)) # 256, 256
image: Tensor = torch.from_numpy(_normalize(img=image)).contiguous().unsqueeze(0) # (1,3,256,256)
outputs_cls, outputs_x, outputs_y, outputs_nb_x, outputs_nb_y = net.forward(image.to(device)) # move to the model's device before inference
# (1,68,8,8)
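The same pattern can be tried in isolation. The sketch below is only an illustration under stated assumptions: the single Conv2d layer is a hypothetical stand-in for the PIPNet backbone, detect is a made-up helper name, and the division by 255 is a placeholder rather than torchlm's actual _normalize. The point is that the input stays a NumPy array for OpenCV-style preprocessing and is moved to whatever device the weights live on just before the forward pass.

```python
import numpy as np
import torch
from torch import nn

# Hypothetical stand-in for the real network, NOT the PIPNet architecture.
net = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Use CUDA when it is available, otherwise fall back to the CPU.
target = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = net.to(target).eval()


def detect(net: nn.Module, image: np.ndarray) -> torch.Tensor:
    """Run one forward pass on an HWC uint8 image (made-up helper)."""
    # Ask the model where its weights live instead of hard-coding a device.
    device = next(net.parameters()).device
    # HWC uint8 -> 1xCxHxW float32, still on the CPU at this point.
    x = torch.from_numpy(image).permute(2, 0, 1).float().div(255.0).unsqueeze(0)
    with torch.no_grad():
        out = net(x.to(device))  # move the input to the model's device
    return out.cpu()  # bring the result back for NumPy/OpenCV post-processing


image = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy image
out = detect(net, image)
print(out.shape, next(net.parameters()).device)
```

Because the device is read from the parameters, the same function works unchanged whether the model was loaded on the CPU or on the GPU.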
You can try installing the new version:
pip install "torchlm>=0.1.6.10" # or install the latest PyPI version with `pip install torchlm`
pip install "torchlm>=0.1.6.10" -i https://pypi.org/simple/ # or install from a specific PyPI mirror with '-i'
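It runs now, but I noticed something odd: I tested with 100 images using GPU acceleration, yet the time taken was about the same as on the CPU, and the GPU utilization shown in the resource manager stayed at 0 the whole time.
So I debugged it and found that device = next(net.parameters()).device evaluates to cpu rather than the cuda I set from outside. Perhaps device = next(net.parameters()).device needs to be replaced with some other approach.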
It turned out that my torch installation was wrong. After installing the CUDA build of PyTorch, it works as expected.
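For anyone who hits the same symptom (GPU utilization stuck at 0 and the model parameters reporting cpu), a quick check of the installed PyTorch build can rule this out. This is a small sketch using only standard torch attributes; describe_torch_build is a made-up helper name.

```python
import torch


def describe_torch_build() -> dict:
    """Collect basic facts about the installed PyTorch build (made-up helper)."""
    return {
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None on CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if a CUDA build is installed AND a usable GPU/driver is found.
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
print(info)

# Pick the device the same way the rest of the script should.
device = torch.device("cuda" if info["cuda_available"] else "cpu")
print("using device:", device)
```

If built_with_cuda is None, the installed wheel is most likely a CPU-only build, and passing map_location="cuda" or device="cuda" cannot help until a CUDA build is installed.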