suriyadeepan/torchtest

CUDA pointer being compared to CPU pointer

Closed this issue · 2 comments

Got the following error while testing my network from https://github.com/k0pch4/big-little-net/:

torchtest\torchtest.py", line 151, in _var_change_helper
assert not torch.equal(p0, p1)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'other'

Code to replicate

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from src import models
from helper import get_models

# set cpu for avoiding possible errors in torchtest
torch.device("cpu")

# get the model names
model_names = get_models(models)

import torchtest as tt
inputs = torch.rand(2, 3, 224, 224)
targets = torch.FloatTensor(2).uniform_(0, 1000).long()
# torch.randint(0, 2, (2, 1000,))
batch = [inputs, targets]
model = models.bl_resnet50()

# what are the variables?
print('Our list of parameters', [ np[0] for np in model.named_parameters() ])

# do they change after a training step?
#  let's run a train step and see
tt.assert_vars_change(
    model=model, 
    loss_fn=F.cross_entropy, 
    optim=torch.optim.Adam(model.parameters()),
    batch=batch)

Will add more details if required.
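For context, the failing assertion in torchtest compares a saved copy of a parameter against the parameter after a training step. Below is a minimal sketch of my own (not torchtest code) of that device mismatch, guarded so it also runs on a machine without CUDA:

```python
import torch

# Minimal illustration: torch.equal expects both tensors on the same
# device, mirroring the `assert not torch.equal(p0, p1)` line above.
p0 = torch.zeros(3)  # CPU tensor, e.g. a parameter copy taken before the step
if torch.cuda.is_available():
    p1 = torch.ones(3, device='cuda:0')  # CUDA tensor, parameter after the step
    try:
        torch.equal(p0, p1)
        print("no error on this torch version")
    except RuntimeError as err:
        print("device mismatch:", err)
else:
    p1 = torch.ones(3)  # same device: comparison works as intended
    print("params changed:", not torch.equal(p0, p1))
```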

--- o	2019-03-29 05:26:17.575465900 +0530
+++ f	2019-03-29 05:26:53.183920300 +0530
@@ -6,18 +6,19 @@
 from src import models
 from helper import get_models
 
-# set cpu for avoiding possible errors in torchtest
-torch.device("cpu")
+# set gpu for avoiding possible errors in torchtest
+dev='cuda:0'
+torch.device(dev)
 
 # get the model names
 model_names = get_models(models)
 
 import torchtest as tt
-inputs = torch.rand(2, 3, 224, 224)
-targets = torch.FloatTensor(2).uniform_(0, 1000).long()
+inputs = torch.rand(2, 3, 224, 224).to(dev)
+targets = torch.FloatTensor(2).uniform_(0, 1000).long().to(dev)
 # torch.randint(0, 2, (2, 1000,))
 batch = [inputs, targets]
-model = models.bl_resnet50()
+model = models.bl_resnet50().to(dev)
 
 # what are the variables?
 print('Our list of parameters', [ np[0] for np in model.named_parameters() ])

After applying the diff above, I was able to test the model.

I think the library assumes by default that I would be using a GPU, and therefore ports variables to the GPU. I'm only guessing at this point; I could look into this further.

a7ad13b fixes this by adding an optional device argument to the tests.
Pass device='cuda:0' to tt.assert_vars_change and it should work.
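A hedged sketch of that usage, with a tiny nn.Linear standing in for bl_resnet50 (the device keyword name is taken from the comment above; the call is guarded so it only runs when torchtest and CUDA are actually available):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in model; bl_resnet50 from the report would be used the same way.
model = nn.Linear(8, 4)
inputs = torch.rand(2, 8)
targets = torch.randint(0, 4, (2,))
batch = [inputs, targets]

try:
    import torchtest as tt
    have_torchtest = True
except ImportError:
    have_torchtest = False

if have_torchtest and torch.cuda.is_available():
    # Pass device='cuda:0' so torchtest runs the train step on the GPU
    # (optional argument added in a7ad13b, per the comment above).
    tt.assert_vars_change(
        model=model.to('cuda:0'),
        loss_fn=F.cross_entropy,
        optim=torch.optim.Adam(model.parameters()),
        batch=[t.to('cuda:0') for t in batch],
        device='cuda:0')
```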