alexmojaki/snoop

Somehow wrapping my pytorch tensor in pp breaks it when training on gpu

Closed this issue · 6 comments

I added a line similar to pp(my_tensor) to view the contents of my PyTorch tensor while training on GPU, but it fails with the following error:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: initialization error
Exception raised from insert_events at /opt/conda/conda-bld/pytorch_1607369981906/work/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f11b91498b2 in /home/nxingyu2/miniconda3/envs/NLP/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f11b939bf20 in /home/nxingyu2/miniconda3/envs/NLP/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f11b9134b7d in /home/nxingyu2/miniconda3/envs/NLP/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5f9e52 (0x7f11fa901e52 in /home/nxingyu2/miniconda3/envs/NLP/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>

Not sure if this is the right place to post this, but I'd like to raise it in case anyone else faces a similar issue.

Interesting. I don't know how pytorch works. Did you use pp inside a function that is called by C++ or the GPU?

Yes, I used it in one of the callback functions in PyTorch Lightning, which is called during training on the GPU.
Actually, I believe this isn't a problem with snoop or PyTorch. It works when using the CPU, and when I tried the same thing with the gruns/icecream library it gave the same error, so I suppose the way PyTorch optimises training for GPU prevents it from working. Shall I close this issue?

icecream uses the same underlying library written by me, so I'm equally responsible for that. I'm just wondering if it's impossible to access source code in these circumstances, or if something else is broken. What else is different within these function calls? Can you read files? Can you use a with snoop block? Do other exceptions also lead to such cryptic errors? What if you do this?

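import traceback

from snoop import pp

def foo():
    try:
        pp(...)  # the call that crashes, e.g. pp(my_tensor)
    except Exception:
        # Show any Python-level exception raised by pp
        traceback.print_exc()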

@zasdfgbnm do you know anything about this?
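
The "can source code be accessed here?" question could also be tested in isolation with a small stdlib-only check, sketched below. This is only an illustration: check_source_access and callback are hypothetical names, not part of snoop, icecream, or PyTorch Lightning, and the real callback would be whichever Lightning hook contains the pp call.

```python
import inspect
import sys


def check_source_access():
    """Report whether the caller's source can be located and read.

    Hypothetical helper: call it from inside the function where pp fails.
    """
    frame = sys._getframe(1)  # the frame that called this helper
    result = {
        "filename": frame.f_code.co_filename,
        "readable": False,
        "error": None,
    }
    try:
        # Raises OSError when the source for this code object cannot be found
        lines, _start = inspect.getsourcelines(frame.f_code)
        result["readable"] = bool(lines)
    except (OSError, TypeError) as exc:
        result["error"] = repr(exc)
    return result


def callback():
    # Hypothetical stand-in for the Lightning callback from the report
    return check_source_access()


if __name__ == "__main__":
    print(callback())
```

If this reports the source as readable inside the failing callback, file access is probably not the issue and the crash is more likely coming from somewhere else.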

I don't know, it looks more like a PyTorch problem. Do you see the same error on the latest version of PyTorch?
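
When comparing versions, it may help to print exactly what is installed in the environment. A minimal sketch using only the standard library (Python 3.8+): installed_versions is a hypothetical helper, and the default package list is an assumption based on the libraries mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "pytorch-lightning", "snoop")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into the issue before and after an upgrade makes it easier to see which package change affected the crash.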

Oh I upgraded my pytorch-lightning from 1.1.2 to 1.1.5 and the problem was fixed. Thanks!