Felix-Petersen/difflogic

Support Problem:RuntimeError

Closed this issue · 8 comments

I have met the following requirement:
:~/difflogic/experiments$ pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html # CUDA version 11.3
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already satisfied: torch==1.12.1+cu113 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (1.12.1+cu113)
Requirement already satisfied: torchvision==0.13.1+cu113 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (0.13.1+cu113)
Requirement already satisfied: typing-extensions in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from torch==1.12.1+cu113) (4.3.0)
Requirement already satisfied: requests in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from torchvision==0.13.1+cu113) (2.28.1)
Requirement already satisfied: numpy in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from torchvision==0.13.1+cu113) (1.21.5)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from torchvision==0.13.1+cu113) (9.4.0)
Requirement already satisfied: charset-normalizer<3,>=2 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from requests->torchvision==0.13.1+cu113) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from requests->torchvision==0.13.1+cu113) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from requests->torchvision==0.13.1+cu113) (1.26.14)
Requirement already satisfied: idna<4,>=2.5 in /home/anaconda3/envs/logicnet/lib/python3.7/site-packages (from requests->torchvision==0.13.1+cu113) (3.4)

But I encounter the following error when I run main.py:
:~/difflogic/experiments$ python main.py --dataset mnist -k 500 -l 4
{'experiment_id': None, 'dataset': 'mnist', 'tau': 10, 'seed': 0, 'batch_size': 128, 'learning_rate': 0.01, 'training_bit_count': 32, 'implementation': 'cuda', 'packbits_eval': False, 'compile_model': False, 'num_iterations': 100000, 'eval_freq': 2000, 'valid_set_size': 0.0, 'extensive_eval': False, 'connections': 'unique', 'architecture': 'randomly_connected', 'num_neurons': 500, 'num_layers': 4, 'grad_factor': 1.0}
total_num_neurons=1500
total_num_weights=1500
Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): LogicLayer(784, 500, train)
(2): LogicLayer(500, 500, train)
(3): LogicLayer(500, 500, train)
(4): LogicLayer(500, 500, train)
(5): GroupSum(k=10, tau=10)
)
iteration: 0%| | 0/100000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 288, in <module>
loss = train(model, x, y, loss_fn, optim)
File "main.py", line 175, in train
x = model(x)
File "/home/anaconda3/envs/logicnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/logicnet/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/anaconda3/envs/logicnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/logicnet/lib/python3.7/site-packages/difflogic/difflogic.py", line 88, in forward
return self.forward_cuda(x)
File "/home/anaconda3/envs/logicnet/lib/python3.7/site-packages/difflogic/difflogic.py", line 125, in forward_cuda
x, a, b, w, self.given_x_indices_of_y_start, self.given_x_indices_of_y
File "/home/anaconda3/envs/logicnet/lib/python3.7/site-packages/difflogic/difflogic.py", line 209, in forward
return difflogic_cuda.forward(x, a, b, w)
RuntimeError: Unrecognized tensor type ID: PythonTLSSnapshot

Unfortunately, I cannot reproduce the error.

To me, it looks like difflogic may be only partially installed, i.e., that the CUDA part of difflogic was not installed.
Could you give me some additional information on your setup: the output of installing difflogic in a fresh environment, the output of pip freeze, and the output of nvcc --version?

Regarding RuntimeError: Unrecognized tensor type ID: PythonTLSSnapshot: according to https://discuss.pytorch.org/t/encountering-error-unrecognized-tensor-type-id-autogradcuda/102379, it might be caused by difflogic having been installed with a different version of torch / CUDA than the one you are running now, e.g., due to upgrading the system after initially installing difflogic.
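To gather the setup information in one go, something like the following could be run inside the affected environment. This is only a minimal sketch: the function name `describe_environment` is hypothetical, and the module name `difflogic_cuda` is taken from the traceback above. It assumes nothing about difflogic beyond that name.

```python
import importlib
import sys


def describe_environment() -> dict:
    """Collect version facts relevant to a torch / CUDA extension mismatch."""
    report = {"python": sys.version.split()[0]}
    try:
        import torch  # import torch first; extensions built on it need its libraries
    except ImportError:
        report["torch"] = "not installed"
        return report
    report["torch"] = torch.__version__
    report["torch_built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    try:
        importlib.import_module("difflogic_cuda")
        report["difflogic_cuda"] = "importable"
    except Exception as exc:  # ImportError, or a loader error from a broken build
        report["difflogic_cuda"] = f"failed: {type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Note that a successful import does not prove the extension was compiled against the installed torch; the error in this issue is raised only when `difflogic_cuda.forward` is called. The report is meant to complement the installation printout, not to replace it.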

xx@xxx:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

I ran pip freeze > requirement.txt; the file is attached.
requirement.txt

Thank you!

Hi, according to the requirements.txt, it looks like the Python environment might be corrupted, as it does not contain torch.

Further, your CUDA version (10.2) does not match the CUDA version of the PyTorch build that your initial message indicated (11.3). Matching versions are a necessary requirement for CUDA packages building on PyTorch and are typically strongly recommended in general.
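The comparison can also be done mechanically. The following is a minimal sketch and not part of difflogic: the helper names are hypothetical, and it assumes that the `+cuXYZ` suffix of a PyTorch wheel version encodes the CUDA version with the last digit as the minor version (so `cu113` is read as 11.3 and `cu102` as 10.2).

```python
import re
from typing import Optional


def cuda_from_nvcc(nvcc_output: str) -> Optional[str]:
    """Extract 'MAJOR.MINOR' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_from_wheel_tag(torch_version: str) -> Optional[str]:
    """Extract 'MAJOR.MINOR' from a version string such as '1.12.1+cu113'.

    Assumption: the last digit of the tag is the minor version.
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def versions_match(nvcc_output: str, torch_version: str) -> bool:
    """True only if both versions could be parsed and are equal."""
    toolkit = cuda_from_nvcc(nvcc_output)
    wheel = cuda_from_wheel_tag(torch_version)
    return toolkit is not None and toolkit == wheel


# Line copied from the nvcc output posted in this issue.
NVCC_OUTPUT = "Cuda compilation tools, release 10.2, V10.2.89"

if __name__ == "__main__":
    for version in ("1.12.1+cu113", "1.12.1+cu102"):
        status = "match" if versions_match(NVCC_OUTPUT, version) else "MISMATCH"
        print(f"torch {version} vs nvcc {cuda_from_nvcc(NVCC_OUTPUT)}: {status}")
```

With the values from this issue, the cu113 wheel is reported as a mismatch against the 10.2 toolkit, while a cu102 wheel would match.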

I would suggest making a new virtual environment for PyTorch and difflogic. In particular, I typically use the following (with an adjustment to account for your CUDA version):

virtualenv -p python3 .env_difflogic
. .env_difflogic/bin/activate  # activates the environment; needs to be run in every new terminal

pip install tqdm matplotlib scipy
pip install torch==1.12.1+cu102 torchvision==0.13.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html # CUDA version 10.2
pip install difflogic

Please let me know whether this works for you.

Sir, the issue still persists after I installed the environment as per your instructions.

(logicnet) xx@xx : ~/difflogic/experiments$ pip freeze
brotlipy==0.7.0
certifi @ file:///croot/certifi_1671487769961/work/certifi
cffi @ file:///croot/cffi_1670423208954/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
cryptography @ file:///croot/cryptography_1677533068310/work
cycler==0.11.0
difflogic==0.1.0
fonttools==4.38.0
idna @ file:///croot/idna_1666125576474/work
joblib==1.3.2
kiwisolver==1.4.5
matplotlib==3.5.3
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626179032232/work
mkl-service==2.4.0
numpy @ file:///opt/conda/conda-bld/numpy_and_numpy_base_1653915516269/work
packaging==23.2
Pillow==9.4.0
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pyOpenSSL @ file:///croot/pyopenssl_1677607685877/work
pyparsing==3.1.1
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
python-dateutil==2.8.2
requests @ file:///opt/conda/conda-bld/requests_1657734628632/work
scikit-learn==1.0.2
scipy==1.7.3
six @ file:///tmp/build/80754af9/six_1644875935023/work
threadpoolctl==3.1.0
torch==1.12.1+cu102
torchaudio==0.12.1
torchvision==0.13.1+cu102
tqdm==4.66.1
typing_extensions @ file:///tmp/abs_ben9emwtky/croots/recipe/typing_extensions_1659638822008/work
urllib3 @ file:///croot/urllib3_1673575502006/work

(logicnet) xxx@xxx:~/xxx/difflogic/experiments$ python main.py --dataset mnist -k 500 -l 4
{'experiment_id': None, 'dataset': 'mnist', 'tau': 10, 'seed': 0, 'batch_size': 128, 'learning_rate': 0.01, 'training_bit_count': 32, 'implementation': 'cuda', 'packbits_eval': False, 'compile_model': False, 'num_iterations': 100000, 'eval_freq': 2000, 'valid_set_size': 0.0, 'extensive_eval': False, 'connections': 'unique', 'architecture': 'randomly_connected', 'num_neurons': 500, 'num_layers': 4, 'grad_factor': 1.0}
total_num_neurons=1500
total_num_weights=1500
Sequential(
(0): Flatten(start_dim=1, end_dim=-1)
(1): LogicLayer(784, 500, train)
(2): LogicLayer(500, 500, train)
(3): LogicLayer(500, 500, train)
(4): LogicLayer(500, 500, train)
(5): GroupSum(k=10, tau=10)
)
iteration: 0%| | 0/100000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 288, in <module>
loss = train(model, x, y, loss_fn, optim)
File "main.py", line 175, in train
x = model(x)
File "/home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages/difflogic/difflogic.py", line 88, in forward
return self.forward_cuda(x)
File "/home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages/difflogic/difflogic.py", line 125, in forward_cuda
x, a, b, w, self.given_x_indices_of_y_start, self.given_x_indices_of_y
File "/home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages/difflogic/difflogic.py", line 209, in forward
return difflogic_cuda.forward(x, a, b, w)
RuntimeError: Unrecognized tensor type ID: PythonTLSSnapshot

Could you show the printout from pip install difflogic (the initial one with the compilation)?

Here it is, sir:
(logicnet) xxx@xxx: ~/difflogic/experiments$ pip install difflogic
Requirement already satisfied: difflogic in /home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages (0.1.0)
Requirement already satisfied: torch>=1.6.0 in /home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages (from difflogic) (1.12.1+cu102)
Requirement already satisfied: numpy in /home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages (from difflogic) (1.21.5)
Requirement already satisfied: typing-extensions in /home/xxx/anaconda3/envs/logicnet/lib/python3.7/site-packages (from torch>=1.6.0->difflogic) (4.3.0)

Could you give me the printout from a fresh virtual environment or after uninstalling difflogic? (I need it to look at the full printout of the installation.) The given lines just state that it was already installed.
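One possible way to capture that full printout is sketched below. It is a sketch under assumptions, not an official procedure: the helper names and the log file name are hypothetical. It uses pip's standard `uninstall -y`, `--no-cache-dir`, and `-v` options; `--no-cache-dir` is there on the assumption that pip might otherwise reuse a previously built wheel instead of compiling again.

```python
import subprocess
import sys

PACKAGE = "difflogic"


def reinstall_commands(package: str = PACKAGE) -> list:
    """Build the pip commands for a clean, verbose reinstall."""
    pip = [sys.executable, "-m", "pip"]  # pip of the currently active environment
    return [
        pip + ["uninstall", "-y", package],
        # --no-cache-dir: do not reuse a cached wheel; -v: show build output
        pip + ["install", "--no-cache-dir", "-v", package],
    ]


def run_and_log(log_path: str = "difflogic_install.log") -> None:
    """Run the commands and write their combined output to a log file."""
    with open(log_path, "w") as log:
        for command in reinstall_commands():
            log.write("$ " + " ".join(command) + "\n")
            log.flush()  # keep the header ahead of the subprocess output
            subprocess.run(command, stdout=log, stderr=subprocess.STDOUT, check=False)


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        run_and_log()
    else:
        # Default: only print the commands so nothing is changed by accident.
        for command in reinstall_commands():
            print(" ".join(command))
```

Running the script with `--run` inside the activated environment would write the uninstall and install output to `difflogic_install.log`, which could then be attached here.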

Please let me know in case you got it to work, so I can close the issue ;) If not, I'm happy to help.