Shape mismatch between primes and subs in NormalizedSddNode.get_mpe()
Closed this issue · 17 comments
Hi,
Still working on the same toy problem as in Issues #4 and #5:
import torch
import torch.nn.functional as F
from SPL.grids.compute_mpe import CircuitMPE
# log1mexp is defined further down this thread

cmpe = CircuitMPE('ex1.vtree', 'ex1.sdd')
scores = network(inputs)  # shape torch.Size([6, 5])
bsz = 6
logprobs = F.logsigmoid(scores)
litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
cmpe.parameterize_ff(litweights)
suggested_path = cmpe.get_mpe_inst(bsz)
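For context, each entry of litweights is a pair [log P(x_i = 0), log P(x_i = 1)] for one variable over the whole batch. A minimal shape check of that structure, with random scores standing in for network(inputs):

import torch
import torch.nn.functional as F

scores = torch.rand(6, 5)                        # stands in for network(inputs)
logprobs = F.logsigmoid(scores)                  # log P(x_i = 1), shape [6, 5]
neg = torch.log1p(-logprobs.exp())               # log P(x_i = 0); log1mexp(-p) above
litweights = [[n, p] for n, p in zip(neg.unbind(dim=1), logprobs.unbind(dim=1))]
print(len(litweights), litweights[0][1].shape)   # 5 torch.Size([6])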
But this time I get the error:
RuntimeError                              Traceback (most recent call last)
c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\test.ipynb Cell 32
----> 1 suggested_path = cmpe.get_mpe_inst(bsz)

File c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\SPL\grids\compute_mpe.py:62, in CircuitMPE.get_mpe_inst(self, batch_size)
     61 def get_mpe_inst(self, batch_size):
---> 62     mpe_inst = self.beta.get_mpe(batch_size)
     63     argmax = self.beta.mixing.argmax(dim=-1)
     64     return mpe_inst[torch.arange(batch_size), :, argmax]

File c:\Users\arthur.ledaguenel\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\./SPL/grids/pypsdd\pypsdd\sdd.py:699, in NormalizedSddNode.get_mpe(self, batch_size, clear_data)
    697 print("p shape : ", node.positive_elements[0][0].data.shape)
    698 print("s shape : ", node.positive_elements[0][1].data.shape)
--> 699 data = torch.stack([torch.cat((p.data, s.data), dim=-2) for p, s in node.positive_elements])
    700 #data = torch.tensor([], device=DEVICE, dtype=tint)
    701 #for p, s in node.positive_elements:
    702 #    a = torch.cat((p.data, s.data), dim=-2).unsqueeze(dim=0)
    703 #    data = torch.cat((data, a), dim=0)
...
    705 max_branch = max_branch.unsqueeze(dim=-2).expand((1, *data.shape[1:]))

RuntimeError: Tensors must have same number of dimensions: got 3 and 2
Any clue why some elements of the node might have data tensors of different shapes? This might be related to something similar to Issue #4.
Best regards,
Arthur Ledaguenel
After further examination, it seems the mismatching node is systematically a sub node of type TRUE whose data has shape torch.Size([1, bsz]). I'm not sure why TRUE sub nodes end up like this instead of the shape torch.Size([bsz, m, n]) of the other nodes.
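The ndim mismatch itself is easy to reproduce in isolation; a minimal sketch with hypothetical stand-in shapes:

import torch

bsz = 6
p_data = torch.zeros(bsz, 2, 1, dtype=torch.long)  # prime data: [bsz, m, n], 3-dim
s_data = torch.zeros(1, bsz, dtype=torch.long)     # TRUE sub data: [1, bsz], 2-dim
torch.cat((p_data, s_data), dim=-2)
# RuntimeError: Tensors must have same number of dimensions: got 3 and 2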
Hi Arthur,
I think this should be an easy fix. Could you just provide the code used to obtain ex1.sdd and ex1.vtree, as well as the network definition? That way I could run the above as an MWE, which would make my life considerably easier.
ex1.vtree file:
c ids of vtree nodes start at 0
c ids of variables start at 1
c vtree nodes appear bottom-up, children before parents
c
c file syntax:
c vtree number-of-nodes-in-vtree
c L id-of-leaf-vtree-node id-of-variable
c I id-of-internal-vtree-node id-of-left-child id-of-right-child
c
vtree 9
L 0 2
L 2 3
L 4 1
L 6 4
L 8 5
I 7 6 8
I 5 4 7
I 3 2 5
I 1 0 3
ex1.sdd file:
c ids of sdd nodes start at 0
c sdd nodes appear bottom-up, children before parents
c
c file syntax:
c sdd count-of-sdd-nodes
c F id-of-false-sdd-node
c T id-of-true-sdd-node
c L id-of-literal-sdd-node id-of-vtree literal
c D id-of-decomposition-sdd-node id-of-vtree number-of-elements {id-of-prime id-of-sub}*
c
sdd 18
L 1 0 -2
L 3 2 3
L 5 4 1
L 7 6 -4
L 8 8 -5
L 9 6 4
F 10
D 6 7 2 7 8 9 10
L 11 4 -1
D 4 5 2 5 6 11 10
L 12 2 -3
D 2 3 2 3 4 12 6
L 13 0 2
T 17
D 16 7 2 9 8 7 17
D 15 5 2 5 16 11 10
D 14 3 2 12 15 3 10
D 0 1 2 1 2 13 14
Do you need to know how they were compiled?
For the network, I'm just using random scores for debugging purposes right now, for instance:
scores = torch.rand((6, 5)).mul(3).sub(1)
In sdd.py, in the definition of get_mpe(self, batch_size, clear_data=True), there is a mismatch between the data created for TRUE nodes and for LITERAL nodes:

elif node.is_true():
    data = torch.where(node.theta.argmax(dim=-2) > 0, torch.tensor(node.vtree.var, device=DEVICE, dtype=tint), torch.tensor(-node.vtree.var, device=DEVICE, dtype=tint))
    data = data.unsqueeze(dim=-2)
    #assert(len(data.shape) == 3 and data.shape[0] == len(node.theta) and data.shape[1] == 1)
elif node.is_literal():
    data = torch.tensor([node.literal], device=DEVICE, dtype=tint).unsqueeze(0).unsqueeze(-1).expand(batch_size, 1, self.num_reps)

This gives shape torch.Size([1, 6]) for TRUE nodes and shape torch.Size([6, 1, 1]) for LITERAL nodes, which means they cannot be stacked properly in DECOMPOSITION nodes at:

data = torch.stack([torch.cat((p.data, s.data), dim=-2) for p, s in node.positive_elements])
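Running the two branches in isolation confirms the mismatch; a small sketch with bsz = 6, num_reps = 1, a stand-in theta of shape [2, bsz], and DEVICE/tint replaced by defaults:

import torch

bsz, num_reps, var = 6, 1, 5

# TRUE branch: argmax over dim=-2 collapses an axis, so unsqueeze gives [1, bsz]
theta = torch.rand(2, bsz)  # hypothetical weights for [-var, +var] per batch item
true_data = torch.where(theta.argmax(dim=-2) > 0,
                        torch.tensor(var), torch.tensor(-var)).unsqueeze(dim=-2)
print(true_data.shape)      # torch.Size([1, 6]) -- 2-dim

# LITERAL branch: two unsqueezes plus expand give [bsz, 1, num_reps]
lit_data = torch.tensor([3]).unsqueeze(0).unsqueeze(-1).expand(bsz, 1, num_reps)
print(lit_data.shape)       # torch.Size([6, 1, 1]) -- 3-dim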
I managed to make it work by changing sdd.py:

l.666:
data = data.unsqueeze(dim=-2)
becomes:
data = data.unsqueeze(-1).unsqueeze(-1)

l.695:
max_branch = node.theta.argmax(dim=0, keepdim=True)
becomes:
max_branch = node.theta.argmax(dim=0, keepdim=True).unsqueeze(-1).expand(1, batch_size, self.num_reps)

I also added:

cmpe.beta.num_reps = 1
cmpe.beta.mixing = torch.tensor([1])

This runs without errors, but I'm not sure whether it changes the results of the system or not.
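As far as I can tell from the traceback above, get_mpe_inst only uses mixing to select a component via argmax, so with num_reps = 1 the single component is always selected; a sketch of that selection step with hypothetical mpe_inst contents:

import torch

bsz, n_vars, num_reps = 6, 5, 1
mpe_inst = torch.randint(-5, 5, (bsz, n_vars, num_reps))  # [bsz, n_vars, num_reps]
mixing = torch.tensor([1])
argmax = mixing.argmax(dim=-1)                            # tensor(0): the only component
out = mpe_inst[torch.arange(bsz), :, argmax]
print(out.shape)                                          # torch.Size([6, 5])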
Concerning my toy example, if I set:

scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2)

I expect to get suggested_path.gt(0) == scores.gt(0), since the base boolean vector is consistent with the circuit. However, I get suggested_path.gt(0) == tensor([[ True, True, False, True, False]]), another boolean vector that is consistent with the circuit but not the most likely one given the synthetic logits I created. Is that normal behavior?
Hi Arthur,
There was a simple mismatch in the shape of the data created when the circuit contained True nodes. I pushed a fix that should take care of that issue.
It now works on my end with the following driver code (adapted from warcraft_shortest_path/trainers.py):
output = torch.rand(6, 6, 1, device='cuda')
output, cmpe.beta.mixing = output.split((5, 1), dim=1)
logprobs = F.logsigmoid(output).clamp(max=-1e-7)
cmpe.beta.mixing = cmpe.beta.mixing.squeeze(1).log_softmax(dim=1)
# Parameterize the circuit
litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
bsz = 6
# parameterize circuit
cmpe.parameterize_ff(litweights)
# Get mpe
suggested_path = mpe(bsz)  # mpe() is the helper defined in the MWE further down this thread
I'm working on a more streamlined version of the code, so stay tuned :)
Regarding your second question, I'm not quite sure I get what you mean. Why are you multiplying and subtracting across dimensions? I'm also not sure what you expect to get by comparing the MPE with the scores; this does not check for consistency with the circuit.
Hi,
Thanks a lot for the update!
Regarding the second question: scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2) creates a score vector from a valid configuration, with positive values for True variables and negative values for False variables. When turned into logits, this implies p > 0.5 for the True variables and p < 0.5 for the False variables, which ensures that the most probable configuration given these logits is the base configuration ([True, True, False, False, False] in this case).
Since the most probable configuration given the logits is consistent with the logical formula compiled into the circuit, shouldn't it be the predicted output of the SPL? Maybe there is something I misunderstood about SPL, but to my understanding, during inference it predicts the most probable output (given the logits) that is also consistent with the logical formula.
I also must say I am unsure about the role of the mixing component when num_reps = 1, which is why I set it up like this in my script:

cmpe.beta.mixing = torch.tensor([1])

How would that affect the behavior of the layer?
I tried with the updated code and still encounter an issue:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\test.ipynb Cell 25
      2 litweights = [[1-p, p] for p in probs.unbind(axis=-1)]
      3 # cmpe.parameterize_ff(litweights)
----> 4 cmpe.beta.parameterize_ff(litweights, clear_data=False)

File c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\./SPL/grids/pypsdd\pypsdd\sdd.py:810, in NormalizedSddNode.parameterize_ff(self, litleaves, clear_data)
    808 primes, subs = zip(*node.positive_elements)
    809 primes = torch.stack([p.data for p in primes])
--> 810 subs = torch.stack([s.data for s in subs])
    811 node.theta = primes + subs
    812 data = node.theta.logsumexp(dim=0)

RuntimeError: stack expects each tensor to be equal size, but got [6] at entry 0 and [6, 1] at entry 1
It seems that this update to parameterize_ff:

- data = torch.zeros(bsz, device=DEVICE)
+ data = torch.zeros((bsz, self.num_reps), device=DEVICE)

might be responsible.
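That diff would explain the error: torch.stack requires identical shapes, so a [bsz] leaf next to a [bsz, num_reps] leaf fails, as a two-line reproduction shows:

import torch

torch.stack([torch.zeros(6), torch.zeros(6, 1)])
# RuntimeError: stack expects each tensor to be equal size, but got [6] at entry 0 and [6, 1] at entry 1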
Regarding your second question again, I'm still not sure what the purpose of the .mul(3).sub(2) is. I'm also not clear on how you're computing the MPE, since you never call get_mpe? I believe this is the test you're looking for:
logprobs[:, 0:1, :] = 0
logprobs[:, 2:4, :] = -float('inf')
litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
cmpe.parameterize_ff(litweights)
map_state = mpe(bsz)
assert (map_state == logprobs.squeeze().exp().long()).all()
I also must say I am unsure about the role of the mixing component when num_reps = 1
It's just there for uniformity, so the code scales easily to arbitrarily large mixtures.
I tried with the updated code and still encounter an issue:
Sorry, it appears I missed an important line in my example: initializing the gating function, which sets num_reps for the SPL. I'll push code that moves that setting out of the gating function. But for now, here is an MWE that seems to be working on my end:
import os
import sys
sys.path.append(os.path.join(sys.path[0], '..', 'grids'))
sys.path.append(os.path.join(sys.path[0], '..', 'grids', 'pypsdd'))

import torch
import torch.nn.functional as F

from compute_mpe import CircuitMPE
from GatingFunction import DenseGatingFunction

def log1mexp(x):
    # assert(torch.all(x >= 0))
    return torch.where(x < 0.6931471805599453094,
                       torch.log(-torch.expm1(-x)),
                       torch.log1p(-torch.exp(-x)))

def mpe(bsz):
    mpe = cmpe.get_mpe_inst(bsz)
    return (mpe > 0).long()

cmpe = CircuitMPE(f'ex1.vtree', f'ex1.sdd')
gate = DenseGatingFunction(cmpe.beta, gate_layers=[256] + [256]*2, num_reps=1).cuda()

output = torch.rand(6, 6, 1, device='cuda')
output, cmpe.beta.mixing = output.split((5, 1), dim=1)
logprobs = F.logsigmoid(output).clamp(max=-1e-7)
cmpe.beta.mixing = cmpe.beta.mixing.squeeze(1).log_softmax(dim=1)

litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
cmpe.parameterize_ff(litweights)
suggested_path = mpe(bsz=6)
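As an aside, log1mexp above computes log(1 - exp(-x)) for x > 0, switching at ln 2 ≈ 0.6931 between the expm1 and log1p forms for numerical stability; a quick double-precision check of that claim:

import torch

x = torch.tensor([1e-4, 0.5, 2.0, 20.0], dtype=torch.float64)
naive = torch.log(1 - torch.exp(-x))
print(torch.allclose(log1mexp(x), naive, atol=1e-6))  # True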
Hi,
Regarding your second question again, I'm still not sure what the purpose of the .mul(3).sub(2) is.
The purpose of .to(torch.float).mul(3).sub(2) is to turn the boolean vector into a vector of scores with strictly positive values for True variables and strictly negative values for False variables. In this case, scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2) leads to scores = torch.tensor([[1, 1, -2, -2, -2]]). The values 3 and 2 are not that important and can be changed (.mul(2).sub(1) works just as well for that purpose).
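A quick check of that claim, and of the induced probabilities under the sigmoid (p > 0.5 for True variables, p < 0.5 for False ones):

import torch

scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2)
print(scores)                 # tensor([[ 1.,  1., -2., -2., -2.]])
print(torch.sigmoid(scores))  # tensor([[0.7311, 0.7311, 0.1192, 0.1192, 0.1192]])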
I'm also not clear how you're computing the MPE
The MPE is computed by calling cmpe.get_mpe_inst(bsz), as in the first example, which internally calls get_mpe, right?
I believe this is the test you're looking for:
In your example with logprobs[:, 0:1, :] = 0, I would assume that both [True, True, False, False, False] and [False, False, False, False, False] are equally likely? This is why I ensure strictly positive scores for the True variables, to get a unique MPE.
Sorry, it appears I missed an important line in my example, initializing the gating function which sets num_reps for the SPL.
For num_reps, I added cmpe.beta.num_reps = 1 as mentioned before, because I don't want additional parameters in a gating function, which is the case with the DenseGatingFunction (similar to a dense layer in a neural network), if I understood correctly?
In your example with logprobs[:, 0:1, :] = 0, I would assume that both [True, True, False, False, False] and [False, False, False, False, False] are equally likely?
Not really. We're working in log-space, so the above is equivalent to setting the first two variables to True.
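Concretely, a log-weight of 0 is probability 1 and -inf is probability 0:

import torch

print(torch.tensor([0.0, -float('inf')]).exp())  # tensor([1., 0.])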
For num_reps, I added cmpe.beta.num_reps = 1
That also works, although initializing the gating function doesn't really affect the output unless you invoke it explicitly.
Hi,
We're working in log-space, so the above is equivalent to setting the first two variables to True
OK, I did not realize that litweights were in log-space by default. I believe this is not the case for the get_torch_ac() function of CircuitMPE, for instance, am I right? (I found consistent results there with litweights as probabilities rather than log-probabilities.)
Although initializing the gating function doesn't really affect the output unless you invoke it explicitly.
That's good to know. In your example, the gating function is only used to initialize the num_reps attribute, right?