Shape mismatch between primes and subs in NormalizedSddNode.get_mpe()
Closed this issue · 17 comments
Hi,
Still working on the same toy problem as in Issues #4 and #5:
import torch
import torch.nn.functional as F
from SPL.grids.compute_mpe import CircuitMPE
# log1mexp is defined further down this thread

cmpe = CircuitMPE('ex1.vtree', 'ex1.sdd')
scores = network(inputs)  # shape torch.Size([6, 5])
bsz = 6
logprobs = F.logsigmoid(scores)
litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
cmpe.parameterize_ff(litweights)
suggested_path = cmpe.get_mpe_inst(bsz)
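For context, each entry of litweights is a pair [log P(x_i = 0), log P(x_i = 1)] for one variable over the whole batch. A minimal shape check of that structure, with random scores standing in for network(inputs):

import torch
import torch.nn.functional as F

scores = torch.rand(6, 5)                        # stands in for network(inputs)
logprobs = F.logsigmoid(scores)                  # log P(x_i = 1), shape [6, 5]
neg = torch.log1p(-logprobs.exp())               # log P(x_i = 0); log1mexp(-p) above
litweights = [[n, p] for n, p in zip(neg.unbind(dim=1), logprobs.unbind(dim=1))]
print(len(litweights), litweights[0][1].shape)   # 5 torch.Size([6])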
But this time I get the error:
RuntimeError                              Traceback (most recent call last)
c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\test.ipynb Cell 32
----> 1 suggested_path = cmpe.get_mpe_inst(bsz)

File c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\SPL\grids\compute_mpe.py:62, in CircuitMPE.get_mpe_inst(self, batch_size)
     61 def get_mpe_inst(self, batch_size):
---> 62     mpe_inst = self.beta.get_mpe(batch_size)
     63     argmax = self.beta.mixing.argmax(dim=-1)
     64     return mpe_inst[torch.arange(batch_size), :, argmax]

File c:\Users\arthur.ledaguenel\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\./SPL/grids/pypsdd\pypsdd\sdd.py:699, in NormalizedSddNode.get_mpe(self, batch_size, clear_data)
    697 print("p shape : ", node.positive_elements[0][0].data.shape)
    698 print("s shape : ", node.positive_elements[0][1].data.shape)
--> 699 data = torch.stack([torch.cat((p.data, s.data), dim=-2) for p, s in node.positive_elements])
    700 #data = torch.tensor([], device=DEVICE, dtype=tint)
    701 #for p, s in node.positive_elements:
    702 #    a = torch.cat((p.data, s.data), dim=-2).unsqueeze(dim=0)
    703 #    data = torch.cat((data, a), dim=0)
...
    705 max_branch = max_branch.unsqueeze(dim=-2).expand((1, *data.shape[1:]))

RuntimeError: Tensors must have same number of dimensions: got 3 and 2
Any clue why some elements of the node might have data tensors of different shapes? This might be related to something similar to Issue #4.
Best regards,
Arthur Ledaguenel
After further examination, it seems the mismatching node is systematically a sub node of type TRUE whose data has shape torch.Size([1, bsz]). I'm not sure why TRUE sub nodes end up like this instead of the shape torch.Size([bsz, m, n]) of the other nodes.
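The ndim mismatch itself is easy to reproduce in isolation; a minimal sketch with hypothetical stand-in shapes:

import torch

bsz = 6
p_data = torch.zeros(bsz, 2, 1, dtype=torch.long)  # prime data: [bsz, m, n], 3-dim
s_data = torch.zeros(1, bsz, dtype=torch.long)     # TRUE sub data: [1, bsz], 2-dim
torch.cat((p_data, s_data), dim=-2)
# RuntimeError: Tensors must have same number of dimensions: got 3 and 2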
Hi Arthur,
I think this should be an easy fix. Could you just provide the code used to obtain ex1.sdd and ex1.vtree, as well as the network definition? That way I could run the above as an MWE, which would make my life considerably easier.
ex1.vtree file:
c ids of vtree nodes start at 0
c ids of variables start at 1
c vtree nodes appear bottom-up, children before parents
c
c file syntax:
c vtree number-of-nodes-in-vtree
c L id-of-leaf-vtree-node id-of-variable
c I id-of-internal-vtree-node id-of-left-child id-of-right-child
c
vtree 9
L 0 2
L 2 3
L 4 1
L 6 4
L 8 5
I 7 6 8
I 5 4 7
I 3 2 5
I 1 0 3
ex1.sdd file:
c ids of sdd nodes start at 0
c sdd nodes appear bottom-up, children before parents
c
c file syntax:
c sdd count-of-sdd-nodes
c F id-of-false-sdd-node
c T id-of-true-sdd-node
c L id-of-literal-sdd-node id-of-vtree literal
c D id-of-decomposition-sdd-node id-of-vtree number-of-elements {id-of-prime id-of-sub}*
c
sdd 18
L 1 0 -2
L 3 2 3
L 5 4 1
L 7 6 -4
L 8 8 -5
L 9 6 4
F 10
D 6 7 2 7 8 9 10
L 11 4 -1
D 4 5 2 5 6 11 10
L 12 2 -3
D 2 3 2 3 4 12 6
L 13 0 2
T 17
D 16 7 2 9 8 7 17
D 15 5 2 5 16 11 10
D 14 3 2 12 15 3 10
D 0 1 2 1 2 13 14
Do you need to know how they were compiled?
For the network, I'm just using random scores for debugging purposes right now, for instance:
scores = torch.rand((6, 5)).mul(3).sub(1)
In sdd.py, in the definition of get_mpe(self, batch_size, clear_data=True), there is a mismatch between the data created for TRUE nodes and for LITERAL nodes:

elif node.is_true():
    data = torch.where(node.theta.argmax(dim=-2) > 0, torch.tensor(node.vtree.var, device=DEVICE, dtype=tint), torch.tensor(-node.vtree.var, device=DEVICE, dtype=tint))
    data = data.unsqueeze(dim=-2)
    #assert(len(data.shape) == 3 and data.shape[0] == len(node.theta) and data.shape[1] == 1)
elif node.is_literal():
    data = torch.tensor([node.literal], device=DEVICE, dtype=tint).unsqueeze(0).unsqueeze(-1).expand(batch_size, 1, self.num_reps)

This gives shape torch.Size([1, 6]) for TRUE nodes and shape torch.Size([6, 1, 1]) for LITERAL nodes, which means they cannot be stacked properly in DECOMPOSITION nodes at:

data = torch.stack([torch.cat((p.data, s.data), dim=-2) for p, s in node.positive_elements])
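Running the two branches in isolation confirms the mismatch; a small sketch with bsz = 6, num_reps = 1, a stand-in theta of shape [2, bsz], and DEVICE/tint replaced by defaults:

import torch

bsz, num_reps, var = 6, 1, 5

# TRUE branch: argmax over dim=-2 collapses an axis, so unsqueeze gives [1, bsz]
theta = torch.rand(2, bsz)  # hypothetical weights for [-var, +var] per batch item
true_data = torch.where(theta.argmax(dim=-2) > 0,
                        torch.tensor(var), torch.tensor(-var)).unsqueeze(dim=-2)
print(true_data.shape)      # torch.Size([1, 6]) -- 2-dim

# LITERAL branch: two unsqueezes plus expand give [bsz, 1, num_reps]
lit_data = torch.tensor([3]).unsqueeze(0).unsqueeze(-1).expand(bsz, 1, num_reps)
print(lit_data.shape)       # torch.Size([6, 1, 1]) -- 3-dim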
I managed to make it work by changing sdd.py:

l.666:
data = data.unsqueeze(dim=-2)
becomes:
data = data.unsqueeze(-1).unsqueeze(-1)

l.695:
max_branch = node.theta.argmax(dim=0, keepdim=True)
becomes:
max_branch = node.theta.argmax(dim=0, keepdim=True).unsqueeze(-1).expand(1, batch_size, self.num_reps)

I also added:

cmpe.beta.num_reps = 1
cmpe.beta.mixing = torch.tensor([1])

This runs without errors, but I'm not sure whether it changes the results of the system or not.
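As far as I can tell from the traceback above, get_mpe_inst only uses mixing to select a component via argmax, so with num_reps = 1 the single component is always selected; a sketch of that selection step with hypothetical mpe_inst contents:

import torch

bsz, n_vars, num_reps = 6, 5, 1
mpe_inst = torch.randint(-5, 5, (bsz, n_vars, num_reps))  # [bsz, n_vars, num_reps]
mixing = torch.tensor([1])
argmax = mixing.argmax(dim=-1)                            # tensor(0): the only component
out = mpe_inst[torch.arange(bsz), :, argmax]
print(out.shape)                                          # torch.Size([6, 5])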
Concerning my toy example, if I set:

scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2)

I expect to get suggested_path.gt(0) == scores.gt(0), since the base boolean vector is consistent with the circuit. However, I get suggested_path.gt(0) == tensor([[ True, True, False, True, False]]), another boolean vector that is consistent with the circuit but not the most likely one given the synthetic logits I created. Is that normal behavior?
Hi Arthur,
There was a simple mismatch in the shape of the data created when the circuit contained True nodes. I pushed a fix that should take care of that issue.
It now works on my end with the following driver code (adapted from warcraft_shortest_path/trainers.py):
output = torch.rand(6, 6, 1, device='cuda')
output, cmpe.beta.mixing = output.split((5, 1), dim=1)
logprobs = F.logsigmoid(output).clamp(max=-1e-7)
cmpe.beta.mixing = cmpe.beta.mixing.squeeze(1).log_softmax(dim=1)
# Parameterize the circuit
litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
bsz = 6
# parameterize circuit
cmpe.parameterize_ff(litweights)
# Get mpe
suggested_path = mpe(bsz)  # mpe() is the helper defined in the MWE further down this thread
I'm working on a more streamlined version of the code, so stay tuned :)
Regarding your second question, I'm not quite sure I get what you mean. Why are you multiplying and subtracting across dimensions? I'm also not sure what you expect to get by comparing the MPE with the scores; this does not check for consistency with the circuit.
Hi,
Thanks a lot for the update!
Regarding the second question: scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2) creates a score vector from a valid configuration, with positive values for True variables and negative values for False variables. When turned into logits, this implies p > 0.5 for the True variables and p < 0.5 for the False variables, which ensures that the most probable configuration given these logits is the base configuration ([True, True, False, False, False] in this case).
Since the most probable configuration given the logits is consistent with the logical formula compiled into the circuit, shouldn't it be the predicted output of the SPL? Maybe there is something I misunderstood about SPL, but to my understanding, during inference it predicts the most probable output (given the logits) that is also consistent with the logical formula.
I also must say I am unsure about the role of the mixing component when num_reps = 1, which is why I set it up like this in my script:

cmpe.beta.mixing = torch.tensor([1])

How would that affect the behavior of the layer?
I tried with the updated code and still encounter an issue:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\test.ipynb Cell 25
      2 litweights = [[1-p, p] for p in probs.unbind(axis=-1)]
      3 # cmpe.parameterize_ff(litweights)
----> 4 cmpe.beta.parameterize_ff(litweights, clear_data=False)

File c:\Users\arthur.ledaguenel\Documents\Experiences\NeSy\./SPL/grids/pypsdd\pypsdd\sdd.py:810, in NormalizedSddNode.parameterize_ff(self, litleaves, clear_data)
    808 primes, subs = zip(*node.positive_elements)
    809 primes = torch.stack([p.data for p in primes])
--> 810 subs = torch.stack([s.data for s in subs])
    811 node.theta = primes + subs
    812 data = node.theta.logsumexp(dim=0)

RuntimeError: stack expects each tensor to be equal size, but got [6] at entry 0 and [6, 1] at entry 1
It seems that this update to parameterize_ff:

- data = torch.zeros(bsz, device=DEVICE)
+ data = torch.zeros((bsz, self.num_reps), device=DEVICE)

might be responsible.
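That diff would explain the error: torch.stack requires identical shapes, so a [bsz] leaf next to a [bsz, num_reps] leaf fails, as a two-line reproduction shows:

import torch

torch.stack([torch.zeros(6), torch.zeros(6, 1)])
# RuntimeError: stack expects each tensor to be equal size, but got [6] at entry 0 and [6, 1] at entry 1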
Regarding your second question again, I'm still not sure what the purpose of the .mul(3).sub(2) is. I'm also not clear on how you're computing the MPE, since you never call get_mpe? I believe this is the test you're looking for:
logprobs[:, 0:1, :] = 0
logprobs[:, 2:4, :] = -float('inf')
litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
cmpe.parameterize_ff(litweights)
map_state = mpe(bsz)
assert (map_state == logprobs.squeeze().exp().long()).all()
I also must say I am unsure about the role of the mixing component when num_reps = 1
It's just there for uniformity, so the code scales easily to arbitrarily large mixtures.
I tried with the updated code and still encounter an issue:
Sorry, it appears I missed an important line in my example: initializing the gating function, which sets num_reps for the SPL. I'll push code that moves that setting out of the gating function. But for now, here is an MWE that seems to be working on my end:
import os
import sys
sys.path.append(os.path.join(sys.path[0], '..', 'grids'))
sys.path.append(os.path.join(sys.path[0], '..', 'grids', 'pypsdd'))

import torch
import torch.nn.functional as F

from compute_mpe import CircuitMPE
from GatingFunction import DenseGatingFunction

def log1mexp(x):
    # assert(torch.all(x >= 0))
    return torch.where(x < 0.6931471805599453094,
                       torch.log(-torch.expm1(-x)),
                       torch.log1p(-torch.exp(-x)))

def mpe(bsz):
    mpe = cmpe.get_mpe_inst(bsz)
    return (mpe > 0).long()

cmpe = CircuitMPE(f'ex1.vtree', f'ex1.sdd')
gate = DenseGatingFunction(cmpe.beta, gate_layers=[256] + [256]*2, num_reps=1).cuda()

output = torch.rand(6, 6, 1, device='cuda')
output, cmpe.beta.mixing = output.split((5, 1), dim=1)
logprobs = F.logsigmoid(output).clamp(max=-1e-7)
cmpe.beta.mixing = cmpe.beta.mixing.squeeze(1).log_softmax(dim=1)

litweights = [[log1mexp(-p), p] for p in logprobs.unbind(axis=1)]
cmpe.parameterize_ff(litweights)
suggested_path = mpe(bsz=6)
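As an aside, log1mexp above computes log(1 - exp(-x)) for x > 0, switching at ln 2 ≈ 0.6931 between the expm1 and log1p forms for numerical stability; a quick double-precision check of that claim:

import torch

x = torch.tensor([1e-4, 0.5, 2.0, 20.0], dtype=torch.float64)
naive = torch.log(1 - torch.exp(-x))
print(torch.allclose(log1mexp(x), naive, atol=1e-6))  # True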
Hi,
Regarding your second question again, I'm still not sure what the purpose of the .mul(3).sub(2) is.
The purpose of .to(torch.float).mul(3).sub(2) is to turn the boolean vector into a vector of scores with strictly positive values for True variables and strictly negative values for False variables. In this case, scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2) leads to scores = torch.tensor([[1, 1, -2, -2, -2]]). The values 3 and 2 are not that important and can be changed (.mul(2).sub(1) works just as well for that purpose).
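A quick check of that claim, and of the induced probabilities under the sigmoid (p > 0.5 for True variables, p < 0.5 for False ones):

import torch

scores = torch.tensor([[True, True, False, False, False]]).to(torch.float).mul(3).sub(2)
print(scores)                 # tensor([[ 1.,  1., -2., -2., -2.]])
print(torch.sigmoid(scores))  # tensor([[0.7311, 0.7311, 0.1192, 0.1192, 0.1192]])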
I'm also not clear how you're computing the MPE
The MPE is computed by calling cmpe.get_mpe_inst(bsz), as in the first example, which internally calls get_mpe, right?
I believe this is the test you're looking for:
In your example with logprobs[:, 0:1, :] = 0, I would assume that both [True, True, False, False, False] and [False, False, False, False, False] are equally likely? This is why I ensure strictly positive scores for the True variables, to get a unique MPE.
Sorry, it appears I missed an important line in my example, initializing the gating function which sets num_reps for the SPL.
For num_reps, I added cmpe.beta.num_reps = 1 as mentioned before, because I don't want additional parameters in a gating function, which is the case with the DenseGatingFunction (similar to a dense layer in a neural network), if I understood correctly?
In your example with logprobs[:, 0:1, :] = 0, I would assume that both [True, True, False, False, False] and [False, False, False, False, False] are equally likely?
Not really. We're working in log-space, so the above is equivalent to setting the first two variables to True.
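Concretely, a log-weight of 0 is probability 1 and -inf is probability 0:

import torch

print(torch.tensor([0.0, -float('inf')]).exp())  # tensor([1., 0.])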
For num_reps, I added cmpe.beta.num_reps = 1
That also works, although initializing the gating function doesn't really affect the output unless you invoke it explicitly.
Hi,
We're working in log-space, so the above is equivalent to setting the first two variables to True
OK, I did not realize that litweights were in log-space by default. I believe this is not the case for the get_torch_ac() function of CircuitMPE, for instance, am I right? (I found consistent results there with litweights as probabilities rather than log-probabilities.)
Although initializing the gating function doesn't really affect the output unless you invoke it explicitly.
That's good to know. In your example, the gating function is only used to initialize the num_reps attribute, right?