Simple Model Training Issue
Diego-Bit-0 opened this issue · 4 comments
Hi, so I've been trying to create a simple adder model using some of the code in the deq.py file but my model doesn't seem to be learning at all. I've even tried to use a linear layer as the function for the deq forward pass with the inputs being the same as the outputs for the training and it still couldn't learn it. I've attached the script for my model ("add.py") with my versions of the broyden.py and deq.py scripts which have some minor modifications made for debugging purposes (found under the comments "#Diego: debugging purposes"). I was hoping you could help me understand why this is happening. Thank you for your time!
There seem to be multiple problems with your implementation. For instance:

- L179-194: Why did you put the forward function call (`DEQFunc.apply`), as well as `z`, `x`, `z0`, in the `__init__` function? This will only be called when the model is initialized. You will want to put them in `forward()`, like what you did in L166.
- L185-187: `z`, `x` and `z0` are NOT parameters. They are actual inputs. Therefore, your `DEQ.parameters()` (in line 218) shouldn't include them.
- You didn't even define and call the DEQ module with well-defined forward and backward passes. If you look at your deq.py (not mine, but your refactored one), the `_solve_equi` in `DEQForward` and `_solve_back` in `DEQBackward` are both undefined, as they are abstract methods that you need to override in a subclass. One example is my DEQ-Transformer: https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer_forward_backward.py#L13, where I actually subclassed the `DEQForward` object. Then, I actually called these two modules in training (`self.deq` for the forward pass, `self.deqback` for the backward pass): https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer.py#L372. You need to do the same for your DEQAdder module. In your current code, you don't even have a well-defined backward pass (`DEQFunc.apply` does not have a backward pass, which is in `DummyDEQFunc`).
Since your code only applies the forward root solving once and has no backward pass, I'm not surprised that the model doesn't learn anything at all.
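To see concretely why the missing backward pass matters, here is a scalar toy problem (a sketch independent of this repo; the function and names are purely illustrative). The forward pass solves the fixed-point equation z = f(z, x) = tanh(w·z + x), and the backward pass must use the implicit function theorem, dz*/dw = f_w / (1 − f_z), evaluated at the equilibrium — without this term, the parameters receive no meaningful gradient:

```python
import math

def solve(w, x, iters=100):
    # Forward pass: iterate z <- tanh(w*z + x) to a fixed point.
    # (The repo uses Broyden's method instead of naive iteration.)
    z = 0.0
    for _ in range(iters):
        z = math.tanh(w * z + x)
    return z

def implicit_grad(w, x):
    # Backward pass: dz*/dw = f_w / (1 - f_z) at the equilibrium,
    # from differentiating z* = f(z*, x; w) implicitly.
    z = solve(w, x)
    s = 1.0 - math.tanh(w * z + x) ** 2   # tanh'(w*z + x)
    return (s * z) / (1.0 - s * w)

# Sanity check: compare against a finite difference of the solver.
w, x, eps = 0.5, 0.3, 1e-6
fd = (solve(w + eps, x) - solve(w - eps, x)) / (2 * eps)
```

The finite-difference value and the implicit gradient agree, even though the implicit gradient never differentiates through the solver iterations — that is exactly the structure `DEQBackward` provides.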
Let me know if this helps, or if you need more concrete help on making it work (I may only be able to help you with actual coding a bit later this week, though).
Thank you for the suggestions! They have been really helpful and have even answered some prior questions I had concerning some of the code. My team is interested in applying your DEQ method to our deep feedforward network. Do you happen to have any code for a DEQ model in a simple feedforward case?
No, I don't have it for the simplest feedforward setting, but it shouldn't be hard to write. The only thing you may need to slightly adjust is the `deq.py` module, which currently assumes you are working on sequences (so 3-D). What kind of task are you trying to solve (e.g., image, etc.)?
For the rest, do the following:
- Define your "layer" as needed. It has to be input-injected (so the `forward` function should look like `forward(x, z)`). I suggest `Linear` + `sigmoid` for the layer, if you just want to get your hands dirty and try.
- Create two classes, inheriting the `DEQForward` and `DEQBackward` classes, like what I did in https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer_forward_backward.py. You don't need to do anything that complicated (e.g., sequence breaking, etc.). Just call `DEQFunc.apply` to solve for the root, and `DummyDEQFunc` in the backward.
- Create another class under `nn.Module` and instantiate the layer you defined. Instantiate both `DEQForward` and `DEQBackward` with the layer. Look at what I did in https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer.py#L263 for reference. In the `forward()` function of this class, perform the forward pass, like in https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer.py#L372.
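For a rough picture of how these three pieces fit together, here is a minimal self-contained sketch in plain PyTorch (all class and method names are illustrative, not the repo's API; naive fixed-point iteration stands in for the Broyden solver, and an autograd hook stands in for the `DEQForward`/`DEQBackward` split):

```python
import torch

class DEQLayer(torch.nn.Module):
    """Toy DEQ wrapping an input-injected Linear + sigmoid layer."""

    def __init__(self, dim, solver_iters=60, backward_iters=60):
        super().__init__()
        self.lin = torch.nn.Linear(2 * dim, dim)
        self.solver_iters = solver_iters
        self.backward_iters = backward_iters

    def f(self, z, x):
        # Input-injected update: both the state z and the input x enter.
        return torch.sigmoid(self.lin(torch.cat([z, x], dim=-1)))

    def forward(self, x):
        # Forward pass: solve z = f(z, x) by naive iteration, no gradients.
        with torch.no_grad():
            z = torch.zeros_like(x)
            for _ in range(self.solver_iters):
                z = self.f(z, x)
        # Re-attach one f-step so gradients can reach the layer parameters.
        z = self.f(z, x)

        # Backward pass: correct the incoming gradient g to (I - J^T)^{-1} g,
        # where J = df/dz at the equilibrium (implicit function theorem).
        z0 = z.detach().requires_grad_()
        f0 = self.f(z0, x)

        def backward_hook(grad):
            g = grad
            for _ in range(self.backward_iters):
                # g <- J^T g + grad converges to (I - J^T)^{-1} grad.
                g = torch.autograd.grad(f0, z0, g, retain_graph=True)[0] + grad
            return g

        if z.requires_grad:
            z.register_hook(backward_hook)
        return z
```

With this in place, training looks like any ordinary module: `loss = criterion(DEQLayer(dim)(x), y); loss.backward()` triggers the hook, which plays the role of the backward root solve.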
I shall be able to create a cleaner tutorial version for you some time later (not sure when). But feel free to communicate with me further on this by emailing me at shaojieb@cs.cmu.edu.
No worries. The task we're working on is a simultaneous binary classification task mapping a matrix of features as soft values to a matrix of binary values of the same size. Each classification depends on the classifications of the other matrix entries. The data is unordered, so sequence models and graphical models do not apply, but luckily a transformer can. We would like to benchmark such results against a deep feed-forward network. The DEQ model captures both quite well.