locuslab/deq

Simple Model Training Issue

Diego-Bit-0 opened this issue · 4 comments

Hi, so I've been trying to create a simple adder model using some of the code in the deq.py file, but my model doesn't seem to be learning at all. I've even tried using a linear layer as the function for the DEQ forward pass, with the training targets identical to the inputs, and it still couldn't learn that. I've attached the script for my model ("add.py") along with my versions of the broyden.py and deq.py scripts, which have some minor modifications made for debugging purposes (marked with the comment "#Diego: debugging purposes"). I was hoping you could help me understand why this is happening. Thank you for your time!

Scripts.zip

There seem to be multiple problems with your implementation. For instance:

  1. L179-194: Why did you put the forward function call (DEQFunc.apply), as well as z, x, and z0, in the __init__ function? These will only be executed once, when the model is initialized. You want to put them in forward(), like what you did in L166 (see the first sketch below).

  2. L185-187: z, x and z0 are NOT parameters; they are actual inputs. Therefore, your DEQ.parameters() (in line 218) shouldn't include them (again, see the first sketch below).

  3. You didn't even define and call the DEQ module with well-defined forward and backward passes. If you look at your deq.py (not mine, but your refactored one), _solve_equi in DEQForward and _solve_back in DEQBackward are both undefined; they are abstract methods that you need to override in a subclass. One example is my DEQ-Transformer: https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer_forward_backward.py#L13
    where I subclass the DEQForward object. I then call these two modules in training (self.deq for the forward pass, self.deqback for the backward pass): https://github.com/locuslab/deq/blob/master/DEQModel/models/transformers/deq_transformer.py#L372
    You need to do the same for your DEQAdder module. In your current code, you don't even have a well-defined backward pass (DEQFunc.apply does not define one; that lives in DummyDEQFunc). See the second sketch below.
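
For points 1 and 2, here is a minimal, self-contained sketch of the intended structure (LinearFunc, DEQAdder and the tanh update below are just placeholders I made up for illustration, not code from this repo): the layer function is built in __init__, so parameters() returns only its weights, while z0 is an ordinary tensor created inside forward().

```python
import torch
import torch.nn as nn


class LinearFunc(nn.Module):
    """Toy layer function f(z, x); its weights are the model's only parameters."""
    def __init__(self, dim):
        super().__init__()
        self.lin_z = nn.Linear(dim, dim)
        self.lin_x = nn.Linear(dim, dim)

    def forward(self, z, x):
        return torch.tanh(self.lin_z(z) + self.lin_x(x))


class DEQAdder(nn.Module):
    def __init__(self, dim, max_iter=30):
        super().__init__()
        self.f = LinearFunc(dim)   # constructed once, in __init__
        self.max_iter = max_iter

    def forward(self, x):
        z = torch.zeros_like(x)    # z0 is an input-shaped tensor, NOT an nn.Parameter
        # The equilibrium solve belongs here, so it runs on every call. A naive
        # unrolled iteration stands in for the Broyden-based DEQFunc.apply; the
        # second sketch below shows how to avoid backpropagating through it.
        for _ in range(self.max_iter):
            z = self.f(z, x)
        return z


# DEQAdder(dim).parameters() now yields only the weights of LinearFunc.
```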

Since your code only runs the forward root solve once (at initialization) and has no backward pass, I'm not surprised that the model doesn't learn anything at all.
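
To make point 3 concrete, here is a rough, self-contained sketch of the forward-solve / implicit-backward pairing that the DEQForward/DEQBackward subclasses (_solve_equi / _solve_back) provide in this repo. The names DEQFixedPoint and simple_solver, and the plain fixed-point iteration standing in for Broyden's method, are my own illustrations rather than the repo's API:

```python
import torch
import torch.nn as nn
import torch.autograd as autograd


def simple_solver(g, z0, max_iter=50, tol=1e-4):
    """Plain fixed-point iteration; the repo uses Broyden's method here instead."""
    z = z0
    for _ in range(max_iter):
        z_next = g(z)
        if (z_next - z).norm() < tol * (z.norm() + 1e-8):
            return z_next
        z = z_next
    return z


class DEQFixedPoint(nn.Module):
    """Pairs a forward equilibrium solve with an implicit backward solve."""
    def __init__(self, f, solver=simple_solver):
        super().__init__()
        self.f = f            # layer function f(z, x); its weights are the only parameters
        self.solver = solver

    def forward(self, x):
        # Forward pass: find z* = f(z*, x) outside the autograd tape.
        with torch.no_grad():
            z_star = self.solver(lambda z: self.f(z, x), torch.zeros_like(x))

        # One extra evaluation re-attaches z* to the graph of f's parameters (and x).
        z_star = self.f(z_star, x)

        if z_star.requires_grad:
            # Backward pass: instead of backpropagating through the solver iterations,
            # solve y = y J_f(z*) + dl/dz* with the same solver, via a hook on z_star.
            z0 = z_star.clone().detach().requires_grad_()
            f0 = self.f(z0, x)

            def backward_hook(grad):
                return self.solver(
                    lambda y: autograd.grad(f0, z0, y, retain_graph=True)[0] + grad,
                    grad)

            z_star.register_hook(backward_hook)
        return z_star
```

Wrapping your layer function in something like this plays the same role as calling self.deq for the forward pass and self.deqback for the backward pass in my transformer code: the loss gradient triggers the hook, so the implicit backward solve replaces backpropagation through the solver.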

Let me know if this helps, or if you need more concrete help on making it work (I may be able to help you with the actual coding a bit later this week, though).

Thank you for the suggestions! They have been really helpful and have even answered some prior questions I had about the code. My team is interested in applying your DEQ method to our deep feedforward network. Do you happen to have any code for a DEQ model in a simple feedforward case?

No, I don't have it for the simplest feedforward setting, but it shouldn't be hard to write. The only thing you may need to slightly adjust is the deq.py module, which currently assumes you are working on sequences (so 3-D tensors). What kind of task are you trying to solve (e.g., images)?
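
As a rough illustration of how little is needed in the feedforward case, here is a training-loop sketch that reuses the hypothetical LinearFunc and DEQFixedPoint from my previous comment with plain 2-D (batch, dim) tensors, i.e., no sequence dimension; everything in it is illustrative rather than the repo's deq.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 16
deq = DEQFixedPoint(LinearFunc(dim))   # from the earlier sketches
head = nn.Linear(dim, dim)             # readout on top of the equilibrium state

opt = torch.optim.Adam(list(deq.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(32, dim)               # dummy 2-D batch: (batch, dim)
y = x.clone()                          # identity target, like your adder sanity check

for step in range(200):
    pred = head(deq(x))
    loss = F.mse_loss(pred, y)
    opt.zero_grad()
    loss.backward()                    # gradients flow through the implicit backward hook
    opt.step()
```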

As for the rest: I shall be able to create a cleaner tutorial version for you some time later (not sure when), but feel free to communicate with me further on this by emailing me at shaojieb@cs.cmu.edu.

No worries. The task we're working on is simultaneous binary classification: mapping a matrix of soft-valued features to a matrix of binary values of the same size, where each classification depends on the classifications of the other matrix entries. The data is unordered, so sequence models and graphical models do not apply, but luckily a transformer can handle it. We would like to benchmark those results against a deep feed-forward network. The DEQ model captures both quite well.