dotnet/TorchSharp

Unexpected parameters

lintao185 opened this issue · 11 comments

[image: screenshot of a module with two Tensor fields]
Here are two Tensor fields, which are just regular fields. However, after calling RegisterComponents(), TorchSharp will turn these two fields into parameters. Could you add an attribute to mark a field so that it won't be registered as a parameter?
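A minimal sketch of the reported behavior (class and field names here are made up for illustration): after RegisterComponents(), both plain Tensor fields end up registered, as buffers rather than parameters, as the next reply points out.

using TorchSharp;
using static TorchSharp.torch;

var m = new WithPlainTensors();
Console.WriteLine(m.parameters().Count()); // 0
Console.WriteLine(m.buffers().Count());    // 2 -- both plain fields were picked up

class WithPlainTensors : nn.Module<Tensor, Tensor>
{
    // Two ordinary Tensor fields, not intended to be trainable or part of the state_dict.
    private readonly Tensor scaleCache = torch.ones(3);
    private readonly Tensor offsetCache = torch.zeros(3);

    public WithPlainTensors() : base(nameof(WithPlainTensors))
    {
        this.RegisterComponents();
    }

    public override Tensor forward(Tensor input) => input * scaleCache + offsetCache;
}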

Did you mean buffers? I have also noticed this when reading the relevant code.

It seems to be an intentional design. Perhaps PyTorch used to behave like this?

I think we should completely reconsider RegisterComponents, as we have also discussed here: #1272 (comment).

A workaround currently could be:

using TorchSharp;
using static TorchSharp.torch;

var m = new M();
Console.WriteLine(m.parameters().Count()); // 0
Console.WriteLine(m.buffers().Count()); // 0

class M : nn.Module<int, int>
{
    private readonly Tensor tensor = torch.zeros([]);

    public M() : base(nameof(M))
    {
        this.RegisterComponents();
        // RegisterComponents registered the Tensor field as a buffer;
        // removing it from the internal buffer dictionary un-registers it.
        this._internal_buffers.Remove(nameof(tensor));
    }

    public override int forward(int input)
    {
        return input;
    }
}

Great issue. What are some of the situations where you need a module to have tensor members, but they're not buffers that become part of the state_dict?

I think the point is that we should always allow users to decide, instead of implicitly forcing them to do (or not do) something.

And that's also why I suggested removing the call to RegisterComponents in register_module. RegisterComponents could do anything, but it should be up to the users to decide whether it is called.

Of course, users could always override RegisterComponents in their own modules. In that case, the default implementation is not that important, and the only reason to make the change is consistency with PyTorch.
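For example, a minimal sketch of the override route, assuming RegisterComponents stays virtual as described above (the module and field names are illustrative):

using TorchSharp;
using TorchSharp.Modules;
using static TorchSharp.torch;

class ExplicitRegistration : nn.Module<Tensor, Tensor>
{
    private readonly Parameter weight = nn.Parameter(torch.randn(4));
    private readonly Tensor cache = torch.zeros(4); // deliberately kept out of the state_dict

    public ExplicitRegistration() : base(nameof(ExplicitRegistration))
    {
        this.RegisterComponents();
    }

    protected override void RegisterComponents()
    {
        // Register only what we explicitly want; the plain Tensor field is skipped.
        this.register_parameter(nameof(weight), weight);
    }

    public override Tensor forward(Tensor input) => input * weight + cache;
}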

A practical situation might be to save some constants or "cached" tensors to speed things up?

A practical situation might be to save some constants or "cached" tensors to speed things up?

Yes

Okay. Here's what I think should be done then:

  1. Any field that is of type 'Parameter' will be a parameter.
  2. Any field of type 'Module' will be a child module.
  3. A field of type 'Tensor' will be a buffer iff it is decorated with an attribute 'Buffer'.

#3 will be a breaking change, but it makes more sense to me to require tensor fields to be decorated if they are buffers than if they are not (which would also be breaking, but would break fewer modules, I believe).
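Roughly, user code under these rules might look like the sketch below. Note the [Buffer] attribute is hypothetical and defined inline here, since nothing like it exists in TorchSharp today:

using TorchSharp;
using TorchSharp.Modules;
using static TorchSharp.torch;

// Hypothetical marker attribute; not part of TorchSharp today.
[AttributeUsage(AttributeTargets.Field)]
class BufferAttribute : Attribute { }

class Proposed : nn.Module<Tensor, Tensor>
{
    private readonly Parameter scale = nn.Parameter(torch.ones(4));  // rule 1: Parameter field -> parameter
    private readonly Linear proj = nn.Linear(4, 4);                  // rule 2: Module field -> child module
    [Buffer] private readonly Tensor running_mean = torch.zeros(4);  // rule 3: buffer only when decorated
    private readonly Tensor cache = torch.zeros(4);                  // plain Tensor field -> not registered

    public Proposed() : base(nameof(Proposed))
    {
        this.RegisterComponents();
    }

    public override Tensor forward(Tensor input) => proj.forward(input) * scale + running_mean;
}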

As far as I know, buffers are always manually registered in PyTorch. So perhaps we don't have to provide automatic registration for them?
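For what it's worth, the manual route already has an equivalent on the TorchSharp side via register_buffer, so a PyTorch-style module would look roughly like this sketch (names are illustrative):

using TorchSharp;
using static TorchSharp.torch;

class ManualBuffer : nn.Module<Tensor, Tensor>
{
    private readonly Tensor running_mean;

    public ManualBuffer() : base(nameof(ManualBuffer))
    {
        running_mean = torch.zeros(4);
        // Explicit, PyTorch-style registration instead of relying on RegisterComponents.
        this.register_buffer(nameof(running_mean), running_mean);
    }

    public override Tensor forward(Tensor input) => input + running_mean;
}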

Of course adding this feature would be better. We don't have to be consistent with PyTorch in every detail. Right?

What I'm hesitating about is that if we do so, there seems to be no reason not to do the same for parameters and modules as well (especially modules, who knows whether a custom type would be a module or not, haha). (But maybe we could conversely use [IgnoreComponent] or something else, because PyTorch does register them by default.)

there seems to be no reason not to do the same for parameters and modules as well (especially modules, who knows whether a custom type would be a module or not, haha)

I fail to see a scenario where something that is a Parameter field is not meant to be treated as a parameter. Same with modules -- it's not a module unless it's derived from Module, and if it is, what scenario would not have it as a submodule?

As far as I know, buffers are always manually registered in PyTorch. So perhaps we don't have to provide the automatic registration for it?

It's going to be a breaking change, no matter what. So, my question is -- how bad is it that buffers are automatically registered?

I believe it's just a matter of taste, and the only reason to make the change is consistency with PyTorch.

But that could be a sufficient reason, since it may confuse those who are more familiar with PyTorch or who are converting a PyTorch module to TorchSharp.

Great issue. What are some of the situations where you need a module to have tensor members, but they're not buffers that become part of the state_dict?

An RNN where I'd like to cache state tensors for the next forward pass. The module itself is the logical place to keep the cache.
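A minimal sketch of that scenario using the workaround from earlier in the thread (names are illustrative): the cached hidden state lives on the module but is removed from the registered buffers, so it never reaches the state_dict.

using TorchSharp;
using TorchSharp.Modules;
using static TorchSharp.torch;

class CachingCell : nn.Module<Tensor, Tensor>
{
    private readonly Linear cell = nn.Linear(8, 8);
    private Tensor hiddenCache = torch.zeros(1, 8); // carried across forward passes, not a buffer

    public CachingCell() : base(nameof(CachingCell))
    {
        this.RegisterComponents();
        // Keep the cache out of the state_dict (see the workaround above).
        this._internal_buffers.Remove(nameof(hiddenCache));
    }

    public override Tensor forward(Tensor input)
    {
        hiddenCache = torch.tanh(cell.forward(input) + hiddenCache);
        return hiddenCache;
    }
}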