Look into whether parameter creation from a tensor leads to incorrect dispose scope statistics.
NiklasGustafsson opened this issue · 14 comments
I think there's a problem handling the statistics for this situation (in Parameter.cs):
```csharp
public class Parameter : Tensor
{
    /// <summary>
    /// Constructor
    /// </summary>
    /// <param name="data">A tensor, which will become empty.</param>
    /// <param name="requires_grad"></param>
    public Parameter(Tensor data, bool requires_grad = true) :
        base(data.with_requires_grad(requires_grad).MoveHandle())
    {
        var scope = data.OwningDisposeScope;
        if (scope is not null) {
            this.OwningDisposeScope = scope;
            scope.Attach(this);
            scope.Detach(data);
        }
    }

    /// <summary>
    /// Constructor
    /// </summary>
    /// <param name="handle">A tensor handle.</param>
    internal Parameter(System.IntPtr handle) : base(handle)
    {
    }
};
```
I don't have a small repro, but I don't think the stats are collected properly when a Parameter takes over the native handle from a non-Parameter tensor and registers itself with its dispose scope.
@mvphelps -- I can't assign this to you, but could you take a look at the issue?
Maybe there should be a `ReplaceWith(old, new)` method on scopes for this purpose. It's a bit like move semantics in C++.
@NiklasGustafsson Sure happy to look at it. To clarify, the code you show above is not present in the main branch. So is this a proposed implementation for Parameter?
I do think this implementation makes perfect sense - much like PackedSequence, it should take ownership of the data tensor and dispose it properly, all while managing the statistics correctly. Additionally, for the second ctor that accepts just the IntPtr handle, I'm thinking we just have to create the Tensor from that handle and then delegate to the other ctor. I'd also add another test fixture to deal specifically with stats for parameters.
Edit: Also I do not mean adding a complete new set of statistics for Parameters, these are still Tensors.
Is that what you are thinking? Just want to make sure I understand fully, before I go making the PR. Thank you.
Also, could you explain the MoveHandle method on Tensor, since it is being used here? I see the code, and while it is simple, I don't understand why it's necessary, and it has no comments. My guess is that it does something unobvious to avoid having multiple references to the same handle: essentially taking the handle from the original tensor and making it the handle for the Parameter instead, so that there is only one reference to the IntPtr and GC can do its thing?
I like the ReplaceWith idea - so it would just move the scope from the old object to the scope on the new object, and null out the scope on the old. The nice thing with that is it wouldn't actually increment anything. Seems like a pretty clean way to do it.
This code is already in the main branch. Where it is failing is in a branch I'm working on, where many of the Modules are reimplemented in C# and using functional APIs to get to the native code. That means that the code needs to create Parameter instances, which are tensors used as module parameters.
Parameter is a sub-class of Tensor, but you can't just downcast, so the construction of a Parameter takes a Tensor and then takes over its native tensor handle. This means that there needs to be a special treatment of the handle for this case.
Anyway, it all works, including the Attach/Detach pair, except that it seems to mess with the new, more precise statistics gathering that you introduced.
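For context, the handle takeover being described could look roughly like this (a hypothetical sketch of MoveHandle; the field name and invalidation mechanism are assumptions, not the actual TorchSharp source):

```csharp
// Transfers ownership of the native tensor handle to the caller.
internal IntPtr MoveHandle()
{
    var h = handle;
    // Zero out this tensor's copy of the handle so that disposing or
    // finalizing the original object won't free the native tensor a
    // second time; only the new owner (the Parameter) releases it.
    handle = IntPtr.Zero;
    return h;
}
```

The original tensor becomes invalid after the move, which is why the constructor's doc comment says the data tensor "will become empty."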
Here is, for example, the new constructor for Linear:
```csharp
internal Linear(long inputSize, long outputSize, bool hasBias = true, Device? device = null, ScalarType? dtype = null) : base(nameof(Linear))
{
    this.in_features = inputSize;
    this.out_features = outputSize;

    weight = torch.empty(outputSize, inputSize, device: device, dtype: dtype).AsParameter();
    init.kaiming_uniform_(weight, a: _sqrt5);

    if (hasBias) {
        bias = torch.empty(outputSize, device: device, dtype: dtype).AsParameter();
        var (fanIn, _) = init.CalculateFanInAndFanOut(weight);
        var bound = fanIn > 0 ? 1 / Math.Sqrt(fanIn) : 0;
        init.uniform_(_bias, -bound, bound);
    }
    //NOTE: it's important not to call 'RegisterComponents' here.
}
```
and the `bias` and `weight` properties:
```csharp
public Parameter? bias {
    get => _bias;
    set {
        _bias?.Dispose();
        _bias = value?.DetachFromDisposeScope() as Parameter;
        ConditionallyRegisterParameter(BiasComponentName, _bias);
    }
}

public Parameter weight {
    get => _weight!;
    set {
        if (value is null) throw new ArgumentNullException(nameof(weight));
        if (value.Handle != _weight?.Handle) {
            _weight?.Dispose();
            _weight = (value.DetachFromDisposeScope() as Parameter)!;
            ConditionallyRegisterParameter(WeightComponentName, _weight);
        }
    }
}
```
Note that parameters inside modules must not be in a dispose scope.
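To illustrate the point (a usage sketch; the exact module factory signature is taken from the Linear constructor above):

```csharp
// Sketch: because the property setters call DetachFromDisposeScope(),
// a module's parameters are never owned by an enclosing scope.
using (var scope = torch.NewDisposeScope())
{
    var lin = torch.nn.Linear(10, 5);
    // lin.weight and lin.bias were detached in the setters, so disposing
    // the scope here must not invalidate them; their lifetime is tied to
    // the module, not to the scope.
}
```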
I must be missing something. I do not see this on main at all, and I'm looking directly here on GitHub:
https://github.com/dotnet/TorchSharp/blob/main/src/TorchSharp/NN/Parameter.cs
What am I missing?
You're right (again :-)). It's on my soon-to-be created PR. Forgot that it wasn't on main. Come back in an hour...
Sorry!
No worries, it happens. I'm on Central timezone and it is near EOD for me, I'll check it out in the morning. Should be able to have a PR for you tomorrow.
I'll try the 'ReplaceWith' approach and see if that works. Maybe that's all it takes.
Ok great. I think the only thing missing from what you have is that the OwningDisposeScope is not being cleared on the tensor. So even though the tensor is detached from the scope's list, it still references the scope and will get wiped when the scope disposes.
Once you do that I think it should work.
This is what it looks like (and it seems to work):
```csharp
/// <summary>
/// Replaces registration of one tensor with another.
/// </summary>
/// <param name="original">The original tensor, possibly registered under a dispose scope.</param>
/// <param name="replacement">The replacement tensor.</param>
internal static void ReplaceWith(torch.Tensor original, torch.Tensor replacement)
{
    DisposeScope? scope = original.OwningDisposeScope;
    if (scope != null && scope.Disposables.Remove(original)) {
        original.OwningDisposeScope = null;
        AddToOther(scope, replacement);
    }
}
```
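With that in place, the Parameter constructor from the top of the thread could presumably drop the Attach/Detach pair in favor of a single handover (a sketch, assuming ReplaceWith is reachable from Parameter):

```csharp
public Parameter(Tensor data, bool requires_grad = true) :
    base(data.with_requires_grad(requires_grad).MoveHandle())
{
    // One-step handover: the scope's statistics are left untouched,
    // the Parameter takes the tensor's registration slot, and 'data'
    // no longer references the scope, so it won't be wiped on disposal.
    DisposeScope.ReplaceWith(data, this);
}
```

This mirrors the move-semantics idea from earlier in the thread: nothing is counted as created or disposed, only the owner changes.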
@NiklasGustafsson Looks good. I suggest adding the following test to TestDisposeScopesStatisticsTensor.cs. We don't need anything more comprehensive as this captures the unique behavior. Figured it would be easiest for you if I just pasted it here:
```csharp
[Fact]
public void ParameterCreatedFromScopedTensorOnlyCountsDisposeForParameter()
{
    var scope = torch.NewDisposeScope();
    var t = torch.tensor(3.0f);
    AssertTensorCounts(0, 0, 1, 0, 0, 0, 1);

    var p = new TorchSharp.Modules.Parameter(t);
    // Stats should not change when converting the tensor to a parameter.
    AssertTensorCounts(0, 0, 1, 0, 0, 0, 1);

    t.Dispose();
    // The tensor doesn't own the lifetime now, the parameter does, so again no change.
    AssertTensorCounts(0, 0, 1, 0, 0, 0, 1);
    Assert.True(t.IsInvalid);
    Assert.False(p.IsInvalid);
    Assert.Equal(3.0f, p.ToSingle());

    scope.Dispose();
    // We can count the dispose when the parameter goes away.
    AssertTensorCounts(0, 0, 1, 1, 0, 0, 0);
    Assert.True(p.IsInvalid);
}
```
Done.