Perl-Apollo/Corinna

Private Method Semantics

Ovid opened this issue · 15 comments

Ovid commented

This proposal is for nailing down the semantics of private methods in Corinna.

  • Private methods can only be called from methods defined in the same namespace and file at compile time
  • Private methods are not inherited.
  • If a class or role has a private method with the name matching the name of the method being called, the dispatch is to that method.
  • Even if a subclass has a public or private method with the same signature, the methods in a class will call its private method, not the inherited one
  • Roles and classes cannot call each other's private methods
  • Possibly for V2: ... unless the role has a requirement of a private method (see the section on Roles, below)

Private Methods

In Corinna, we would like to have the idea of truly private methods. Traditionally in Perl, this has been done by prefixing a method with a leading underscore:

sub _i_am_sekret {
    ...
}

Of course, if you've worked on large-scale systems long enough, you know how fragile this is. You over-rely on inheritance and somewhere deep in a subclass, someone writes a "private" method with the same name and boom.

Trust me, this is real fun (*cough*) to debug.

So Corinna wants to do this:

class Parent {
    private method foo () { say 'parent' }
    method bar () { $self->foo }
}
class Child is Parent {
    # this won't work
    method bar () { $self->foo }
}
Child->new->bar;

The above would be fatal because we don't have a foo method in the child and we'll get a method not found error.

But what about this?

class Parent {
    private method foo () { say 'parent' }
    method bar () { $self->foo }
}
class Child is Parent {
    private method foo () { say 'child' }
}
Child->new->bar;

In the above, do we get 'child' or 'parent'? We should get 'parent', not 'child'. Otherwise, children could override private methods and this would be a gross violation of encapsulation.

Instead, private methods are not inherited. If a class or role has a private method with the name matching the name of the method being called, the dispatch is to that method.

Roles

There is nothing special about private methods in roles, but they are not flattened into the consuming class and cannot conflict with class methods. Private methods are bound to the namespace in which they are declared. This gives us great encapsulation, but does it require the method be bound at compile-time rather than runtime? If so, does Perl even support that? Or do we need to do a runtime check every time?

Thus:

role SomeRole {
    method role_method () { $self->do_it }
    private method do_it () { say "SomeRole" }
}
class SomeClass does SomeRole {
    method class_method () { $self->do_it }
    private method do_it () { say "SomeClass" }
}
my $object = SomeClass->new;
say $object->class_method;
say $object->role_method;

The above do_it method does not conflict because it's not provided by the role. It's strictly internal. Further, it cannot be aliased, excluded, or renamed by the consumer. This gives role authors the ability to write a role and have fine-grained control over its behavior. This is in sharp contrast to Moo/se (Less::Boilerplate is a personal class I use that provides strict/warnings/signatures, etc.).

#!/usr/bin/env perl
use Less::Boilerplate;

package Some::Role {
    use Moose::Role;
    use Less::Boilerplate;
    sub role_method ($self) { $self->_foo }
    sub _foo ($self)        { say __PACKAGE__ }
}

package Some::Class {
    use Moose;
    use Less::Boilerplate;
    with 'Some::Role' => { -exclude => '_foo' };
    sub _foo ($self) { say __PACKAGE__ }
}

Some::Class->role_method;

The above prints Some::Class even though the role author may have been expecting Some::Role. So private methods are a huge win here.

However, roles can declare a reliance on private methods:

role ToJson {
    use Some::JSON::Module;
    private method to_hashref ();

    method to_json () {
        return jsonify($self->to_hashref);
    }
}

That private method may be provided by the class or another role. If it's declared by another role, the class won't have direct access to that method, but the role methods will.

Java Comparison

Let's compare this to behavior in Java, a language that has dealt with this for a long time.

Example via this StackOverflow link.

public class Test {
  public static void main(String args[]){
      Student s = new Student();
      s.printPerson();
  }
}

class Person{
  private String getInfo(){
      return "Person";
  }
  public void printPerson(){
      System.out.println(getInfo());
  }
}

class Student extends Person{
  private String getInfo(){
      return "Student";
  }
}

In the above, when we call s.printPerson(), it prints "Person" and not "Student" because in Java, private methods are not inherited.

It gets even more confusing to Perl developers when they encounter something like Parent object = new Child() in Java. That's perfectly legal because in Java, the variable is typed whereas in Perl, it's the data which stores type information. As this distinction doesn't exist in Perl, it's important to tread carefully here.
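To make the typed-variable point concrete, here is a minimal Java sketch. The class and method names are made up for illustration. It shows a variable declared as Parent holding a Child: the public method dispatches on the runtime object, while the call to the private method stays with the class that made the call.

```java
class Parent {
    private String secret() { return "parent"; }
    public String greet() { return "parent greet"; }
    // The call to secret() here is resolved against Parent, never a subclass.
    public String viaPrivate() { return secret(); }
}

class Child extends Parent {
    // Same name and signature, but unrelated to Parent.secret(): not an override.
    private String secret() { return "child"; }
    @Override public String greet() { return "child greet"; }
}

class TypedVariableDemo {
    static String describe() {
        Parent object = new Child();   // variable typed as Parent, value is a Child
        return object.getClass().getSimpleName()
                + "|" + object.greet()
                + "|" + object.viaPrivate();
    }

    public static void main(String[] args) {
        System.out.println(describe());   // prints "Child|child greet|parent"
    }
}
```

The runtime class is still Child and the public greet() is overridden as usual; only the private call is pinned to the declaring class.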

(Note: there's also been some discussion about trusts in Corinna, but I'd rather push that off to v2)

This comment is writing down a first version of my thoughts, but before I've even submitted it, I've changed my mind and realize this probably won't work. I'm submitting this comment anyway for posterity, and am following it with another.

I expect that proper private methods are visible only within the specific class that declares them. That means that if a parent and a child both have private foo methods then calls to foo within the parent or child unambiguously go to the one in the same class as the caller. Another way to look at it in a Perl context is that a private method is essentially a lexically scoped sub in the declaring class, as if someone said "my $foo = sub {...}" and indeed that is how they could be implemented in today's Perl.

Now as for what to do with other scope combinations... If a parent declares a public foo() then a child should be forbidden from declaring a private foo() to avoid confusion. But if a parent declares a private foo() then a child should be allowed to declare a public foo().

Where it gets more complicated is if both want to declare a public foo() then this should be forbidden unless the child's copy is explicitly marked as overriding the parent, and perhaps even then it should only be allowed if the parent explicitly declares that the method is allowed to be overridden. I'm not too stuck on what happens here and this deserves more discussion, but I think with the private cases it's fairly clear-cut.

@Ovid Your comment about Java is false, I think. While Java variables are typed, their values ALSO store type information, just as would be the case with Perl if Perl supported types better. Modern Java supports generics and knows heaps of things at runtime about the actual objects it is dealing with, and users can introspect this at runtime. So variable types are really just constraints that can optionally be checked at compile time. But you can still declare a variable as anything more generic than what you're putting in it. You could declare everything as Object (the parent-most class) if you want to, and do runtime checks on the class of a runtime object, and you can runtime introspect what members or methods it has as well as their parameters and such. Also significant: you can introspect at runtime what private members or methods an object has AND you can call a special method to disable the private restrictions and assign to private members from anywhere. So private is more an advisory thing that you can still get around with an extra step. This is why you can annotate private members in Java for some other class to set their values at runtime.
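A minimal sketch of the "extra step" mentioned above, using java.lang.reflect. Account and pin() are hypothetical names. It assumes everything lives in the same (unnamed) module; under the Java module system or a restrictive security policy, setAccessible may be refused.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class Account {
    private String pin() { return "1234"; }
}

class ReflectionBypassDemo {
    // Private methods are visible to introspection even though they are not callable.
    static List<String> privateMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method m : Account.class.getDeclaredMethods()) {
            if (Modifier.isPrivate(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // setAccessible(true) switches off the access check for this Method object.
    static String readPin() {
        try {
            Method m = Account.class.getDeclaredMethod("pin");
            m.setAccessible(true);
            return (String) m.invoke(new Account());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(privateMethodNames());
        System.out.println(readPin());   // prints "1234"
    }
}
```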

My second version of a prior comment:

What Corinna should be doing in order to be consistent and future-proofed and elegant is consider that scope of methods is a spectrum and not just black and white private/public. How this should be conceived is that the definition of every class or role is available at runtime to be introspected by any code anywhere.

So when something is declared private, that doesn't mean others aren't allowed to know about it, and in fact they ARE allowed to know it exists. Rather, private means they know the method exists and they know they aren't allowed to invoke it.

In Java, trying to invoke a private method, if not blocked at compile time, is blocked in the form of a thrown exception at runtime that specifically says you don't have permission to call it, and this is a very distinct exception from the one that says the method doesn't exist. Corinna should do likewise.
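The two distinct failures can be seen through Java reflection. This is a small sketch with hypothetical class names (Vault, AccessErrorsDemo); the strings returned are just labels chosen for the example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class Vault {
    private String secret() { return "hidden"; }
}

class AccessErrorsDemo {
    // Looks up a no-argument method on Vault by name and tries to call it
    // WITHOUT calling setAccessible(true) first.
    static String call(String name) {
        try {
            Method m = Vault.class.getDeclaredMethod(name);
            return (String) m.invoke(new Vault());
        } catch (NoSuchMethodException e) {
            return "no such method";        // the method does not exist at all
        } catch (IllegalAccessException e) {
            return "not permitted";         // it exists, but the caller may not invoke it
        } catch (InvocationTargetException e) {
            return "method threw";          // the method ran and raised an exception
        }
    }

    public static void main(String[] args) {
        System.out.println(call("secret"));    // prints "not permitted"
        System.out.println(call("missing"));   // prints "no such method"
    }
}
```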

Now when I talk about spectrum and future-proofing, we need to account for scenarios like trusts where some, but not all, classes other than the owner are allowed to invoke non-public methods.

So in the general case, say a method can have 3 statuses, which we might call private/privileged/public. A private method may only be called in the class in which it is declared, public may be called by every class, and privileged is similar to private except that any classes named by a trusts declaration in the owning class are also allowed to call it. I called it privileged but it is analogous to .NET/C# internal and Java package-private (though Java's design is broken).

Trusts/privileged is explicitly orthogonal to any inheritance relationships and the vast majority of the time a trusted class is not a parent or child but for example a factory class and the class whose objects it makes. In the DBI context, which uses the factory class model, a connection handle and statement handle would likely trust each other.

Now this is all important when you consider the interaction of parent/child classes declaring the same method name.

I would say when a parent has a private foo() its child can declare a foo() with any visibility scope, and calls in either class to foo() will call their own.

When a parent declares a public foo() then its child is forbidden from declaring any foo() except in specific cases where it is an override and that is allowed.

When a parent declares a privileged foo() then it is effectively private except in the specific case where the parent declares it trusts a child, and then the child is forbidden from declaring any foo() itself.

Something like that.

Private methods are bound to the namespace in which they are declared.

How is "private" different from the lexical "my" subroutines we already have in Perl? Does the difference justify a new keyword (even if it is one borrowed from Java)?

@HaraldJoerg raises a good point. We could just use the keyword "my" to mean private, and "our" to mean public, kind of like how it works in Raku maybe.

Actually here's another thought. How about using trait syntax for this, say :private and :public etc?

Also, we should make it so that any method not explicitly declaring a visibility should default to private.

Ovid commented

Currently, calling a lexical sub declared with my requires it to be called as private_sub($self). That's a bit of a mess. Also, because it's called as a sub instead of a method, the ability of Corinna to distinguish between method calls and sub calls would likely be broken.

On the #cor IRC channel, one proposal was using $foo->&blah syntax (something currently illegal) and have a slightly different can behavior for determining dispatch candidates.

Also, if visibility isn't explicitly declared and everything defaults to private, I suspect there would be many grumpy developers. "Good ideas" aren't always "practical ideas" :) At least one dev has already told me they won't use Corinna if they have to slap a :reader on every slot that needs to be able to be read publicly. Other devs have simply complained loudly.

Basically I'm proposing default to safety. I'm assuming that in a well designed program the public interface is minimized and as much as possible is private. So defaulting to private is optimizing for this.

I also believe these concerns can be resolved with certain shorthands, which I think Raku may have.

Let's say if a slot has just :ro then a public reader will be declared with the default name, and if it instead has just :rw then a public reader and writer will be declared with default names. One must still use :reader or :writer if they want to declare alternate generated names, but people who care about typing fewer characters probably wouldn't do that.

On a separate matter, something I will want is a way, independent of the above, to declare that a slot is final: once assigned, it can't be assigned to again even by the class in which it lives. For example, we would require that it can only be assigned to in the constructor and not in any other method. Basically this takes support for immutable objects to another level.
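For comparison only, Java's final fields behave roughly like the "final slot" described above. The sketch below uses a made-up ImmutablePoint class and says nothing about how Corinna would spell this.

```java
final class ImmutablePoint {
    // final: must be assigned exactly once, by the time the constructor finishes.
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // Even code inside the class cannot reassign a final field. Uncommenting
    // the next line is a compile-time error:
    // void moveRight() { this.x = x + 1; }

    // "Changing" the object means building a new one.
    ImmutablePoint movedRight() { return new ImmutablePoint(x + 1, y); }

    public static void main(String[] args) {
        ImmutablePoint p = new ImmutablePoint(1, 2);
        ImmutablePoint q = p.movedRight();
        System.out.println(p.x() + " " + q.x());   // prints "1 2"
    }
}
```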

I should also mention that .NET and Java somewhat agree with me. If you don't specify a visibility on a member or method it defaults to internal or package-private (and in the case of Java, the complete lack of a visibility indicator is unfortunately the only way to say package-private). This is a middle ground between fully private and fully public. The analogy for Perl would be visible to the other classes that the current one trusts. However I would sooner just default to private.

Ovid: I am aware that my sub private_sub has to be called as private_sub($object). I was rather suggesting my method private_sub instead of private method private_sub to avoid introducing yet another keyword. You wrote in IRC "private methods probably shouldn't even show up in the symbol table" which is another thing they share with "my" stuff.

Also, I don't think that bdfoy's rant against lexical subroutines is a concern here. He shows many cases where our and my subs are mixed in different packages within the same (file) scope. Such uses should provide predictable results, but they are not to be recommended at all. I also don't really care if lexical subs are insanely broken before v5.22 (as written in that rant), as we're talking beyond 5.32 here.

In the above, do we get 'child' or 'parent'? We should get 'parent', not 'child'. Otherwise, children could override private methods and this would be a gross violation of encapsulation.

Why is that? Method dispatch means you can override private methods in children so long as they have the same visibility. I would expect that to work in other languages too, so long as you don't declare the method static or the like.

So I had a new realization that the visibility feature I considered so important, trusts/internal/etc, can likely for what I want be faked over top of a fundamental OO system that only natively has private+public.

I am speaking of the feature where internals of class X can be effectively public to classes Y and Z while effectively private to every other class.

As such I am no longer that concerned with the plan for Corinna version 1 to exclude more than the relatively simple and straightforward private+public dual.

The main scenario I was concerned with that traditionally benefitted from a trusts/etc feature involved connected objects that were mutually created exclusively by factory methods.

In this context, there would be at least a one-way internal reference between the objects, where the factory method of the first object provides a reference to itself via a constructor argument of the object of the other class it creates, and then the newer object stores the reference to its creator in one of its own instance members, and that is optionally exposed via an accessor so a user can ask the newer object who its parent is.

In this context, code in each of the 2 classes would often want to access each others' internals for reasons but no other classes should, and typically these internals would not be exposed publicly.

A hypothetical example is that a DBI statement handle is created by a DBI connection handle factory method, and so those 2 handle objects would have access to each others' internals but non-DBI classes would not.

So an alteration to class design that can work around a lack of language-native trusts/etc is that each class which conceptually wants to have shared internals is implemented as a corresponding inner+outer pair of classes, where an object of outer has a private member which is an object of inner, and inner's members are all public and are the members that outer conceptually has.

In this context, the factory method of class X that makes an object of class Y would pass both the outer X and the inner X objects as arguments to the constructor of Y, and Y then stores both as private members, and Y never exposes the inner X object to the public, though it might expose the outer X object to the public. And then code in Y has access to the internals of X via the inner X object and no other object has access to those internals if it wasn't made by a factory of X, except transitively.
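Here is a rough Java sketch of that inner+outer arrangement, untested against Corinna itself. All names (ConnectionInternals, ConnectionHandle, StatementHandle, openStatements) are hypothetical, and it only models the one-way case where the statement reaches into the connection's internals.

```java
// "Inner" class: the members the connection conceptually owns, all public.
final class ConnectionInternals {
    public int openStatements = 0;
}

// "Outer" class: what users see. It keeps its inner object private.
final class ConnectionHandle {
    private final ConnectionInternals inner = new ConnectionInternals();

    // Factory method: hands BOTH the outer and the inner object to what it creates.
    StatementHandle prepare(String sql) {
        return new StatementHandle(this, inner, sql);
    }

    // Read-only public view of some internal state.
    int openStatements() { return inner.openStatements; }
}

final class StatementHandle {
    private final ConnectionHandle parent;            // may be exposed
    private final ConnectionInternals parentInner;    // never exposed
    private final String sql;

    StatementHandle(ConnectionHandle parent, ConnectionInternals parentInner, String sql) {
        this.parent = parent;
        this.parentInner = parentInner;
        this.sql = sql;
        parentInner.openStatements++;   // direct access to the creator's internals
    }

    ConnectionHandle parent() { return parent; }
    String sql() { return sql; }
    void finish() { parentInner.openStatements--; }
}

class InnerOuterDemo {
    public static void main(String[] args) {
        ConnectionHandle dbh = new ConnectionHandle();
        StatementHandle sth = dbh.prepare("SELECT 1");
        System.out.println(dbh.openStatements());   // prints 1
        sth.finish();
        System.out.println(dbh.openStatements());   // prints 0
    }
}
```

Code that was not handed the ConnectionInternals object by the factory has no way to reach it through ConnectionHandle, which is the property the workaround relies on.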

I will still have to test this but I think it would probably work.

How does one test private methods or lexical subs?

Ovid commented

@clscott Generally speaking, you don't test them directly. You use standard black box approaches of testing those other functions which call them.
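As an illustration of the black-box approach, here is a small Java sketch. PriceCalculator and its discount rule are invented for the example. The private helper is never called by the checks; it is covered by picking inputs to the public method that exercise both of its branches.

```java
class PriceCalculator {
    // Public entry point: the only thing a test should call.
    public int totalCents(int unitCents, int quantity) {
        return applyBulkDiscount(unitCents * quantity, quantity);
    }

    // Private helper: 10% off for 10 or more items (hypothetical rule).
    private int applyBulkDiscount(int cents, int quantity) {
        return quantity >= 10 ? cents * 90 / 100 : cents;
    }
}

class PriceCalculatorCheck {
    public static void main(String[] args) {
        PriceCalculator calc = new PriceCalculator();
        // Below the threshold: helper leaves the total alone.
        System.out.println(calc.totalCents(100, 5));    // prints 500
        // At the threshold: helper applies the discount.
        System.out.println(calc.totalCents(100, 10));   // prints 900
    }
}
```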

Agreed. One should be structuring their classes such that anything which reasonably should be directly tested should be public, which may mean pulling that out into separate classes consumed by what they were otherwise part of, and anything private is internal details that not even tests should be calling directly.

Ovid commented

This is resolved for the MVP. Further issues should be new tickets.