Tribler/py-ipv8

Parameter info is no longer showing in PyCharm for dataclass payloads

qstokkink opened this issue · 11 comments

In the past, the following construction would convince PyCharm that it should pull parameter info from the given class as if it were a vanilla dataclass (which actually it is, internally).

dataclass = overwrite_dataclass(dataclass)

This is expected:

expected

This is reality with the overwrite_dataclass:

reality

The lack of variable is not reported and there is no suggestion for the missing parameter.

It seems like we would ideally use a dataclass_transform. However, this was introduced in Python 3.11 but does exist in typing_extensions.

The PyCharm source code hints at the existence of an "extensions module" where types can be registered.

This StackOverflow answer shows how to annotate custom decorators.

Using dataclass_transform to annotate our custom dataclass-like decorator does not seem to work 😢 Disclaimer: I may be doing something wrong though.

I think I'll give up on typing this for now. Using arcane typing constructs, my best efforts have only produced "dataclass AND Type[VariablePayload]"-types so far. This is not good enough as it does not expose the fields of whatever object is being "dataclassed" (and mypy will complain about missing *args and **kwargs).


For anyone who wants to try and tackle this, the issue is as follows:

Suppose we have a dataclass definition (taken from overlay_tutorial_5.py.

@dataclass(msg_id=1)  # Custom dataclass-like transform
class MyMessage:
    clock: int

Any given MyMessage should have a type that shows it is both a dataclass of Type[MyMessage] AND Type[VariablePayload] (the latter is injected at runtime by the custom dataclass-like transform).

It seems that vp_compile is actually destroying the type info. These are my tests:

def acts_as_payload(inp: Payload) -> None: pass

# ---

@dataclass()
class UserClass:
    a: bool

with suppress(TypeError):  # O.K. Does error at runtime
    UserClass()  # O.K. Type does expose missing arg
uc = UserClass(a=False)  # O.K. Does allow instantiation
acts_as_payload(uc)  # O.K. Type does expose UserClass is not a Payload
print(uc)  # O.K. Does print

# ---

class PayloadUserClass(UserClass, Payload):
    def to_pack_list(self): return []
    def from_unpack_list(cls, a): return PayloadUserClass(a)

with suppress(TypeError):  # O.K. Does error at runtime
    PayloadUserClass()  # O.K. Type does expose missing arg
puc = PayloadUserClass(a=False)  # O.K. Does allow instantiation
acts_as_payload(puc)  # O.K. Type does act as Payload
print(puc)  # O.K. Does print

# ---

class VPPayloadUserClass(UserClass, VariablePayload):
    names = list(get_type_hints(UserClass).keys())
    format_list = list(map(type_map, get_type_hints(UserClass).values()))

with suppress(TypeError):  # O.K. Does error at runtime
    VPPayloadUserClass()  # O.K. Type does expose missing arg
vpuc = VPPayloadUserClass(a=False)  # O.K. Does allow instantiation
acts_as_payload(vpuc)  # O.K. Type does act as Payload
print(vpuc)  # O.K. Does print

# ---

@vp_compile
class CVPPayloadUserClass(UserClass, VariablePayload):
    names = list(get_type_hints(UserClass).keys())
    format_list =  list(map(type_map, get_type_hints(UserClass).values()))

with suppress(TypeError):  # O.K. Does error at runtime
    CVPPayloadUserClass()  # ERROR Type does NOT expose missing arg
cvpuc = CVPPayloadUserClass(a=False)  # O.K. Does allow instantiation
acts_as_payload(cvpuc)  # O.K. Type does act as Payload
print(cvpuc)  # O.K. Does print

As it turns out, PyCharm essentially voids class types. Perhaps even worse, the typing in docstrings works better than typing using typing. An example:

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UserClass:
    a: bool

    def __call__(self, a: bool): ...  # Needed for get2()


def uc_get1() -> type[UserClass]:
    return UserClass


def uc_get2():
    """
    Docstring needed for get2().

    :rtype: UserClass
    """
    return UserClass


UserClass()  # WORKS: Shows "Parameter 'a' unfilled"
uc_get1()()  # DOES NOT WORK!: Shows nothing
uc_get2()()  # WORKS: Shows "Parameter 'a' unfilled"

The above is based on PyCharm 2023.2.2 (Community Edition).

I'll try to see if there is a mypy plugin for PyCharm that does better at interpreting types. We can also simply wait for some future version of PyCharm that detects class types correctly.

I'll keep this issue open for the long term but - short term - we should remove the overwrite_dataclass (also from the docs) because it doesn't do anything anymore.

EDIT: Removed in #1214.

After further investigation, given that this issue is actually not a typing issue from our side but an internal PyCharm issue instead, I'll close this issue after all. Secondarily, the root cause for the typing mismatch has been reported multiple times on the PyCharm repository already (2 years ago).

For people using VS Code/Neovim/any code editor with a good LSP, the current dataclass payload version leads to many type errors, because the type checker does not know that a dataclass'd payload is an actual Payload, nor does it understand that after the override through the class decorator it has the attributes you provide. Hence, you don't actually really get any of the benefits from using a dataclass, type-wise at least (of course it still prevents having to write __init__ methods, etc.).

To fix this, you would indeed want to use dataclass_transform as mentioned. The best way to go about this would probably be with metaclasses or inheriting from a base class (a la BaseModel from Pydantic), but that interfers with the inheritance from Payload/VariablePayload as these are abstract classes. Switching them to protocols would probably break too much.

So, for a quick and dirty fix, you could implement something like this:

class VariablePayloadBase(VariablePayload):
    msg_id: int

T = TypeVar("T", bound=VariablePayloadBase)

@dataclass_transform()
def with_id(msg_id: int) -> Callable[[type[T]], type[T]]:
    def wrap(cls: type[T]):
        # define a new class so it doesn't already have any methods inherited from VariablePayload
        class NewPayload:
            pass

        NewPayload.__annotations__ = cls.__annotations__
        NewPayload.__name__ = cls.__name__
        NewPayload.__qualname__ = cls.__qualname__

        # this is the dataclass function from payload_dataclass.py
        dcls = dataclass(NewPayload, msg_id=msg_id)

        return dcls

    return wrap  # type: ignore

This can then be used as follows:

@with_id(msg_id=3)
class DolevMessage(VariablePayloadBase):
    source: int
    content: str
    path: str


d = DolevMessage(3, 'some message', '')

If this is something you're actually interested in @qstokkink, I can open a PR for it. I'm using it now for an assignment for the Distributed Algorithms class at TU Delft and it's working well.

It might need some tweaking, as default values and field specifiers probably don't work with the way this is constructed. Probably it would be best to just implement this directly inside the payload_dataclass.py, but that might break code as it might not work exactly the same.

Only adding dataclass_transform() should also help a bit at least, but then you still don't have the VariablePayload as a base class so the LSP still won't accept these in any method taking those as arguments.

@tiptenbrink thanks for the suggestion! I had something similar in mind with a base class but without trying to save the custom dataclass() decorator overwrite, which was made exclusively to fix typing for an older version of PyCharm anyway.

If we let go of the decorator idea, we can use a simple function call:

@dataclass  # <-- actual dataclasses.dataclass now
class MyMessage(VariablePayloadWID):  # ipv8.messaging.lazy_payload.VariablePayloadWID
    clock: int


convert_to_payload(MyMessage, msg_id=1)

Where convert_to_payload is implemented as follows (loosely following the current custom dataclass decorator):

def convert_to_payload(dataclass_type: Type, msg_id: int | None = None) -> None:
    if msg_id is not None:
        dataclass_type.msg_id = msg_id
    dt_fields = fields(dataclass_type)  # dataclasses.fields()
    type_hints = get_type_hints(dataclass_type)  # typing.get_type_hints()
    dataclass_type.names = [field.name for field in dt_fields]
    dataclass_type.format_list = [type_map(type_hints[field.name]) for field in dt_fields]  # ipv8.messaging.payload_dataclass.type_map
    setattr(sys.modules[dataclass_type.__module__], dataclass_type.__name__, vp_compile(dataclass_type))

I like this function-call approach better as it avoids the decorators that most IDE's crap out on (even though in the previous approach the ACTUAL decorator typing data is correct and they therefore should support it).

Short term, mypy is the only typing tool we use/support in the latest release.

Alright, I've been convinced to give this another shot. The target is the following:

@dataclass  # <- actual dataclass wrapper, no magic
class MyMessage(DataclassPayload):  # <- bucketloads of magic inside of the base class
    a: int
    b: int
    ...  # etc.

Potentially even the following:

@dataclass
class MyMessage(DataclassPayload[12]):  # Bind to message id 12
    a: int
    b: int
    ...  # etc.

I tried to implement my idea from the previous post and - long story short - it can work (using a metaclass inside of the new base class) but we have to wait until we move to Python 3.9. For now, I'll stick to my previous answer of mypy being our only supported typing tool.

Some more info: for Python 3.7 and Python 3.8, down the line in get_type_hints() the mysterious TypeError: 'type' object is not subscriptable will occur if - anywhere in the currently loaded imports - PEP 585 standard collection types are used. We would have to either backport our entire codebase to the old typing syntax (unacceptable), do something really hacky like manually resolving strings to types (unacceptable), or wait for Python 3.9.

I'll keep this issue open without priority to save it for the long-term roadmap.