Boolean indexing doesn't work with subclass and TypeVar
Describe the bug
As suggested in #908 (comment), I'm trying to use DataFrameT = TypeVar("DataFrameT", bound=DataFrame),
but with boolean indexing instead of a pipe.
To Reproduce
from typing import TypeVar, reveal_type

from pandas import DataFrame, Series


class SubDF(DataFrame):
    # https://pandas.pydata.org/pandas-docs/stable/development/extending.html#override-constructor-properties
    @property
    def _constructor(self):
        return SubDF

    @property
    def _constructor_sliced(self):
        return Series


sub = SubDF({'a': [1, 2, 3]})

DataFrameT = TypeVar("DataFrameT", bound=DataFrame)


def func(df: DataFrameT) -> DataFrameT:
    index = Series([True, False, True])
    df_ = df.loc[index]
    reveal_type(df_)
    return df_  # Type "DataFrame" is not assignable to return type "DataFrameT@func"


reveal_type(func(sub))

pyright:
/workspaces/ng/repro.py:27:17 - information: Type of "df_" is "DataFrame"
/workspaces/ng/repro.py:29:12 - error: Type "DataFrame" is not assignable to return type "DataFrameT@func"
    Type "DataFrame" is not assignable to type "DataFrameT@func" (reportReturnType)
/workspaces/ng/repro.py:32:13 - information: Type of "func(sub)" is "SubDF"
1 error, 0 warnings, 2 informations
Please complete the following information:
- OS: Linux
- OS Version: 20.04.6
- python version: 3.12.2
- version of type checker: 1.1.390
- version of installed pandas-stubs: 2.2.3.241126
Additional context
Repro inspired by:
Might be same root cause as:
It's not a .loc issue. You get a similar result with any DataFrame method that returns a DataFrame, e.g.:
def func2(df: DataFrameT) -> DataFrameT:
    df_ = df.query("x <= 10")
    return df_  # Type "DataFrame" is not assignable to return type "DataFrameT@func2"


reveal_type(func2(sub))

The type revealed is still correct (SubDF) in this case.
That particular example can be fixed by changing query() to return Self instead of DataFrame. I imagine we'd have to do the same for every method in class DataFrame that currently returns DataFrame, i.e., change them to return Self.
But loc is different, because it returns an instance of the class _LocIndexerFrame. I think that class would have to become generic, i.e., a subclass of Generic[_T], with Self passed in as the type parameter.
I tried this idea and it worked on your example.
PR with tests welcome.