Typing `names` argument in `select_columns_by_name`
mroeschke opened this issue · 3 comments
Currently `DataFrame.select_columns_by_name` has `names` typed as `names: Sequence[str]`.
I think the intention here is that `Sequence` means a non-scalar container of string labels, but `Sequence[str]` also matches a plain `str`.
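A minimal sketch of why the annotation admits a bare string, using only the standard library. The variable names are illustrative and not part of the interchange protocol:

```python
from collections.abc import Sequence

bare = "ab"            # a single string
labels = ["a", "b"]    # a list of string labels

# str is registered as a Sequence, so both values pass a runtime check,
# and a static type checker accepts both for a `Sequence[str]` parameter.
bare_is_sequence = isinstance(bare, Sequence)
labels_is_sequence = isinstance(labels, Sequence)

# Iterating a str yields one-character strings, so each element is itself a str.
bare_elements = list(bare)
```

Here `bare_elements` has the same contents as `labels`, which is why the two arguments are indistinguishable to any code that only iterates over `names`.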
Hey, this looks correct to me: a `str` is a sequence of `str`. If you have single-letter column names and pass a string containing those names, I'd expect it to work, and it does:
In [20]: pd.api.interchange.from_dataframe(pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [7,8,9]}).__dataframe__().select_columns_by_name('ab'))
Out[20]:
   a  b
0  1  4
1  2  5
2  3  6
Ah okay, I forgot about that possibility. Closing then.
`.select_columns_by_name('ab')` seems pretty ambiguous and bug-prone to me. I'd expect that to give me a single column named `'ab'`, not two columns named `'a'` and `'b'`. I think the intention was for this to be spelled `['a', 'b']`.
Unfortunately, I'm not sure there's a way to make the annotation unambiguous while still allowing all non-string sequences, and `list[str]` may be too restrictive. So I guess we have to leave it as is either way.
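For illustration only: the annotation cannot exclude `str`, but an implementation could reject a bare string at runtime. The sketch below uses a hypothetical helper, `normalize_names`, which is not part of the interchange protocol or of pandas. It also assumes that raising is the desired behavior, which the thread does not settle:

```python
from collections.abc import Sequence


def normalize_names(names: Sequence[str]) -> list[str]:
    """Hypothetical guard: accept any non-string sequence of labels."""
    # A bare str satisfies Sequence[str], so it has to be rejected explicitly.
    if isinstance(names, str):
        raise TypeError(
            f"expected a sequence of column names, got a single str: {names!r}"
        )
    return list(names)
```

With this guard, `normalize_names(['a', 'b'])` and `normalize_names(('a', 'b'))` both return a list of labels, while `normalize_names('ab')` raises `TypeError` instead of silently selecting columns `'a'` and `'b'`. The trade-off is that the single-letter shorthand shown earlier in the thread would stop working.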