startable/pdtable

Setting name when creating new table object from existing Table raises an exception

Closed this issue · 3 comments

table1 = Table(pd.DataFrame({
    'a': [1, 2, 3],
    'b': ["a", "b", "c"]
}), name='t1', units=["-", "text"])

table2 = Table(table.df, name="t2")
...
Exception: Got unexpected keyword arguments when creating Table object from existing pandas table: {'name': 't2'}

I do not consider this a bug and I am a little conflicted as to whether this is a good idea.

Table is a "stateless" facade (see https://pdtable.readthedocs.io/en/latest/usage/quickstart.html#idea). So I would expect table1 and table2 to use the same underlying TableDataFrame.

Making the facade contractor able to modify the TableDataFrame kind of obscures the facade approach to the point where we may risk consumers writing code that couples too tightly to pdtable (see advantage 1: https://pdtable.readthedocs.io/en/latest/usage/quickstart.html#idea).

I think it is more explicit to create the facade and then modify the name of the table:

t = Table(tdf)  # Construct throw-away facade for easy access
t.metadata.name = "t2"  # Change table name

I suppose what you are looking for is an easy way to create a new TableDataFrame based on an already existing TableDataFrame with a subset of the metadata updated?

Hi Kasper, please allow me to disagree. This is a very clear violation of the Liskov Substitution Principle. In fact, it violates a principle listed in the pdtable documentation itself:

Code can be written for (and tested with) pandas dataframes and still operate on TableDataFrame objects. This frees client code from necessarily being coupled to the startable project. A first layer of client code can read and manipulate StarTable data, and then pass it on as a (TableDataFrame-flavoured) pandas.DataFrame to a further layer having no knowledge of pdtable.

I don't see why this should not apply to the Table class. Think if table1 could be either a TableDataFrame or a regular pandas DataFrame. How would I go about creating a Table object? I would have to wrap it in a "try/except" clause. Not a nice API!

@guilhermebs I think it is important to discuss this to find the best solutions. I don"t think that a function that handles a subtype differently from the supertype is a violation of the Liskov substitution principle. However, I agree that the docs kind of indicate that you could expect this to work. Hence why it is not clear cut.

However, an alternative perspective could be that the facade contructor requires a metadata-augmented object - in which case it is not the pd.DataFrame mark that is important but the metadata attribute. (This way we could imagine developing towards a more table-backend agnostic code?)

In more practical concern: If you set table2 = Table(table1.df) table2 is a facade for the TableDataFrame table1.df and I would expect that any change to name (or other properites) are shared between the underlying TableDataFrame.

So are you proposing that table2 = Table(table1.df, name="t2") should result in the underlying TableDataFrame being updated (i.e. a short cut) or should table2 then not really result in a facade for table1.df but rather instantiate a new Table facade for a copy of table1.df with updated metadata. (The latter is quite different behaviour from what table2 = Table(table1.df) is doing currently <- which is my main concern with this solution)