error with transform_fold

Question

williehallock802 opened this issue 5 years ago · 7 comments

import pandas as pd
import numpy as np
import altair as alt

data = { 'ColA': {('A', 'A-1'): 'w',
                 ('A', 'A-2'): 'w',
                 ('A', 'A-3'): 'w',
                 ('B', 'B-1'): 'q',
                 ('B', 'B-2'): 'q',
                 ('B', 'B-3'): 'r',
                 ('C', 'C-1'): 'w',
                 ('C', 'C-2'): 'q',
                 ('C', 'C-3'): 'q',
                 ('C', 'C-4'): 'r'},
        'ColB': {('A', 'A-1'): 'r',
                 ('A', 'A-2'): 'w',
                 ('A', 'A-3'): 'w',
                 ('B', 'B-1'): 'q',
                 ('B', 'B-2'): 'q',
                 ('B', 'B-3'): 'e',
                 ('C', 'C-1'): 'e',
                 ('C', 'C-2'): 'q',
                 ('C', 'C-3'): 'r',
                 ('C', 'C-4'): 'w'} 
        }
                 
df = pd.DataFrame(data).reset_index(drop=True)

mychart = alt.Chart(df).transform_fold(
    ['ColA', 'ColB'], as_=['column', 'value']
).mark_bar().encode(
    x=alt.X('value:N', sort=['r', 'q', 'e', 'w']),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, len(df.index)])),
    column='column:N'
)

from altair_transform import extract_data
data = extract_data(mychart)
data.head()

This generates the following error:

altair-transform/altair_transform/core/fold.py in visit_fold(transform, df)
      9     transform = transform.to_dict()
     10     fold = transform["fold"]
---> 11     var_name, value_name = transform._get("as", ("key", "value"))
     12     value_vars = [c for c in df.columns if c in fold]
     13     id_vars = [c for c in df.columns if c not in fold]


AttributeError: 'dict' object has no attribute '_get'

Answer 1 · 2019-10-08T14:48:34.000Z

Thanks, I'll try to take a look.

Answer 2 · 2019-10-08T15:35:35.000Z

Same issue here.
Removing the underscore before get (that is, calling transform.get instead of transform._get) fixes the issue, and the output becomes:

  column value
0   ColA     w
1   ColA     w
2   ColA     w
3   ColA     q
4   ColA     q

But maybe what's missing is an isinstance check like the one at data.py:35:

        if isinstance(context, dict):
            datasets = context.get('datasets', {})
        else:
            datasets = context._get('datasets', {})

Answer 3 · 2019-10-08T16:27:28.000Z

That would work. This is an instance of general confusion throughout the codebase about whether inputs are dicts or schema objects. I went through a while ago and tried to address most of it, but this is one of the instances I missed (there may be others).

I think rather than an isinstance check each time we need to get an attribute, it would be better to normalize inputs so that we know what they are, and know what methods can be used on them.
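As a rough illustration of normalizing inputs once instead of checking types at every lookup, here is a minimal sketch. The names to_plain_dict, get_datasets, and FakeSchemaObject are hypothetical and not part of altair-transform; the only assumption about schema objects is that they expose a to_dict() method, as the traceback above shows for transforms.

```python
from typing import Any, Dict


def to_plain_dict(obj: Any) -> Dict[str, Any]:
    """Return a plain dict for either a dict or a schema-like object.

    Hypothetical helper: assumes schema objects provide ``to_dict()``.
    """
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "to_dict"):
        return obj.to_dict()
    raise TypeError(f"cannot normalize object of type {type(obj).__name__}")


class FakeSchemaObject:
    """Stand-in for a schema object, used only for this illustration."""

    def __init__(self, **kwargs: Any) -> None:
        self._kwds = kwargs

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._kwds)


def get_datasets(context: Any) -> Dict[str, Any]:
    # Normalize once at the boundary; after this only dict methods are needed.
    context = to_plain_dict(context)
    return context.get("datasets", {})
```

With this shape, the isinstance branch lives in one place, and downstream code can rely on plain dict methods such as get.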

Are you interested in working on this?

Answer 4 · 2019-10-08T16:35:44.000Z

Looking at this, the input has already been converted to a dict at that point, so calling transform.get() directly should be sufficient. This was not caught because there is no test of the fold transform (I'm certain I wrote a test when I wrote the code, but I'm not sure what happened to it).
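The fix described here might look roughly like the sketch below. It is a standalone approximation, not the library's actual code: fold_dataframe is a hypothetical name, the first lines mirror what the traceback shows of visit_fold, and the use of DataFrame.melt for the final step is an assumption. The small check at the end is the kind of test that would have caught the bug.

```python
import pandas as pd


def fold_dataframe(transform: dict, df: pd.DataFrame) -> pd.DataFrame:
    """Apply a fold transform that is already a plain dict (hypothetical)."""
    fold = transform["fold"]
    # transform is a dict at this point, so dict.get is the right call
    # (the original code called the schema-object method _get instead).
    var_name, value_name = transform.get("as", ("key", "value"))
    value_vars = [c for c in df.columns if c in fold]
    id_vars = [c for c in df.columns if c not in fold]
    # Assumption: the fold itself is done by melting the folded columns.
    return df.melt(
        id_vars=id_vars,
        value_vars=value_vars,
        var_name=var_name,
        value_name=value_name,
    )


df = pd.DataFrame({"ColA": ["w", "w", "q"], "ColB": ["r", "w", "q"]})

# Explicit output names, as in the chart from the question.
folded = fold_dataframe({"fold": ["ColA", "ColB"], "as": ["column", "value"]}, df)
assert list(folded.columns) == ["column", "value"]
assert len(folded) == 6

# Without "as", the defaults ("key", "value") are used.
default_named = fold_dataframe({"fold": ["ColA", "ColB"]}, df)
assert list(default_named.columns) == ["key", "value"]
```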

Answer 5 · 2019-10-08T17:22:20.000Z

I don't think there's much else to do here. :o)

Answer 6 · 2019-10-08T17:41:35.000Z

TODO list:

fix the bug
add test coverage for fold transform

Answer 7 · 2019-10-10T03:33:37.000Z

Thank you for the offer. Unfortunately, I don't know enough about tests to be helpful, sorry.
Looking at your solution will help me next time.