has2k1/plotnine

`facet_wrap` + `geom_abline` fails if column name contains space

david-cortes opened this issue · 1 comments

The following throws an error:

import numpy as np, pandas as pd, plotnine as p9
rng = np.random.default_rng(seed=123)
df = pd.DataFrame({
    "x1": rng.standard_normal(size=100),
    "x2": rng.standard_normal(size=100),
    "with space": rng.choice(['a', 'b', 'c'], size=100, replace=True)
})
(
    p9.ggplot(
        df,
        p9.aes(x="x1", y="x2")
    )
    + p9.geom_point()
    + p9.geom_abline(slope=1, intercept=0)
    + p9.facet_wrap("with space")
)
  File <string-expression>:1
    with space
    ^
SyntaxError: invalid syntax

It will succeed if I change any of the following:

  • I rename the column "with space" to something without spaces in the name.
  • I remove the call to geom_abline.
  • I remove the call to facet_wrap.

This is a case that we can't really fix, and it should not be considered a bug. facet_wrap does not take a raw column name; it takes a valid Python expression.

For example

import numpy as np, pandas as pd, plotnine as p9
rng = np.random.default_rng(seed=123)
df = pd.DataFrame({
    "x1": rng.standard_normal(size=100),
    "x2": rng.standard_normal(size=100),
    "with space": rng.choice(['a', 'b', 'c'], size=100, replace=True),
    "nospace": rng.choice(['a', 'b', 'c'], size=100, replace=True)
})

def double(series):
    return series * 2

(
    p9.ggplot(
        df,
        p9.aes(x="x1", y="x2")
    )
    + p9.geom_point()
    + p9.geom_abline(slope=1, intercept=0)
    + p9.facet_wrap("double(nospace)")

    # These are also valid
    # + p9.facet_wrap("double( nospace )")
    # + p9.facet_wrap("nospace.str.upper()")

    # This is invalid in all cases
    # + p9.facet_wrap("double(with space)")
)

[Image: facet_wrap_expression]
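
The difference between the valid and invalid facet strings can be checked without plotnine at all. The sketch below uses only the standard library's `ast` module to test whether a string parses as a single Python expression. The helper name `is_valid_expression` is made up for illustration, and this only checks the syntax; it is not how plotnine itself evaluates the string.

```python
import ast


def is_valid_expression(text):
    """Return True if `text` parses as a single Python expression.

    Hypothetical helper for illustration; it checks syntax only and
    does not look up any names or columns.
    """
    try:
        ast.parse(text, mode="eval")
    except SyntaxError:
        return False
    return True


candidates = [
    "nospace",
    "double( nospace )",
    "nospace.str.upper()",
    "with space",
    "double(with space)",
]

for candidate in candidates:
    print(f"{candidate!r}: {is_valid_expression(candidate)}")
```

The two strings containing "with space" fail to parse for the same reason as the traceback in the report: `with` is a Python keyword, so `with space` is not an expression.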

> It will succeed if I change any of the following:
>
>   • I remove the call to geom_abline.
>   • I remove the call to facet_wrap.

These work only by chance, because in those cases the column name is never evaluated as code. As you build more complicated plots, such evaluation becomes inevitable. Ideally, all column names should be valid Python identifiers.
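
One way to follow that advice is to normalise column labels before plotting. The following is a minimal sketch using only the standard library. The helper `to_identifier` is hypothetical (it is not part of plotnine or pandas), and it does not guard against two different labels collapsing to the same name.

```python
import keyword
import re


def to_identifier(name):
    """Turn a column label into a valid Python identifier.

    Hypothetical helper: non-word characters become underscores,
    a leading digit gets an underscore prefix, and Python keywords
    get an underscore suffix.
    """
    cleaned = re.sub(r"\W", "_", str(name))
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    if keyword.iskeyword(cleaned):
        cleaned += "_"
    return cleaned


labels = ["x1", "x2", "with space", "2nd col", "class"]
renamed = [to_identifier(label) for label in labels]

for old, new in zip(labels, renamed):
    print(f"{old!r} -> {new!r} (identifier: {new.isidentifier()})")
```

With pandas, the same function can be passed to `df.rename(columns=to_identifier)`, after which the renamed column (here `with_space`) can be given to facet_wrap.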