scikit-learn-contrib/category_encoders

Intercept in Contrast Coding Schemes

PaulWestenthanner opened this issue · 7 comments

Expected Behavior

The constant (all-ones) intercept column should not be added when applying contrast coding schemes (i.e. backward difference, sum, polynomial, and Helmert coding).

I don't think this intercept column is needed. If you fit a supervised learning model, it will probably help to remove the intercept column. I suspect it is there because statsmodels requires you to add the intercept explicitly when fitting linear models.
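
For context, a minimal statsmodels sketch of why an explicit constant is needed there (standard statsmodels usage, independent of category_encoders):

        import numpy as np
        import statsmodels.api as sm

        # statsmodels OLS fits without an intercept unless one is added explicitly.
        X = np.array([[0.0], [1.0], [2.0], [3.0]])
        y = np.array([1.0, 3.0, 5.0, 7.0])  # y = 1 + 2x

        model = sm.OLS(y, sm.add_constant(X)).fit()  # add_constant prepends a column of ones
        print(model.params)  # approximately [1.0, 2.0] -> [intercept, slope]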
However, I don't like that the output of an encoder would then depend on whether an intercept column is already present. For example, if I first apply encoder A to column A and then encoder B to column B, B's intercept column overwrites A's, so no new column is added (see the sketch below). Likewise, if I happened to have a (non-constant) column called intercept, it would get overwritten.
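
A minimal sketch of the collision described above, assuming the encoders name the constant column intercept and overwrite an existing one as reported:

        import pandas as pd
        import category_encoders as encoders

        df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['x', 'y', 'z']})

        # Encoding column A adds a constant 'intercept' column.
        step1 = encoders.BackwardDifferenceEncoder(cols=['A']).fit_transform(df)

        # Encoding column B afterwards reuses the same name, so the second
        # encoder overwrites the first one's column instead of adding a new one.
        step2 = encoders.SumEncoder(cols=['B']).fit_transform(step1)
        print(step2.columns.tolist())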

Any opinion? Am I missing something? Is the intercept necessary?

Actual Behavior

A constant column with all values equal to 1 is added.

Steps to Reproduce the Problem

Run transform on any fitted contrast coding encoder, e.g.

        import category_encoders as encoders

        train = ['A', 'B', 'C']
        encoder = encoders.BackwardDifferenceEncoder(handle_unknown='value', handle_missing='value')
        print(encoder.fit_transform(train))  # output includes a constant 'intercept' column
glevv commented

Can the intercept be added as a class parameter?
If so, that is the way to go, IMO. These classes could then be tested with different intercept settings to catch errors and bugs.

PaulWestenthanner commented

Yes, I think we could; that should be rather straightforward. Would you set with_intercept=True as the default for backwards compatibility, or not (which might be more correct)?

glevv commented

Yes, I think we could; that should be rather straightforward. Would you set with_intercept=True as the default for backwards compatibility, or not (which might be more correct)?

Yep, I would set it to True to keep the default behavior intact.
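
For illustration, a minimal sketch of how the proposed opt-out could work; with_intercept is the parameter name floated above and does not exist in the released library:

        import pandas as pd

        def _maybe_add_intercept(X: pd.DataFrame, with_intercept: bool = True) -> pd.DataFrame:
            # Sketch: gate the constant column on the proposed parameter,
            # defaulting to True to keep the current output unchanged.
            if with_intercept:
                X = X.copy()
                X.insert(0, 'intercept', pd.Series(1, index=X.index))
            return X

        contrast_cols = pd.DataFrame({'col_0': [-0.5, 0.5, 0.0]})
        print(_maybe_add_intercept(contrast_cols))                          # gains 'intercept'
        print(_maybe_add_intercept(contrast_cols, with_intercept=False))    # unchanged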