ageron/handson-ml3

[BUG] Chapter 2 OneHotEncoder Shape Mismatch Issue + Solution

MadinaKamolova opened this issue · 2 comments

Describe the bug
In the 3rd edition, page 133/1457 (Kindle e-book), the data transformed by OneHotEncoder is passed to pd.DataFrame without first calling .toarray(), which raises an error: ValueError: Shape of passed values is (2, 1), indices imply (2, 5). With the code as printed in the book, pandas treats the encoder output for df_test_unknown as having shape (2, 1).

To Reproduce

cat_encoder.handle_unknown = "ignore"
cat_encoder.transform(df_test_unknown)

df_output = pd.DataFrame(cat_encoder.transform(df_test_unknown),
                         columns=cat_encoder.get_feature_names_out(),
                         index=df_test_unknown.index)

Solution

cat_encoder.handle_unknown = "ignore"
test = cat_encoder.transform(df_test_unknown)
df_output = pd.DataFrame(test.toarray(),
                         columns=cat_encoder.get_feature_names_out(),
                         index=df_test_unknown.index)

Versions (please complete the following information):

  • OS: Windows 11
  • Python: [e.g. 3.11]

Additional context
Maybe add this to the FAQ, or wherever else you think readers will notice it (buying the book again for a single fix is impractical).