skrub-data/skrub

`fetch_ken_embeddings` does not use `suffix` with default parameter


The following works as expected:

from skrub.datasets import fetch_ken_embeddings

X = fetch_ken_embeddings(
    search_types="game_development_companies|game_companies|game_publish",
    embedding_table_id="games",
    suffix="_aux",
    pca_components=100,
)
X.columns
Index(['Entity', 'Type', 'X0_aux', 'X1_aux', 'X2_aux', 'X3_aux', 'X4_aux',
       'X5_aux', 'X6_aux', 'X7_aux',
       ...
       'X90_aux', 'X91_aux', 'X92_aux', 'X93_aux', 'X94_aux', 'X95_aux',
       'X96_aux', 'X97_aux', 'X98_aux', 'X99_aux'],
      dtype='object', length=102)

However, if `pca_components` is not specified and thus takes its default value of 200, the suffix is not applied:

X = fetch_ken_embeddings(
    search_types="game_development_companies|game_companies|game_publish",
    embedding_table_id="games",
    suffix="_aux",
)
X.columns
Index(['Entity', 'Type', 'X0', 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7',
       ...
       'X190', 'X191', 'X192', 'X193', 'X194', 'X195', 'X196', 'X197', 'X198',
       'X199'],
      dtype='object', length=202)

So `suffix` appears to be ignored when `pca_components` is left at its default.
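
In the meantime, a possible workaround is to rename the columns by hand after fetching. This is only a minimal sketch: it assumes the embedding columns are always named `X<number>`, as in the output above, and `add_embedding_suffix` is a hypothetical helper, not part of skrub. A toy DataFrame stands in for the fetched table so that nothing is downloaded.

```python
import re

import pandas as pd

# Assumption: un-suffixed embedding columns are named "X0", "X1", ...
_EMBEDDING_COL = re.compile(r"X\d+")


def add_embedding_suffix(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Return a copy of df with `suffix` appended to un-suffixed embedding columns.

    Hypothetical helper for illustration; it is not part of the skrub API.
    """
    return df.rename(
        columns=lambda name: (
            f"{name}{suffix}" if _EMBEDDING_COL.fullmatch(name) else name
        )
    )


# Toy frame standing in for the output of fetch_ken_embeddings.
toy = pd.DataFrame({"Entity": ["a"], "Type": ["t"], "X0": [0.1], "X1": [0.2]})
fixed = add_embedding_suffix(toy, "_aux")
print(list(fixed.columns))  # ['Entity', 'Type', 'X0_aux', 'X1_aux']
```

Columns that already carry the suffix (e.g. `X0_aux`) do not match the pattern, so calling the helper on a correctly suffixed table should leave it unchanged.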

Fixed by #956.