Inconsistent Feature Set Size Between Training and Test Sets
Description:
I am encountering an issue in my house price prediction project, which uses advanced learning algorithms. The problem is a discrepancy between the number of features in my training set and my test set: after performing feature engineering on the training set, I end up with 270 features, but when I apply the same feature engineering techniques to the test set, I obtain only 254 features.
This inconsistency prevents me from making predictions on the test set at all, since the trained model expects 270 input features. I have verified that the training and test sets are aligned correctly, and the problem seems to originate from the feature engineering process.
Steps to Reproduce:
- Load the training set.
- Perform feature engineering (e.g., normalization, scaling, encoding) on the training set.
- Verify that the feature engineering ran successfully and that the engineered training set contains 270 features.
- Load the test set and apply the same feature engineering techniques used on the training set.
- Observe that the engineered test set contains only 254 features instead of the expected 270 (a minimal sketch of this is included below).
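
For reference, here is a minimal sketch of the kind of pipeline I am running. The file names, the `engineer` helper, and the use of `pd.get_dummies` are stand-ins for my actual feature engineering code, but the key point is that each split is encoded independently:

```python
# Minimal sketch (placeholder file names and helper, not my exact code).
# Encoding the train and test splits independently produces different
# column counts whenever a categorical level appears in only one split.
import pandas as pd

train = pd.read_csv("train.csv")  # raw training data
test = pd.read_csv("test.csv")    # raw test data

def engineer(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode categorical columns; numeric columns pass through unchanged."""
    return pd.get_dummies(df)

train_fe = engineer(train)
test_fe = engineer(test)

print(train_fe.shape[1])  # 270 in my case
print(test_fe.shape[1])   # 254 in my case: 16 dummy columns never get created
```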
Expected Behavior:
The feature engineering process should yield the same number of features in both the training and test sets, ensuring consistency between the two.
Actual Behavior:
The feature engineering process produces a different number of features in the training and test sets: the test set ends up with 16 fewer features than the training set (254 vs. 270).
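
For completeness, this is roughly how the mismatch surfaces when I try to predict. The `SalePrice` target column and the `GradientBoostingRegressor` are placeholders from the sketch above, not my exact code; any scikit-learn estimator behaves the same way:

```python
# Continuing from the sketch above (placeholder column and model names).
from sklearn.ensemble import GradientBoostingRegressor

X_train = train_fe.drop(columns=["SalePrice"])  # engineered training features
y_train = train_fe["SalePrice"]                 # target

model = GradientBoostingRegressor().fit(X_train, y_train)

# Fails with a ValueError because the number of feature columns the model
# was fitted on does not match the 254 columns in test_fe.
preds = model.predict(test_fe)
```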
Any guidance or suggestions on resolving this inconsistency would be greatly appreciated. Thank you for your assistance!