Issues with current ML validation score

Question

GinoWoz1 opened this issue 6 years ago · 3 comments

GinoWoz1 commented 6 years ago

Hello,

Thanks for the help so far. I was able to get the tool up and running on Windows.

However, I am observing two weird things.

When I use Gradient Boost Regressor, my score gets worse with each generation, even when I switch the sign of the scoring function. The first score is nearly the best score I have gotten on my own (no feature engineering done on the data set).

https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/FEW_GB.ipynb

When I use Random Forest with the same scorer, the current ML validation score is reported as 0 and the run finishes really fast.

https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/FEW_RF.ipynb

I think I am missing something about how to use this tool, but I have no idea what. I am trying to use it in tandem with TPOT, as I am exploring GA/GP-based feature creation tools. I sincerely appreciate any advice or guidance you can provide.

Sincerely,
G

Answer 1 · 2018-12-05T16:38:50.000Z

Hello @lacava

Sorry for the bother, but have you had a chance to look at this? I have been working with TPOT for the last 4 months and have talked to Randy Olson a few times; he referred me to Few, and I am hoping to run a few tests with Few and TPOT over the winter. My name is Justin Joyce, and I am currently exploring multiple genetic algorithm and genetic programming methods as a master's student.

Sincerely,
Justin

Answer 2 · 2018-12-05T17:29:53.000Z

Hi Justin, I did look at it and ran it a couple of times. It looks like there is a small bug in Few: it prints a current ML validation score of 0 when the score is not actually 0, as the printed internal CV score shows.

Otherwise, this just seems to be a dataset that is not amenable to feature learning. I have found that, when paired with Gradient boosting or other high-capacity methods, it is quite difficult to find a transformation of the data that will improve the underlying ML using Few. Using Lasso, I was able to occasionally find a reduced feature space, but not one that dramatically improved the score.
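One rough way to see how much headroom each underlying learner leaves is to compare plain cross-validated baselines before running any feature learning. The sketch below uses only scikit-learn and a synthetic data set from `make_regression` as a stand-in for the housing data; the model settings and fold count are illustrative assumptions, and this is not how Few computes its internal CV score.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): replace with the real X_train / y_train.
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)

# A high-capacity learner and a sparse linear learner, as discussed above.
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "lasso": LassoCV(cv=5, random_state=0),
}

# Mean 5-fold R^2 for each model on the raw (untransformed) features.
baseline = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    baseline[name] = scores.mean()
    print(f"{name}: mean R^2 = {baseline[name]:.3f}")
```

If the baseline for the high-capacity model is already close to the best score seen so far, learned features have little room to improve it.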

You also may be interested in trying Feat, which is a more powerful version of Few that I have been working on for the last year. It has a similar sklearn interface, uses a GA to drive search, and includes neural network activation functions and backprop for learning weights. Here's the result of running that:

from feat import Feat
from sklearn.metrics import r2_score

# X_train (a pandas DataFrame) and y_train come from the notebook's data preparation.
learner = Feat(gens=1000, max_stall=100, pop_size=100, backprop=True,
               verbosity=2,
               max_dim=50,
               feature_names=','.join(X_train.columns))
X = X_train.values
y = y_train
learner.fit(X, y)

# Note: this is the R^2 on the training data, not a held-out score.
print('final score: {}'.format(r2_score(y_train, learner.predict(X))))
print('model:\n', learner.get_model())

final score: 0.9004700679745707
model:
Feature Weight
relu(2ndFlrSF) 4575965.203086
(2ndFlrSF^2) -2993884.266219
(2ndFlrSF^3) 2704420.880597
2ndFlrSF -2653911.531777
relu(2ndFlrSF) -2328605.023701
(GrLivArea^3) -972888.992427
LotArea 829523.137047
YearBuilt 741179.594699
relu(GrLivArea) 689061.927277
relu(OverallQual) 630256.458467
relu(LotArea) -593285.483499
TotalBsmtSF 536721.385827
float(OverallCond) 408904.496879
sqrt(|YearBuilt|) 341173.116787
1stFlrSF 340342.422143
float(GarageCars) 286630.673027
OverallQual 230831.115685
(OverallQual*GarageArea) 214190.124423
BsmtFinSF1 192567.902371
(TotalBsmtSF+BsmtUnfSF) -192281.015785
relu(OverallQual) 189096.541817
float(Fireplaces) 183822.299462
GrLivArea 155897.078241
float(Condition2_Norm) 116659.048371
ScreenPorch 102781.956623
float(Neighborhood_OldTown) -100922.111480
float(HalfBath) 97458.382917

The downside is that you can't specify your own scoring_function at the moment.

Answer 3 · 2018-12-05T17:32:25.000Z

"When I use Gradient Boost Regressor - my score gets worse by the generation even when I switched the scoring function sign. The first score is nearly my best score I have gotten by myself (no feature engineering done on data set)."

I did not observe this. I did observe that Few did not find better features, but the internal CV score stayed constant, as it should.
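On the scoring-sign part of the question, it may help to check which direction counts as "better" for the scorer in use. The sketch below shows the scikit-learn convention only: `make_scorer(..., greater_is_better=False)` negates an error metric so that larger values are always better. Whether Few applies the same convention to a custom scoring function is an assumption to verify against its source; the data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error

# Small synthetic regression problem (assumption: stand-in for real data).
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.randn(50)

model = LinearRegression().fit(X, y)

# Raw error metric: lower is better, always non-negative.
raw_mse = mean_squared_error(y, model.predict(X))

# Wrapped as a scorer: scikit-learn flips the sign so higher is better.
neg_mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
scored = neg_mse_scorer(model, X, y)

print("raw MSE:     ", raw_mse)
print("scorer value:", scored)
```

Flipping the sign by hand on top of a scorer that already negates the metric would make the search reward larger errors, which is one possible cause of scores that appear to get worse over generations.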