gee-community/geemap

Issues related to the ml.rf_to_strings() function

Closed this issue · 4 comments

Environment Information

  • geemap version: 0.32.0
  • Python version: 3.9.18
  • Operating System: Windows 11

Description

I am trying to train a random forest regression model locally and upload the trees to GEE using the ml module in geemap.

Everything goes well until the uploaded FeatureCollection is converted into an ee.Classifier, which raises an error.

It seems that ml.rf_to_strings() serializes the trees incorrectly, so GEE is unable to parse them.

What I Did

import ee
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from geemap import ml

ee.Initialize()

data = pd.read_csv('test_data.csv')
feature_names = ['M_factor']
label = 'Y'
X = data[feature_names]
Y = data[label]
RF_model = RandomForestRegressor(n_estimators=10, max_depth=100, max_leaf_nodes=10)
RF_model.fit(X, Y)  # the model must be fitted before its trees can be exported

# Serialize the fitted trees and upload them as a FeatureCollection asset
trees = ml.rf_to_strings(RF_model, 'M_factor', output_mode='REGRESSION')
ml.export_trees_to_fc(trees, 'users/xxx/test_RF')

# Rebuild the classifier on the GEE side from the uploaded asset
rf_fc = ee.FeatureCollection('users/xxx/test_RF')
classifier = ml.fc_to_classifier(rf_fc)
print(classifier.getInfo())

I got this error:

EEException: Classifier.decisionTreeEnsemble: Error parsing line 4: expected 8, got 3.

Then I checked the output strings of ml.rf_to_strings() after applying replace("#", "\n").

The first block below is the string for the first tree. It is not in the text format that ee.Classifier.decisionTreeEnsemble() expects.

I think the correct format should be the second block:

# output of rf_to_strings()
1) root 318 9999 9999 (1920.9306693208575)
  2) M <= 2.395950 318 702.9509 52.756976
    4) M <= 1.210300 135 146.4352 27.100205
  3) M > 2.395950 318 702.9509 52.756976
    6) M <= 3.917950 183 258.2605 71.954699
      12) M <= 3.202850 48 48.4855 53.513342 *
    7) M > 3.917950 183 258.2605 71.954699
      14) M <= 5.669350 59 109.2434 78.161322 *
      15) M <= 0.756870 26 9.0926 10.210856 *
    7) M > 1.210300 135 146.4352 27.100205
      14) M <= 1.709550 36 17.9174 30.780418 *
      15) M > 5.669350 104 111.9215 83.128264
        30) M <= 7.591500 48 48.4855 53.513342 *
      15) M > 3.202850 79 89.4751 57.965748
        30) M <= 3.625100 36 17.9174 30.780418 *
      15) M > 1.709550 43 33.3576 40.295379 *
      16) M > 0.756870 30 16.9621 20.15918 *
        32) M > 3.625100 15 49.2581 69.193269 *
        33) M > 7.591500 8 16.5197 94.848 *
# Maybe the correct format?
1) root 318 9999 9999 (1920.9306693208575) 
  2) M <= 2.395950 135 146.4352 27.100204 
    4) M <= 1.210300 56 37.9256 15.559417 
      8) M <= 0.756870 26 9.0926 10.210855 *
      9) M > 0.756870 30 16.9621 20.159180 *
    5) M > 1.210300 79 48.7859 35.970396 
      10) M <= 1.709550 36 17.9174 30.780418 *
      11) M > 1.709550 43 33.3576 40.295378 *
  3) M > 2.395950 183 258.2605 71.954699 
    6) M <= 3.917950 79 89.4751 57.965748 
      12) M <= 3.202850 48 48.4855 53.513341 *
      13) M > 3.202850 31 70.6118 65.293666 
        26) M <= 3.625100 16 56.6369 60.685045 *
        27) M > 3.625100 15 49.2581 69.193269 *
    7) M > 3.917950 104 111.9215 83.128264 
      14) M <= 5.669350 59 109.2434 78.161321 *
      15) M > 5.669350 45 49.3268 89.129986 
        30) M <= 7.591500 37 47.7640 87.870084 *
        31) M > 7.591500 8 16.5197 94.847999 *
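
The difference between the two listings can be checked mechanically. The sketch below is not part of geemap or Earth Engine; it assumes, based only on the error message and the proposed format above, that nodes must appear in pre-order, that the children of node n are numbered 2n and 2n+1, and that a trailing * marks a leaf. The helper name first_numbering_error is hypothetical.

```python
from typing import List, Optional, Tuple

def first_numbering_error(
    tree_text: str,
) -> Optional[Tuple[int, Optional[int], Optional[int]]]:
    """Return (line_number, expected_id, found_id) for the first node that
    breaks pre-order numbering, or None if the numbering is consistent."""
    lines = [ln.strip() for ln in tree_text.splitlines() if ln.strip()]
    expected: List[int] = [1]  # stack of node ids still to come; root is 1
    for line_number, line in enumerate(lines, start=1):
        found = int(line.split(")", 1)[0])  # "12) M <= ..." -> 12
        if not expected:
            return (line_number, None, found)  # extra node after a full tree
        want = expected.pop()
        if found != want:
            return (line_number, want, found)
        if not line.endswith("*"):  # internal node: both children must follow
            expected.append(2 * found + 1)  # right child, visited later
            expected.append(2 * found)      # left child, visited next
    if expected:
        return (len(lines) + 1, expected[-1], None)  # tree ends too early
    return None

# First four lines of the string produced by rf_to_strings() in this report
BROKEN = """
1) root 318 9999 9999 (1920.9306693208575)
  2) M <= 2.395950 318 702.9509 52.756976
    4) M <= 1.210300 135 146.4352 27.100205
  3) M > 2.395950 318 702.9509 52.756976
"""

print(first_numbering_error(BROKEN))  # (4, 8, 3)
```

Under these assumptions the check reports the same position and ids as the EEException ("line 4: expected 8, got 3"): node 4 is not a leaf, so node 8 should follow it, but node 3 appears instead.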

Did you try out this example? https://geemap.org/notebooks/46_local_rf_training
Does it work?

I just realized that you were using RandomForestRegressor. Only RandomForestClassifier is supported. I would suggest you import the training data into GEE and train the model directly using GEE rather than scikit-learn.

> Did you try out this example? https://geemap.org/notebooks/46_local_rf_training Does it work?

Sorry for the late reply. I tried it and it works fine.

I know we can upload the training data and train the model on GEE, but that does not work when the training data set is large, because of the user memory limit.

So could geemap support uploading Random Forest regression models in the future? Thanks~

> I just realized that you were using RandomForestRegressor. Only RandomForestClassifier is supported. I would suggest you import the training data into GEE and train the model directly using GEE rather than scikit-learn.

I am not sure that this problem is actually caused by using RandomForestRegressor.

The upload succeeds when I train the model without the max_leaf_nodes = 10 parameter:

RF_model=RandomForestRegressor(n_estimators = 10, max_depth = 100)

But it fails when I add this parameter:

RF_model=RandomForestRegressor(n_estimators = 10, max_depth = 100, max_leaf_nodes = 10)
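
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when max_leaf_nodes is set, scikit-learn grows each tree best-first, so nodes are not stored in depth-first order, and a serializer that relies on storage order would emit them out of sequence. The sketch below uses plain lists as stand-ins for a fitted tree's tree_.children_left and tree_.children_right arrays (where -1 marks a leaf) and walks them explicitly in pre-order, numbering children 2n and 2n+1. The function name preorder_ids is hypothetical and this is not how geemap is implemented.

```python
from typing import List, Tuple

def preorder_ids(
    children_left: List[int], children_right: List[int]
) -> List[Tuple[int, int, bool]]:
    """Walk a tree in pre-order, independent of how its nodes are stored.

    Returns (storage_index, node_id, is_leaf) tuples, where node_id follows
    the 2n / 2n+1 scheme with the root numbered 1.
    """
    out: List[Tuple[int, int, bool]] = []
    stack = [(0, 1)]  # (storage index, node id); the root is stored first
    while stack:
        index, node_id = stack.pop()
        is_leaf = children_left[index] == -1
        out.append((index, node_id, is_leaf))
        if not is_leaf:
            # Push right first so the left subtree is emitted first
            stack.append((children_right[index], 2 * node_id + 1))
            stack.append((children_left[index], 2 * node_id))
    return out

# Hypothetical storage layout of a best-first tree: the root's right child
# (index 2) was split before its left child (index 1).
left = [1, 5, 3, -1, -1, -1, -1]
right = [2, 6, 4, -1, -1, -1, -1]

for index, node_id, is_leaf in preorder_ids(left, right):
    print(index, node_id, "*" if is_leaf else "")
```

In this layout the storage indices come out as 0, 1, 5, 6, 2, 3, 4 while the node ids come out as 1, 2, 4, 5, 3, 6, 7, which illustrates why iterating over nodes in storage order would not yield a valid pre-order listing for such a tree.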