tensorflow/decision-forests

Models trained with fit_on_dataset_path behave unexpectedly

rstz opened this issue · 2 comments

rstz commented
          Hey @rstz ,

Thank you for your swift response and your interest in the problem. I think we are talking about the serialization/deserialization problem (and not necessarily about data format itself).
I've put together a simple Collab: https://colab.research.google.com/drive/19sepbkGXwM8lI6fZuovvYRAVSiYCNiKl?usp=sharing
so that you may have more details and play around. I've tested all this with all versions (as mentioned in my previous comment, with various systems as well) -> can't figure out what is wrong. I hope this will give much more feedback and we will be able to quickly solve the problem (if there is any).

Thank you !!

Originally posted by @piotrlaczkowski in #136 (comment)

achoum commented

Unfortunately in TensorFlow, a loaded model is not equivalent to an original model. An original model contains several utilities to automatically convert and adapt dataset format, while a loaded model is very restrictive and specific.

The error:

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
      Positional arguments (2 total):
        * OrderedDict([('MPG', <tf.Tensor 'inputs:0' shape=(None, 1) dtype=float32>),
                 ('Cylinders',
                  <tf.Tensor 'inputs_1:0' shape=(None, 1) dtype=int32>),
                 ('Displacement',
                  <tf.Tensor 'inputs_2:0' shape=(None, 1) dtype=float32>),
                 ('Horsepower',
                  <tf.Tensor 'inputs_3:0' shape=(None, 1) dtype=float32>),
                 ('Weight', <tf.Tensor 'inputs_4:0' shape=(None, 1) dtype=float32>),
                 ('Acceleration',
                  <tf.Tensor 'inputs_5:0' shape=(None, 1) dtype=float32>),
                 ('year', <tf.Tensor 'inputs_6:0' shape=(None, 1) dtype=int32>),
                 ('Origin', <tf.Tensor 'inputs_7:0' shape=(None, 1) dtype=int32>)])
        * False
      Keyword arguments: {}
    
     Expected these arguments to match one of the following 2 option(s):
    
    Option 1:
      Positional arguments (2 total):
        * {'Acceleration': TensorSpec(shape=(None,), dtype=tf.float32, name='Acceleration'),
     'Cylinders': TensorSpec(shape=(None,), dtype=tf.float32, name='Cylinders'),
     'Horsepower': TensorSpec(shape=(None,), dtype=tf.float32, name='Horsepower'),
     'MPG': TensorSpec(shape=(None,), dtype=tf.float32, name='MPG'),
     'Origin': TensorSpec(shape=(None,), dtype=tf.float32, name='Origin'),
     'Weight': TensorSpec(shape=(None,), dtype=tf.float32, name='Weight'),
     'year': TensorSpec(shape=(None,), dtype=tf.float32, name='year')}
        * True
      Keyword arguments: {}

Indicates that the provided dataset does not match what the model expects:

  • The dataset should contain a dictionary of Tensorflow (instead of OrderedDict of Tensorflow).
  • The label "'Displacement'" column should not be provided (it does not make sense to feed the label to the model for inference).
  • The dtype are wrong. By default TF-DF encodes all the numerical features as float32. Instead, it seems that make_csv_dataset interprets some of those values as integers.

Here is a proposed fix of those 3 issues:

for batch in ds.batch(3).take(1):
	
    clean_batch = dict(batch)

    for k in clean_batch:
      clean_batch[k] = tf.cast(tf.squeeze(clean_batch[k]), tf.float32)

    del clean_batch["Displacement"]

    print(clean_batch)
    print(model_loaded.predict(clean_batch))

Result:

{'MPG': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([18. , 30.9, 26. ], dtype=float32)>, 'Cylinders': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([6., 4., 4.], dtype=float32)>, 'Horsepower': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([88., 75., 93.], dtype=float32)>, 'Weight': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([3021., 2230., 2391.], dtype=float32)>, 'Acceleration': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([16.5, 14.5, 15.5], dtype=float32)>, 'year': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([73., 78., 74.], dtype=float32)>, 'Origin': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1., 1., 3.], dtype=float32)>}
1/1 [==============================] - 0s 51ms/step
[[221.37837 ]
 [110.83273 ]
 [107.609535]]
rstz commented

Closing this, feel free to reopen if there is additional information/ assistance needed