Evovest/EvoTrees.jl

MLJModelInterface.fit does not accept tables?

olivierlabayle opened this issue · 6 comments

Hello,

Thank you for the work here!

Apologies if this is not the right place for the following question. As I understand it, the MLJModelInterface.fit method for EvoTypes does not accept general tables (the machine interface works because it calls the reformat function beforehand):

```julia
using EvoTrees
using MLJBase

n = 100
X = MLJBase.table(rand(n, 3))
y = rand(n)

evo = EvoTreeRegressor()
MLJBase.fit(evo, 1, X, y)
```

From the MLJ docs I thought that should be the case, or am I misunderstanding?

Indeed, fit expects that the data provided has gone through the reformat step. However, MLJBase.fit! works fine on tabular data, and you can start training with that function as well, so it may be all that is needed.
I'd like to have dedicated tabular handling within EvoTrees, notably to manage categorical data, but I lack the time for that!
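
For readers landing here, the machine route mentioned above could look roughly like the following. This is a minimal sketch that reuses the data from the original post; it assumes EvoTrees and MLJBase are installed and that default hyper-parameters are acceptable.

```julia
using EvoTrees
using MLJBase

n = 100
X = MLJBase.table(rand(n, 3))   # user-form data: a generic table
y = rand(n)

# The machine binds the model to the data and takes care of calling
# reformat before the model-level fit method is invoked.
mach = machine(EvoTreeRegressor(), X, y)
fit!(mach, verbosity=0)

# predict on a machine also accepts the table directly.
yhat = predict(mach, X)
```

Here the table never reaches the model-level fit method in its original form, which is why this path works while the direct call in the original post does not.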

If reformat is implemented, then fit is not required to accept tables. Rather it accepts the form of data output by reformat. I thought the docs were clear on this point but am happy for a PR to clarify.
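
As an illustration of that point, the direct call from the original post can be adapted by passing the data through reformat first. This is only a sketch: it assumes MLJModelInterface is installed alongside EvoTrees and MLJBase, and it relies on nothing more than the generic reformat and fit methods of the model interface.

```julia
using EvoTrees
using MLJBase
import MLJModelInterface as MMI

n = 100
X = MLJBase.table(rand(n, 3))
y = rand(n)

evo = EvoTreeRegressor()

# Convert the user-form data into whatever form the model's fit expects.
# For models without a data front-end, reformat simply returns its arguments.
data = MMI.reformat(evo, X, y)

# fit receives the reformatted data and returns (fitresult, cache, report).
fitresult, cache, report = MMI.fit(evo, 0, data...)
```

Because reformat falls back to returning its arguments unchanged, the same two-step pattern should also be safe for models that do accept tables directly.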

The "data front end" apparatus allows machines to avoid reconverting data from user form (e.g., table) into model-specific form (e.g., matrix) in certain cases: in particular, when retraining using the same view of the data (rows) but new hyper-parameters, such as an iteration parameter. In this way, external control of iterative models (for example, using the IteratedModel wrapper) is possible without data conversions happening at every iteration.

Conversions are also avoided when choosing a different view of the same data (new rows) with the same hyper-parameters, as happens, for example, in cross-validation. To support this, the model overloads selectrows for its model-specific format.
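
To make the pieces concrete, here is a rough sketch of a data front-end for a toy model. ToyRidge and its lambda field are hypothetical names invented for this illustration and are not part of EvoTrees or MLJ; the sketch assumes MLJBase is loaded so that MMI.matrix can convert tables.

```julia
using LinearAlgebra
using MLJBase                      # provides the table handling behind MMI.matrix
import MLJModelInterface as MMI

# Hypothetical toy model, defined only for this illustration.
mutable struct ToyRidge <: MMI.Deterministic
    lambda::Float64
end

# User form (table) -> model-specific form (matrix).
MMI.reformat(::ToyRidge, X, y) = (MMI.matrix(X), y)
MMI.reformat(::ToyRidge, X) = (MMI.matrix(X),)

# Row selection operates on the model-specific form, so taking a new
# view of the data (e.g. a cross-validation fold) needs no reconversion.
MMI.selectrows(::ToyRidge, rows, Xmatrix, y) = (view(Xmatrix, rows, :), view(y, rows))
MMI.selectrows(::ToyRidge, rows, Xmatrix) = (view(Xmatrix, rows, :),)

# fit and predict receive the output of reformat, not the original table.
function MMI.fit(model::ToyRidge, verbosity, Xmatrix, y)
    coefs = (Xmatrix' * Xmatrix + model.lambda * I) \ (Xmatrix' * y)
    return coefs, nothing, NamedTuple()   # fitresult, cache, report
end
MMI.predict(::ToyRidge, coefs, Xmatrix) = Xmatrix * coefs

# Usage: convert once, then fit on a subset of rows without reconverting.
X = MLJBase.table(rand(100, 3))
y = rand(100)
model = ToyRidge(0.1)

data = MMI.reformat(model, X, y)
fold = MMI.selectrows(model, 1:50, data...)
coefs, _, _ = MMI.fit(model, 0, fold...)
yhat = MMI.predict(model, coefs, MMI.reformat(model, X)...)
```

The point of the design is that the table-to-matrix conversion happens once in reformat, while selectrows and fit only ever touch the matrix form.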

I understand, thank you both for the clarification!

Maybe the sentence that would benefit from clarification is the following: "If the core algorithm being wrapped requires data in a different or more specific form, then fit will need to coerce the table into the form desired (and the same coercions applied to X will have to be repeated for Xnew in predict)."

It is indeed said later that the data front-end is an alternative option, but it wasn't obvious that MLJModelInterface.fit would then not be required to respect the "table input contract".

How about, following the cited sentence, we add the new sentence:

"An exception to this requirement occurs when a data front-end is implemented; see Implementing a data front-end below."

That would be great, thank you!

I'm assuming this can be closed. Feel free to reopen otherwise.