scikit-learn-contrib/metric-learn

TupleTransformer

wdevazelhes opened this issue · 2 comments

As discussed with @bellet, it would be useful to have a sort of TupleTransformer object that would take a regular scikit-learn Transformer as an __init__ argument (so it would be a MetaEstimator), and that would fit/transform on tuples using the given Transformer (instead of on a dataset of points).
i.e. it would deduplicate the points inside the tuples, fit the transformer on the resulting dataset, and be able to transform it. This would allow using it in a pipeline like:

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from metric_learn import TupleTransformer, ITML
from sklearn.model_selection import cross_val_score

# pairs: array of shape (n_pairs, 2, n_features); y_pairs: similar/dissimilar labels
model = make_pipeline(TupleTransformer(PCA()), ITML())
cross_val_score(model, pairs, y_pairs)

It could also be useful in some cases to have a way to use metric learning algorithms to transform tuples, for instance via a transform_tuples method
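A transform_tuples helper could be sketched as below. This is only a hypothetical illustration (the function name comes from the suggestion above, but its signature and behaviour are assumptions): it applies any fitted estimator's point-level transform to every point of every tuple, preserving the tuple structure.

```python
import numpy as np

def transform_tuples(estimator, tuples):
    """Hypothetical sketch: apply a fitted estimator's point-level
    transform to each point in each tuple, returning an array of shape
    (n_tuples, tuple_size, n_features_out)."""
    n_tuples, tuple_size, n_features = tuples.shape
    # flatten tuples into a plain dataset of points, transform, reshape back
    flat = estimator.transform(tuples.reshape(n_tuples * tuple_size, n_features))
    return flat.reshape(n_tuples, tuple_size, -1)
```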

There may be other options too; this issue is here to discuss them

Can you explain a little more on the inputs and outputs of the TupleTransformer? It seems like it would need access to the label information at some point, but I'm not familiar enough with the MetaEstimator API to see how that would work.

I think the TupleTransformer would simply take tuples as input, internally turn them into a plain unlabeled dataset X (by collecting all the points involved in the tuples), and feed that as input to whatever regular unsupervised transformer was given at init?

We won't be able to use any label information (e.g., similar/dissimilar labels for pairs) in the TupleTransformer since the labels are not at the individual-point level. So only unsupervised transformers should be allowed (e.g., PCA, but not LDA).
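The behaviour described in this thread could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: the deduplication strategy (np.unique on rows) and the assumption that tuples arrive as a 3D array of shape (n_tuples, tuple_size, n_features) are both simplifications.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA

class TupleTransformer(BaseEstimator, TransformerMixin):
    """Sketch of a meta-estimator wrapping an unsupervised transformer
    so it can fit/transform on tuples of points."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, tuples, y=None):
        # collect all points involved in the tuples into a plain
        # unlabeled dataset, deduplicate them, and fit the wrapped
        # transformer on that dataset (labels are ignored: they live
        # at the tuple level, not the point level)
        points = np.unique(tuples.reshape(-1, tuples.shape[-1]), axis=0)
        self.transformer.fit(points)
        return self

    def transform(self, tuples):
        # transform each point, then restore the tuple structure
        n_tuples, tuple_size, n_features = tuples.shape
        flat = self.transformer.transform(
            tuples.reshape(n_tuples * tuple_size, n_features))
        return flat.reshape(n_tuples, tuple_size, -1)

# usage: reduce random 5-dimensional pairs to 2 dimensions with PCA
rng = np.random.RandomState(0)
pairs = rng.randn(10, 2, 5)  # (n_pairs, 2 points, 5 features)
transformed = TupleTransformer(PCA(n_components=2)).fit(pairs).transform(pairs)
```

A real version would also need the usual scikit-learn plumbing (get_params/set_params via BaseEstimator, cloning the wrapped transformer in fit, input validation), which is omitted here.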