TupleTransformer
wdevazelhes opened this issue · 2 comments
As discussed with @bellet, it would be useful to have a sort of TupleTransformer object that would take a regular scikit-learn Transformer in its __init__ (so it would be a MetaEstimator), and that would fit/transform on tuples using the given Transformer (instead of on the dataset of points), i.e. it would deduplicate the points inside the tuples, fit the transformer on the resulting dataset, and be able to transform the tuples. This would allow using it in a pipeline like:
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from metric_learn import TupleTransformer, ITML
from sklearn.model_selection import cross_val_score
model = make_pipeline(TupleTransformer(PCA()), ITML())
cross_val_score(model, pairs, y_pairs)
It could also be useful in some cases to have a way to use metric learning algorithms to transform tuples, like a transform_tuples method for instance.
There may be other options too; this issue is meant to discuss them.
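To make the transform_tuples idea concrete, here is a minimal sketch written as a standalone helper rather than a method. The name transform_tuples and its signature are hypothetical, and it assumes tuples are stored as a 3D array of shape (n_tuples, tuple_size, n_features) and that the estimator exposes a point-level transform. PCA is used only as a stand-in so the example does not depend on metric-learn.

```python
import numpy as np
from sklearn.decomposition import PCA


def transform_tuples(estimator, tuples):
    """Hypothetical helper: apply a point-level transform to every point of every tuple."""
    tuples = np.asarray(tuples)
    n_tuples, tuple_size, n_features = tuples.shape
    # Flatten the tuples into a 2D array of points, transform, then restore the tuple layout.
    flat = estimator.transform(tuples.reshape(n_tuples * tuple_size, n_features))
    return flat.reshape(n_tuples, tuple_size, -1)


rng = np.random.RandomState(42)
points = rng.randn(30, 4)
triplets = rng.randn(10, 3, 4)

# Stand-in for a fitted metric learner: any fitted estimator with transform(X) works here.
fitted = PCA(n_components=2).fit(points)
embedded_triplets = transform_tuples(fitted, triplets)
print(embedded_triplets.shape)
```

The output keeps the tuple structure, so downstream code that expects tuples can consume it directly.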
Can you explain a little more about the inputs and outputs of the TupleTransformer? It seems like it would need access to the label information at some point, but I'm not familiar enough with the MetaEstimator API to see how that would work.
I think the TupleTransformer would simply take tuples as input, internally turn them into a plain unlabeled dataset X (by collecting all points involved in tuples) and feed this as input to whatever regular unsupervised transformer was given at init?
We won't be able to use any label information (e.g., similar/dissimilar labels for pairs) in the transformer, since the labels are defined on tuples, not on individual points. So only unsupervised transformers should be allowed (e.g., PCA, but not LDA).
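The behaviour described above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing metric-learn class: it assumes tuples come as a 3D array of shape (n_tuples, tuple_size, n_features), deduplicates points with np.unique, and ignores any y passed to fit.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.decomposition import PCA


class TupleTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: apply an unsupervised point-level transformer to tuples."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, tuples, y=None):
        tuples = np.asarray(tuples)
        n_features = tuples.shape[2]
        # Collect all points involved in tuples and drop exact duplicates.
        points = np.unique(tuples.reshape(-1, n_features), axis=0)
        # y holds tuple-level labels, so it is deliberately not passed on.
        self.transformer_ = clone(self.transformer).fit(points)
        return self

    def transform(self, tuples):
        tuples = np.asarray(tuples)
        n_tuples, tuple_size, n_features = tuples.shape
        flat = self.transformer_.transform(tuples.reshape(-1, n_features))
        # Restore the tuple layout around the transformed points.
        return flat.reshape(n_tuples, tuple_size, -1)


rng = np.random.RandomState(0)
pairs = rng.randn(20, 2, 5)
pairs_reduced = TupleTransformer(PCA(n_components=3)).fit_transform(pairs)
print(pairs_reduced.shape)
```

Because fit accepts and ignores y, such an object could sit in front of a pair-based learner in a pipeline while the pair labels are still forwarded to the final estimator.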