Substra/substra-backend

Objective unicity check

Closed this issue · 7 comments

The unicity of an objective is currently only based on the description content. This allows objective with the same metrics to be uploaded multiple time so that they are associated with different datasets.

This however prevents the creation of an objective with the same description but a different metrics archive.

We could have the unicity check include both description and metrics, which would allow the same flexibility when it comes to re-uploading a metrics with another description but would also allow to upload multiple metrics with the same description.

It should be reviewed and validated in the substra specification before implementing it.

@Kelvin-M I don't see anything in the spec preventing us from this change, I think we can do this update

@jmorel It may be easier to just compute the hash on the metrics file instead of description no ?

After giving this some thought, I think the unicity should be on metrics file + datamanager key + data samples keys. This way you can have multiple objective sharing the same metrics code but with different data sets. And you can have the same data set but with different metrics code. I think this covers every use case.

And it removes the description from the check, which in my opinion makes no sense.

@jmorel I think it will be difficult to add datamanager key and data samples keys as paramater for the hash.
Doing something like :

data_keys = ''.join([data_manager_pkhash] + sorted(data_sample_pkhashes))
pkhash = get_hash(metrics, key=data_keys)

will be difficult beccause backend db do not store or have access to the data_manager_pkhash and data_sample_pkhashes relation with the objective for the full save.
You can see in backend/backend/substrapp/models/objective.py for instance :)

In this case I summon the powers that be! @inalgnu @camillemarini what should we do here?

I do agree with @Kelvin-M, technically it won't be straightforward to use the keys of the backend.

In the meantime, using the metrics instead of the description would be a good short term solution.

Could you confirm @inalgnu or @camillemarini ?