DIDONEproject/music_symbolic_features

ConcatTask has the wrong `feature_set` name

Closed this issue · 2 comments

00sapo commented

    def __init__(self, tasks: List[Task]):
        self.tasks = tasks
        self.__loaded = False
        assert len(self.tasks) >= 2, "ConcatTask must have at least 2 tasks"
        extensions = [task.extension for task in self.tasks]
        assert all(ext == extensions[0] for ext in extensions), "Extensions must match"
        friendly_names = [task.dataset.friendly_name for task in self.tasks]
        assert all(
            name == friendly_names[0] for name in friendly_names
        ), "Datasets must match"
        super().__init__(
            self.tasks[0].dataset, self.tasks[0].feature_set, extensions[0]
        )
        # self.dataset = self.tasks[0].dataset
        # self.extension = extensions[0]
        feature_set_names = [task.feature_set.name for task in self.tasks]
        self.name = (
            friendly_names[0]
            + "-"
            + "-".join(feature_set_names)
            + "-"
            + extensions[0][1:]
        )

The `feature_set` attribute of `ConcatTask` is initialized to the feature set of the first task only (`self.tasks[0].feature_set`). It should instead be named `"-".join(feature_set_names)`, the same way `self.name` is built.

Right now, a task whose first feature set is A also includes every concatenation that starts with A, such as A-B, A-C, and A-B-C (e.g. the task for musif_native actually includes musif_native, musif_native-music21_native, musif_native-jsymbolic, musif_native-music21_native-jsymbolic, for a total of 5 tasks).
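A minimal sketch of one possible fix, assuming simplified stand-ins for `Task`, `Dataset` and `FeatureSet`. These classes, the `select_by_feature_set` helper, the dataset name `didone` and the `.xml` extension are all hypothetical and only illustrate the idea; they are not the repository's actual definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FeatureSet:  # hypothetical stand-in
    name: str


@dataclass(frozen=True)
class Dataset:  # hypothetical stand-in
    friendly_name: str


class Task:  # hypothetical stand-in for the real base class
    def __init__(self, dataset: Dataset, feature_set: FeatureSet, extension: str):
        self.dataset = dataset
        self.feature_set = feature_set
        self.extension = extension


class ConcatTask(Task):
    def __init__(self, tasks: List[Task]):
        assert len(tasks) >= 2, "ConcatTask must have at least 2 tasks"
        self.tasks = tasks
        feature_set_names = [task.feature_set.name for task in tasks]
        # Assumed fix: give the concatenation its own feature set, named
        # like the middle part of self.name, instead of reusing the first one.
        combined = FeatureSet("-".join(feature_set_names))
        super().__init__(tasks[0].dataset, combined, tasks[0].extension)
        self.name = "-".join(
            [tasks[0].dataset.friendly_name, combined.name, tasks[0].extension[1:]]
        )


def select_by_feature_set(tasks: List[Task], name: str) -> List[Task]:
    """Hypothetical helper: keep only tasks whose feature set has this name."""
    return [task for task in tasks if task.feature_set.name == name]


ds = Dataset("didone")
a = Task(ds, FeatureSet("musif_native"), ".xml")
b = Task(ds, FeatureSet("jsymbolic"), ".xml")
ab = ConcatTask([a, b])

selected = select_by_feature_set([a, b, ab], "musif_native")
print([task.feature_set.name for task in selected])  # ['musif_native']
```

With the combined name, filtering on `musif_native` no longer picks up the concatenated task, because `ab.feature_set.name` is `musif_native-jsymbolic`.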

00sapo commented

Solving this issue would also allow for longer AutoML times when jobs are run on shared clusters where computing power is scheduled according to the duration of the job.

00sapo commented

Fixed in 9150c5f