wilhelm-lab/dlomix

[bug] parameter `columns_to_keep` of `PrecursorChargeStateDataset` class is not used

Closed this issue · 2 comments

branch feature/precursor-charge-prediction


The parameter `columns_to_keep` of the `PrecursorChargeStateDataset` class is not used. The respective columns are hard-coded later on.
https://github.com/wilhelm-lab/dlomix/blob/9fe499360c9bd3b229594d3765ad30536ffc9e26/dlomix/data/PrecursorChargeStateDataset.py#L184C11-L184C39

Actually being able to define the columns would be great!
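
To make the report concrete, here is a minimal sketch of the pattern described above. It is not the dlomix code: `BuggyDataset`, `FixedDataset`, `HARD_CODED_COLUMNS`, and the column names are hypothetical stand-ins chosen only for illustration.

```python
# Hypothetical stand-ins, NOT the dlomix implementation.
# The column names below are placeholders for illustration only.
HARD_CODED_COLUMNS = ["sequence", "charge"]


class BuggyDataset:
    """Accepts columns_to_keep but never consults it (the reported pattern)."""

    def __init__(self, rows, columns_to_keep=None):
        self.columns_to_keep = columns_to_keep  # stored, then ignored
        # The selection always uses the hard-coded list.
        self.rows = [{c: r[c] for c in HARD_CODED_COLUMNS} for r in rows]


class FixedDataset:
    """Uses columns_to_keep when given, falling back to the defaults."""

    def __init__(self, rows, columns_to_keep=None):
        if columns_to_keep is not None:
            cols = list(columns_to_keep)
        else:
            cols = HARD_CODED_COLUMNS
        self.rows = [{c: r[c] for c in cols} for r in rows]


rows = [{"sequence": "PEPTIDE", "charge": 2, "intensity": 0.7}]
buggy = BuggyDataset(rows, columns_to_keep=["sequence", "intensity"])
fixed = FixedDataset(rows, columns_to_keep=["sequence", "intensity"])

print(list(buggy.rows[0]))  # the requested "intensity" column is missing
print(list(fixed.rows[0]))  # the requested columns are honoured
```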

Thanks @ayla-s !

True, we are currently tackling this in the following branch:
Migrating to HF datasets branch (I am working on this) -> https://github.com/wilhelm-lab/dlomix/tree/feature/migrate-to-huggingface-datasets

Once it is merged, datasets for all tasks will support this as expected: a subset of columns from the input data can be specified to keep for later consumption by the model.

The final version of the Charge State Dataset will be updated then by @Radox96.

I will keep this open and only close it once the functionality works as expected.

Thanks

omsh commented

@ayla-s This is now merged into main and released in version 0.1.0 with the complete Hugging Face datasets migration.

There are two arguments to control this behaviour:

model_features: list of features that are available in the original data file and will be used as model inputs
dataset_columns_to_keep: list of columns in the original data to keep in the Hugging Face dataset, but not return as tensors in the TensorFlow Dataset.
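
As a rough illustration of how the two arguments differ, the sketch below mimics the described behaviour with plain Python dictionaries. The helper `split_columns` and the column names are hypothetical. The sketch does not call dlomix, so check the dlomix 0.1.0 documentation for how the arguments are actually passed to the dataset classes.

```python
# Hypothetical helper, NOT part of dlomix: it only mimics the described
# split between model inputs and columns that are merely kept.
def split_columns(rows, model_features, dataset_columns_to_keep):
    # Columns stored in the dataset: features plus kept columns,
    # de-duplicated while preserving order.
    stored_cols = list(dict.fromkeys(model_features + dataset_columns_to_keep))
    stored = [{c: r[c] for c in stored_cols} for r in rows]
    # Only the model features would be returned as tensors to the model.
    model_inputs = [{c: r[c] for c in model_features} for r in rows]
    return stored, model_inputs


# Placeholder column names, assumed for illustration only.
rows = [
    {"sequence": "PEPTIDE", "collision_energy": 0.3, "raw_file": "run1", "score": 12.5},
]

stored, model_inputs = split_columns(
    rows,
    model_features=["collision_energy"],
    dataset_columns_to_keep=["raw_file"],
)

print(stored[0])        # feature column and kept column; "score" is dropped
print(model_inputs[0])  # only the feature column
```

In this sketch, `raw_file` stays available for later inspection but is never passed to the model, which is the distinction the two arguments are described as making.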


Closing the issue, please test the functionality and let me know.

cc: @Radox96