[bug] parameter `columns_to_keep` of `PrecursorChargeStateDataset` class is not used
Branch: feature/precursor-charge-prediction
The parameter `columns_to_keep` of the `PrecursorChargeStateDataset` class is not used. Later on, the respective columns are hard-coded: https://github.com/wilhelm-lab/dlomix/blob/9fe499360c9bd3b229594d3765ad30536ffc9e26/dlomix/data/PrecursorChargeStateDataset.py#L184C11-L184C39
Actually being able to define the columns would be great!
Thanks @ayla-s !
True, we are currently tackling this in the following branch:
Migrating to HF datasets branch (I am working on this) -> https://github.com/wilhelm-lab/dlomix/tree/feature/migrate-to-huggingface-datasets
Once it is merged, datasets for all tasks will have this functioning as expected: a subset of columns from the input data can be specified to keep, for later consumption by the model.
The final version of the Charge State Dataset will be updated then by @Radox96.
I will keep this open and only close it once the functionality works as expected.
Thanks
@ayla-s This is now merged into main and released in version 0.1.0 with the complete Hugging Face datasets migration.
There are two arguments to control this behaviour:
- `model_features`: list of features that are available in the original data file and will be used as model inputs
- `dataset_columns_to_keep`: list of columns in the original data to keep in the Hugging Face dataset, but not return as tensors in the TensorFlow Dataset.
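To illustrate the difference between the two arguments, here is a rough sketch in plain Python. It is not DLOmix code: the helper `split_columns`, its `sequence_column` and `label_column` parameters, and all column names are hypothetical and only mimic the behaviour described above, assuming the sequence and label columns are always returned as tensors.

```python
def split_columns(available_columns, sequence_column, label_column,
                  model_features=None, dataset_columns_to_keep=None):
    """Hypothetical helper (not part of DLOmix).

    Returns two lists:
      kept_columns   - columns that stay in the dataset
      tensor_columns - the subset that would be returned as tensors
    """
    model_features = list(model_features or [])
    dataset_columns_to_keep = list(dataset_columns_to_keep or [])

    # Fail early if a requested column is not in the input data.
    requested = [sequence_column, label_column,
                 *model_features, *dataset_columns_to_keep]
    missing = [c for c in requested if c not in available_columns]
    if missing:
        raise ValueError(f"Columns not found in input data: {missing}")

    # Model inputs and the label are what the model consumes as tensors.
    tensor_columns = [sequence_column, *model_features, label_column]

    # Extra columns are kept alongside them, but not turned into tensors.
    # dict.fromkeys removes duplicates while preserving order.
    kept_columns = list(dict.fromkeys(tensor_columns + dataset_columns_to_keep))
    return kept_columns, tensor_columns


kept, tensors = split_columns(
    available_columns=["modified_sequence", "precursor_charge",
                       "collision_energy", "raw_file", "scan_number"],
    sequence_column="modified_sequence",
    label_column="precursor_charge",
    model_features=["collision_energy"],
    dataset_columns_to_keep=["raw_file"],
)
print(kept)     # columns that remain in the dataset
print(tensors)  # columns that would be passed to the model
```

In this sketch `raw_file` stays in the dataset but is not passed to the model, while `scan_number` is dropped because it is listed in neither argument.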
Closing the issue, please test the functionality and let me know.
cc: @Radox96