luigibonati/mlcolvar

Walker column in utils.io.load_dataframe function and create_dataset_from_files

EnricoTrizio opened this issue · 2 comments

What is the use of the walker column in utils.io.load_dataframe? Is it needed?

        # check if file is in PLUMED format
        if is_plumed_file(filename):
            df_tmp = plumed_to_pandas(filename)
            df_tmp['walker'] = [i for _ in range(len(df_tmp))]
            df_tmp = df_tmp.iloc[start:stop:stride, :]
            df_list.append( df_tmp )
            
        # else use read_csv with optional kwargs
        else:
            df_tmp = pd.read_csv(filename, **kwargs)
            df_tmp['walker'] = [i for _ in range(len(df_tmp))]
            df_tmp = df_tmp.iloc[start:stop:stride, :]
            df_list.append( df_tmp )

I tried it, and it seems to return only a column of zeros.

If it should be kept, create_dataset_from_files should be modified to exclude that column by default, as it already does with time and labels. Otherwise, when filter_args = None, it loads the (useless) walker column.
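A minimal sketch of what such a default exclusion could look like, using pandas only. The helper `select_descriptors` and the `DEFAULT_EXCLUDED` tuple are hypothetical names for illustration and are not taken from the mlcolvar source; the actual filtering logic in create_dataset_from_files may differ.

```python
import pandas as pd

# Hypothetical list of bookkeeping columns to drop when no filter is given.
DEFAULT_EXCLUDED = ("time", "labels", "walker")


def select_descriptors(df, filter_args=None):
    """Return only the descriptor columns of df (illustrative helper).

    With filter_args=None, drop the bookkeeping columns listed in
    DEFAULT_EXCLUDED. Otherwise forward filter_args to DataFrame.filter.
    """
    if filter_args is None:
        to_drop = [c for c in DEFAULT_EXCLUDED if c in df.columns]
        return df.drop(columns=to_drop)
    return df.filter(**filter_args)


df = pd.DataFrame(
    {
        "time": [0.0, 1.0],
        "d1": [0.1, 0.2],
        "d2": [0.3, 0.4],
        "walker": [0, 0],
        "labels": [0, 0],
    }
)

default_cols = list(select_descriptors(df).columns)
regex_cols = list(select_descriptors(df, {"regex": "^d"}).columns)
```

With this approach a user-supplied filter still takes full control, so anyone who wants the walker column can request it explicitly.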

If a list of files is passed to load_dataframe, the walker column keeps track of which file each row came from. I think it is good to exclude it from the create_dataset_from_files function, as you suggest.
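To illustrate why a single file gives only zeros while a list of files gives distinct values, here is a rough sketch of the loop quoted in the issue. The function `load_frames` is a hypothetical stand-in, not the library function: it covers only the read_csv branch, and it assumes that `i` is the index of the file in the input list and that the per-file frames are concatenated at the end, neither of which is shown in the snippet above.

```python
import io

import pandas as pd


def load_frames(sources, start=0, stop=None, stride=1):
    """Hypothetical stand-in for the file loop in load_dataframe."""
    # Accept a single source as well as a list of sources.
    if not isinstance(sources, (list, tuple)):
        sources = [sources]

    frames = []
    for i, src in enumerate(sources):  # assumption: i is the file index
        df_tmp = pd.read_csv(src)
        df_tmp["walker"] = i  # a scalar is broadcast to every row
        frames.append(df_tmp.iloc[start:stop:stride, :])

    # Assumption: the per-file frames are stacked into one DataFrame.
    return pd.concat(frames, ignore_index=True)


# In-memory CSV data stands in for trajectory files.
single = load_frames(io.StringIO("time,cv\n0,0.1\n1,0.2\n"))
multi = load_frames(
    [
        io.StringIO("time,cv\n0,0.1\n1,0.2\n"),
        io.StringIO("time,cv\n0,0.3\n1,0.4\n"),
    ]
)

single_walkers = single["walker"].tolist()
multi_walkers = multi["walker"].tolist()
```

Under these assumptions, a single input always has index 0, which matches the column of zeros reported above, while the second file in a list is tagged with 1.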

Good, I'll take care of it in a future PR.