Allow selection of columns returned when slicing with the dictionary method
DarylWM opened this issue · 2 comments
Is your feature request related to a problem? Please describe.
I like the semantics of using a dictionary to slice the data:
wide_ms1_points_df = raw_data[
    {
        "rt_values": slice(float(precursor_cuboid_d['wide_ms1_rt_lower']), float(precursor_cuboid_d['wide_ms1_rt_upper'])),
        "mz_values": slice(float(precursor_cuboid_d['wide_mz_lower']), float(precursor_cuboid_d['wide_mz_upper'])),
        "scan_indices": slice(int(precursor_cuboid_d['wide_scan_lower']), int(precursor_cuboid_d['wide_scan_upper'])),
        "precursor_indices": 0,
    }
]
I might be missing it but I haven't seen a way to also choose the columns returned in the dataframe with this method, so the dataframe is:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4564742 entries, 0 to 4564741
Data columns (total 13 columns):
 #   Column               Dtype
---  ------               -----
 0   raw_indices          int64
 1   frame_indices        int64
 2   scan_indices         int64
 3   precursor_indices    int64
 4   push_indices         int64
 5   tof_indices          uint32
 6   rt_values            float64
 7   rt_values_min        float64
 8   mobility_values      float64
 9   quad_low_mz_values   float64
 10  quad_high_mz_values  float64
 11  mz_values            float64
 12  intensity_values     uint16
dtypes: float64(6), int64(5), uint16(1), uint32(1)
memory usage: 409.2 MB
Describe the solution you would like
Something like this could be considered:
wide_ms1_points_df = raw_data[
    {
        "rt_values": slice(float(precursor_cuboid_d['wide_ms1_rt_lower']), float(precursor_cuboid_d['wide_ms1_rt_upper'])),
        "mz_values": slice(float(precursor_cuboid_d['wide_mz_lower']), float(precursor_cuboid_d['wide_mz_upper'])),
        "scan_indices": slice(int(precursor_cuboid_d['wide_scan_lower']), int(precursor_cuboid_d['wide_scan_upper'])),
        "precursor_indices": 0,
        "columns": ['frame_indices', 'scan_indices', 'rt_values', 'mz_values', 'intensity_values'],
    }
]
Allowing the choice of column type would be useful as well:
"dtypes": [np.uint16, np.uint16, np.float32, np.float64, np.uint16]
Describe alternatives you've considered
Dropping unwanted columns and downcasting the column types works fine. I think this idea would reduce the compute effort though.
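For reference, the drop-and-downcast alternative can be sketched with plain pandas. This is only an illustration on a small synthetic dataframe (the values and the `wanted` mapping are made up for the example, reusing the column names and dtypes proposed above); downcasting to `np.uint16` assumes the index values actually fit in that range.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a sliced dataframe; the real one has 13 columns.
n = 1000
df = pd.DataFrame({
    "raw_indices": np.arange(n, dtype=np.int64),
    "frame_indices": np.arange(n, dtype=np.int64) % 50,
    "scan_indices": np.arange(n, dtype=np.int64) % 900,
    "rt_values": np.linspace(0.0, 100.0, n),
    "mz_values": np.linspace(400.0, 1200.0, n),
    "intensity_values": np.full(n, 42, dtype=np.uint16),
})

# Columns to keep, mapped to the dtype each one should be cast to.
wanted = {
    "frame_indices": np.uint16,
    "scan_indices": np.uint16,
    "rt_values": np.float32,
    "mz_values": np.float64,
    "intensity_values": np.uint16,
}

# Select the columns first, then downcast them in a single astype call.
small_df = df[list(wanted)].astype(wanted)

# Compare the memory footprint of the column data before and after.
before = int(df.memory_usage(index=False).sum())
after = int(small_df.memory_usage(index=False).sum())
print(f"{before} bytes -> {after} bytes")
```

Note that the full dataframe is still built before anything is dropped, which is the extra compute effort this request is trying to avoid.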
Interesting suggestion that I indeed hadn't considered before for direct slicing.
That said, there is an easy workaround that is partially documented in cells 13-15 of the notebook tutorial. To get a better feel for the inner workings, check out the actual slicing code. In brief, any slice always first obtains the raw indices. By default, it then converts these raw indices to a dataframe with all coordinates. You can either set the last element of a single slice to "raw" to skip this dataframe conversion, or you can change the default by creating a TimsTOF object with slice_as_dataframe=False. For compatibility with dict slicing as you use it, probably only the latter option actually works. Once you have the raw indices, you can manually convert them to the indices you want by selecting the appropriate values with the data.as_dataframe(indices) function, or at an even lower level with the data.convert_from_indices(indices) function.
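As a rough sketch of that workflow: slice to raw indices first, convert them with as_dataframe, and keep only the columns of interest. The `StandInTimsTOF` class and the `slice_with_columns` helper below are hypothetical and exist only so the example runs without a real dataset; the stand-in is not the alphatims implementation, and its half-open treatment of slice bounds is an assumption. Only the two calls `data[key]` and `data.as_dataframe(indices)` correspond to what is described above.

```python
import numpy as np
import pandas as pd


class StandInTimsTOF:
    """Hypothetical stand-in for a TimsTOF object created with
    slice_as_dataframe=False. NOT the real alphatims class."""

    def __init__(self, frame: pd.DataFrame):
        self._frame = frame.reset_index(drop=True)

    def __getitem__(self, key: dict) -> np.ndarray:
        # Return raw indices only, mimicking slice_as_dataframe=False.
        mask = np.ones(len(self._frame), dtype=bool)
        for column, selector in key.items():
            values = self._frame[column].to_numpy()
            if isinstance(selector, slice):
                # Assumption: bounds are half-open, [start, stop).
                mask &= (values >= selector.start) & (values < selector.stop)
            else:
                mask &= values == selector
        return np.flatnonzero(mask)

    def as_dataframe(self, indices: np.ndarray) -> pd.DataFrame:
        # Convert raw indices to a dataframe with all coordinates.
        return self._frame.iloc[indices].reset_index(drop=True)


def slice_with_columns(data, key: dict, columns: list) -> pd.DataFrame:
    """Hypothetical helper: dict-slice to raw indices, then keep
    only the requested columns of the converted dataframe."""
    raw_indices = data[key]
    return data.as_dataframe(raw_indices)[list(columns)]


# Tiny synthetic dataset, made up for the example.
demo = pd.DataFrame({
    "rt_values": [10.0, 20.0, 30.0, 40.0],
    "mz_values": [500.1, 600.2, 700.3, 800.4],
    "scan_indices": [100, 200, 300, 400],
    "precursor_indices": [0, 0, 1, 0],
    "intensity_values": [5, 9, 7, 3],
})
data = StandInTimsTOF(demo)

result = slice_with_columns(
    data,
    {"rt_values": slice(15.0, 45.0), "precursor_indices": 0},
    columns=["rt_values", "mz_values", "intensity_values"],
)
print(result)
```

With a real TimsTOF object, the column selection in this sketch still happens after the conversion; whether the conversion itself can be restricted to certain columns is something to check in the slicing code mentioned above.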
I think the casting option is probably a fringe case, which is easier to just do after obtaining the dataframe instead of upfront...
Dear Daryl, since this issue has not been active for a long time and there is a workable workaround, I will close it. Let me know if you still have further questions.