bihealth/NeatMS

Index Error in annotation tool

Closed this issue · 2 comments

When trying to generate training data with the annotation tool, the tool crashes after around 20-30 labels with an IndexError (see the traceback below). Any advice would be appreciated. I am happy to share my notebook and data with you if necessary.

IndexError Traceback (most recent call last)
c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\peak.py in get_chromatogram(
self=<NeatMS.peak.Peak object>,
margin=1
)
106 rt_start, rt_end = self.get_window_margin(margin)
107 # We extract the values inside the peak window (mz and RT window)
--> 108 chromatogram = self.sample.extract_chromatogram(rt_start, rt_end, self.mz_min, self.mz_max)
chromatogram = undefined
self.sample.extract_chromatogram = <bound method Sample.extract_chromatogram of <NeatMS.sample.Sample object at 0x000002D0DA4AB188>>
rt_start = 16.961999999999996
rt_end = 17.562000000000005
self.mz_min = 1053.8347677336883
self.mz_max = 1053.8979997167116
109 # Containt [rt, intensity, mz]
110 return chromatogram

c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\sample.py in extract_chromatogram(
self=<NeatMS.sample.Sample object>,
rt_start=16.961999999999996,
rt_end=17.562000000000005,
mz_min=1053.8347677336883,
mz_max=1053.8979997167116
)
36
37 def extract_chromatogram(self, rt_start, rt_end, mz_min, mz_max):
---> 38 return self.raw_data.extract_chromatogram(rt_start, rt_end, mz_min, mz_max)
self.raw_data.extract_chromatogram = <bound method RawData.extract_chromatogram of <NeatMS.data.RawData object at 0x000002D0DA9ED1C8>>
rt_start = 16.961999999999996
rt_end = 17.562000000000005
mz_min = 1053.8347677336883
mz_max = 1053.8979997167116
39
40

c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\data.py in extract_chromatogram(
self=<NeatMS.data.RawData object>,
rt_start=16.961999999999996,
rt_end=17.562000000000005,
mz_min=1053.8347677336883,
mz_max=1053.8979997167116
)
26 def extract_chromatogram(self, rt_start, rt_end, mz_min, mz_max):
27 if self.MS1 is None:
---> 28 chromatogram = self.reader.extract_chromatogram(self.file, rt_start, rt_end, mz_min, mz_max)
chromatogram = undefined
self.reader.extract_chromatogram = <bound method PymzmlDataReader.extract_chromatogram of <NeatMS.data.PymzmlDataReader object at 0x000002D0DA6B9808>>
self.file = WindowsPath('../data/mzMLs/covid_plasma/tmp/B1_NIST1950_3_6540.mzML')
rt_start = 16.961999999999996
rt_end = 17.562000000000005
mz_min = 1053.8347677336883
mz_max = 1053.8979997167116
29 else:
30 # If The MS1 data has been extracted, then extract the chromatogram directly from the array (splited into two lines for comprehension purposes)

c:\users\ethan\appdata\local\programs\python\python37\lib\site-packages\NeatMS\data.py in extract_chromatogram(
self=<NeatMS.data.PymzmlDataReader object>,
file_path=WindowsPath('../data/mzMLs/covid_plasma/tmp/B1_NIST1950_3_6540.mzML'),
rt_start=16.961999999999996,
rt_end=17.562000000000005,
mz_min=1053.8347677336883,
mz_max=1053.8979997167116
)
111 # One more step is required as a single rt can contain several mz values, we need to find and sum those values
112 # Extract all unique rt values
--> 113 all_rt = np.unique(chromatogram[2])
all_rt = undefined
global np.unique = <function unique at 0x000002D1400B24C8>
global chromatogram = undefined
114 # Sum Intensities for rt values that are not unique (Equivalent of pandas groupby but using numpy arrays)
115 # Average mz for rt values that are not unique (mz values are kept for consistensy but not used)

IndexError: index 2 is out of bounds for axis 0 with size 0

yoglo commented

Hello,

The error seems to indicate that no signal was found in the mzML file within the given rt/mz window, so the extracted chromatogram is empty. This can have several causes. Could you please send me the following information so I can narrow it down:

  • What peak picking tool/algo did you use to generate the feature table (XCMS or MZmine)?
  • Do the samples used for training include blank samples?
  • Did you manually modify the feature table before loading it into NeatMS (e.g. removing samples to get a smaller dataset for training)?
  • Where did you install NeatMS from (Bioconda or pip)?

If you can share your notebook and data with me, it would definitely be easier for me to reproduce and fix. This type of error only occurred during the early stages of development and is now handled by data sanity checks, so either there is a small problem with the input data or a sanity check is missing (maybe both). You can use the email address attached to my GitHub account to share the data.
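For anyone hitting the same traceback, the failure can be reproduced in isolation with NumPy: indexing row 2 of an empty array raises exactly the reported message. The snippet below is only a sketch and is not NeatMS code. The helper `safe_unique_rt` and its `rt_row` parameter are hypothetical names; the row index 2 is taken from the failing line `np.unique(chromatogram[2])` in the traceback.

```python
import numpy as np

# An empty chromatogram, as returned when no signal falls inside the rt/mz window.
empty_chromatogram = np.array([])

try:
    np.unique(empty_chromatogram[2])
except IndexError as err:
    message = str(err)  # same message as in the traceback above


def safe_unique_rt(chromatogram, rt_row=2):
    """Hypothetical guard: return unique values of one row, or an empty array.

    rt_row=2 mirrors the index used in the failing line; the actual row
    layout inside NeatMS is an assumption here.
    """
    chromatogram = np.asarray(chromatogram)
    if chromatogram.size == 0:
        # No signal in the window: nothing to group, so skip the indexing.
        return np.array([])
    return np.unique(chromatogram[rt_row])
```

With a guard like this, a peak with no raw signal would yield an empty result instead of crashing the annotation tool; whether it should be skipped or reported is a separate design decision.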

Thank you for your rapid response! This actually solved my problem. I was playing around with the parameters and had set min_scan_num = 0. When I changed it back to 5, the problem went away. It might be good to disallow a min_scan_num of zero to maintain compatibility with the labeling tool. Thanks again for the response, I have enjoyed using your tool thus far!
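The suggestion to reject a zero value could be sketched as a small validation step run before the feature table is loaded. This is an illustration only: `validate_min_scan_num` and `lowest_allowed` are hypothetical names, not part of the NeatMS API, and the lower bound of 1 is an assumption (the thread only establishes that 0 fails and 5 works).

```python
def validate_min_scan_num(min_scan_num, lowest_allowed=1):
    """Hypothetical check: reject min_scan_num values that allow empty peaks.

    lowest_allowed=1 is an assumed bound; the thread only shows that
    0 triggers the IndexError and 5 does not.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(min_scan_num, bool) or not isinstance(min_scan_num, int):
        raise TypeError("min_scan_num must be an integer")
    if min_scan_num < lowest_allowed:
        raise ValueError(
            f"min_scan_num must be >= {lowest_allowed}, got {min_scan_num}"
        )
    return min_scan_num
```

Calling `validate_min_scan_num(0)` raises a `ValueError` up front, which is easier to diagnose than an `IndexError` raised after 20-30 labels.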