bigbio/quantms

Error converting DIA-NN output to mzTab

Closed this issue · 3 comments

Description of the bug


nf-core/quantms execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design PXD017052-DIA.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "20.0;ppm;4.5;ppm;Trypsin/P;Carbamidomethyl (C);Acetyl (Protein N-term),Oxidation (M)" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  1              2        1  EXP19072_2019v1ms090X2_A.wiff     1      2  EXP19072_2019v1ms090X2_A
  2              3        1  EXP19072_2019v1ms090X3_A.wiff     1      3  EXP19072_2019v1ms090X3_A
  3              4        1  EXP19072_2019v1ms090X4_A.wiff     1      4  EXP19072_2019v1ms090X4_A
  4              5        1  EXP19072_2019v1ms090X5_A.wiff     1      5  EXP19072_2019v1ms090X5_A
  2023-12-06 10:09:28,171 [convert] - 
  
  s_DataFrame ((3140, 3))>>>
  2023-12-06 10:09:28,171 [convert] -   Sample        MSstats_Condition MSstats_BioReplicate
  0      1      Blood Plasma|normal                    1
  1      2  Blood Plasma|Late NSCLC                    2
  2      3   Blood Plasma|Co-morbid                    3
  3      4      Blood Plasma|normal                    4
  4      5   Blood Plasma|Co-morbid                    5
  2023-12-06 10:09:28,172 [convert] - Adding Fraction, BioReplicate, Condition columns
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 115, in convert
      out_msstats = out_msstats.merge(
                    ^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 10490, in merge
      return merge(
             ^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 169, in merge
      op = _MergeOperation(
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 810, in __init__
      self._validate_validate_kwd(validate)
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 1635, in _validate_validate_kwd
      raise MergeError(
  pandas.errors.MergeError: Merge keys are not unique in right dataset; not a many-to-one merge

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD017052/work/ee/747a03e90cc15940eaebc4b212f283

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
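The `MergeError` at the bottom of the first traceback is the check pandas performs when `merge` is called with `validate="many_to_one"`: every key on the right-hand side must be unique. The sketch below reproduces it with two tiny tables; `report`, `design`, `merge_design` and the `Run` column are hypothetical stand-ins, not the actual frames or helpers in diann_convert.py.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical stand-ins: many rows per run on the left, one design row per run on the right.
report = pd.DataFrame({"Run": ["X17_B", "X17_B", "X2_A"], "Intensity": [1.0, 2.0, 3.0]})
design = pd.DataFrame({"Run": ["X17_B", "X17_B", "X2_A"], "Sample": [17, 18, 2]})  # X17_B listed twice


def merge_design(left, right):
    # validate="many_to_one" makes pandas raise MergeError if right-hand keys repeat.
    return left.merge(right, on="Run", how="left", validate="many_to_one")


error_message = ""
try:
    merge_design(report, design)
except MergeError as err:
    error_message = str(err)
print(error_message)  # prints the MergeError message

# With the duplicated key removed, every left row matches exactly one design row.
fixed = merge_design(report, design.drop_duplicates(subset="Run", keep="first"))
print(fixed)
```

Dropping duplicates in code is shown only to confirm the diagnosis: `keep="first"` silently picks one of the conflicting rows, so correcting the SDRF itself is the safer fix.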

Command used and terminal output

No response

Relevant files

No response

System information

No response

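This happens because EXP19072_2019v1ms060X17_B.wiff appears twice in PXD017052-DIA.sdrf, so the merge keys on the right-hand side of the merge at diann_convert.py line 115 are not unique. One of the two rows should be removed.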
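A quick way to catch this before launching the pipeline is to count how often each raw file is listed in the SDRF. This is a minimal standard-library sketch, assuming a tab-separated SDRF whose file names sit in a `comment[data file]` column (the usual SDRF-Proteomics header; pass a different `column` if yours differs). The function name `find_duplicate_data_files` is hypothetical.

```python
import csv
import sys
from collections import Counter


def find_duplicate_data_files(handle, column="comment[data file]"):
    """Return the sorted data file names that occur more than once in an SDRF.

    Hypothetical helper: assumes a tab-separated file with a header row.
    """
    rows = csv.reader(handle, delimiter="\t")
    header = [h.strip().lower() for h in next(rows)]
    # Use the first matching header; SDRF files may repeat other column names.
    idx = header.index(column.lower())
    counts = Counter(row[idx].strip() for row in rows if len(row) > idx)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_sdrf.py path/to/file.sdrf.tsv  (hypothetical script name)
    with open(sys.argv[1], newline="") as fh:
        for name in find_duplicate_data_files(fh):
            print(name)
```

Using `csv.reader` with a header index rather than `csv.DictReader` avoids silently losing values when other SDRF columns share a header name.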

After that change, a new error appears:

nf-core/quantms execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design PXD017052-DIA.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "20.0;ppm;4.5;ppm;Trypsin/P;Carbamidomethyl (C);Acetyl (Protein N-term),Oxidation (M)" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'assay[3139]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'no description given' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  2023-12-15 11:27:18,392 [mztab_PRH] - Constructing PRH sub-table...
  2023-12-15 11:27:18,392 [mztab_PRH] - Input report shape: (43403290, 23), input pg shape: (3005, 3144), input index_ref shape: (3139, 6), input fasta_df shape: (41137, 3)
  2023-12-15 11:27:19,207 [mztab_PRH] - Classifying results type ...
  2023-12-15 11:27:19,235 [mztab_PRH] - Extracting accession values (keeping first)...
  2023-12-15 11:27:19,721 [mztab_PRH] - Calculating protein coverage (bottleneck)...
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 144, in convert
      diann_directory.convert_to_mztab(
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 310, in convert_to_mztab
      PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 603, in mztab_PRH
      out_mztab_PRH.loc[:, "protein_coverage"] = calculate_protein_coverages(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1322, in calculate_protein_coverages
      cov = calculate_coverage(fasta_id_to_seqs[f_id], ids_to_seqs[acc_to_ids[acc]])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1261, in calculate_coverage
      starts, lengths = zip(*sorted(zip(starts, lengths)))
      ^^^^^^^^^^^^^^^
  ValueError: not enough values to unpack (expected 2, got 0)

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD017052/work/23/fe1d4b592f0797d99fe67ac3b9efd1

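Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`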
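The second failure comes from `zip(*sorted(zip(starts, lengths)))`: when no peptide start positions were collected for a protein, `sorted(...)` returns an empty list, `zip(*[])` yields nothing, and unpacking into two names raises `ValueError: not enough values to unpack (expected 2, got 0)`. The sketch below is not the quantms implementation; it borrows the name `calculate_coverage` from the traceback, but its signature and the plain substring search used to place peptides are assumptions. It returns 0.0 instead of crashing when nothing maps.

```python
def calculate_coverage(protein_seq, peptides):
    """Return the fraction of protein_seq covered by at least one peptide.

    Hypothetical sketch: peptides are located by exact substring search.
    """
    starts, lengths = [], []
    for pep in peptides:
        pos = protein_seq.find(pep)
        while pos != -1:  # record every occurrence, including overlapping ones
            starts.append(pos)
            lengths.append(len(pep))
            pos = protein_seq.find(pep, pos + 1)

    # Guard for the failing case: with no matches, zip(*sorted(zip([], [])))
    # has nothing to unpack and raises ValueError.
    if not starts or not protein_seq:
        return 0.0

    intervals = sorted(zip(starts, lengths))
    covered = 0
    cur_start, cur_end = intervals[0][0], intervals[0][0] + intervals[0][1]
    for start, length in intervals[1:]:
        end = start + length
        if start <= cur_end:  # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
        else:  # gap: close the current block and start a new one
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
    covered += cur_end - cur_start
    return covered / len(protein_seq)


print(calculate_coverage("ABCDEFGHIJ", ["ABC", "CDE", "HIJ"]))  # residues 0-4 and 7-9 covered
print(calculate_coverage("ABCDEFGHIJ", ["XYZ"]))  # no match: 0.0 instead of ValueError
```

Returning 0.0 keeps the export going, but a protein with no mapped peptides may point to a mismatch between the reported peptides and the FASTA sequences used for coverage, which is worth checking rather than masking.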

Solved.