bigbio/quantms

mztab exporter failing for big dataset `PXD030304`


### Description of the bug


nf-core/quantms execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD030304.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD030304.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design PXD030304.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "40.0;ppm;40.0;ppm;Trypsin;Carbamidomethyl (C);" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'assay[6837],assay[6838],assay[6839],assay[6840],assay[6837]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'no description given' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  2023-11-04 18:05:44,988 [mztab_PRH] - Constructing PRH sub-table...
  2023-11-04 18:05:44,988 [mztab_PRH] - Input report shape: (240052070, 23), input pg shape: (8008, 6867), input index_ref shape: (6862, 6), input fasta_df shape: (20686, 3)
  2023-11-04 18:05:47,789 [mztab_PRH] - Classifying results type ...
  2023-11-04 18:05:47,948 [mztab_PRH] - Extracting accession values (keeping first)...
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 144, in convert
      diann_directory.convert_to_mztab(
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 310, in convert_to_mztab
      PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 598, in mztab_PRH
      out_mztab_PRH = pd.concat([out_mztab_PRH, protein_details_df]).reset_index(drop=True)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 393, in concat
      return op.get_result()
             ^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 676, in get_result
      indexers[ax] = obj_labels.get_indexer(new_labels)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3874, in get_indexer
      raise InvalidIndexError(self._requires_unique_msg)
  pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/PXD030304/work/85/155baa81b4a6aa41867b31ddec1f9e

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
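
The traceback ends in the `pd.concat([out_mztab_PRH, protein_details_df])` call at line 598 of `diann_convert.py`. A row-wise `pd.concat` raises this `InvalidIndexError` when the frames have different columns and at least one of them carries a repeated column label. Whether that is what happened for PXD030304 is an assumption, not something the log confirms. The sketch below reproduces the error on toy data and shows one possible guard. The frame contents and the helper `concat_rows_dedup` are hypothetical and are not taken from `diann_convert.py`.

```python
import pandas as pd

# Hypothetical stand-ins for out_mztab_PRH and protein_details_df.
# The first frame has a repeated "description" column label.
prh = pd.DataFrame(
    [["P1", "x", 1.0]],
    columns=["accession", "description", "description"],
)
details = pd.DataFrame([["P2", "y"]], columns=["accession", "description"])


def concat_rows_dedup(frames):
    """Drop repeated column labels (keeping the first) before a row-wise concat."""
    cleaned = [f.loc[:, ~f.columns.duplicated()] for f in frames]
    return pd.concat(cleaned).reset_index(drop=True)


# Reproduce the failure: the column sets differ and one of them is not unique.
try:
    pd.concat([prh, details])
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__

# The guarded version concatenates cleanly.
fixed = concat_rows_dedup([prh, details])
print(error_name)
print(fixed)
```

Dropping duplicates silently discards data, so in a real fix it is probably better to find out why the label is repeated, for example by logging `frame.columns[frame.columns.duplicated()]` just before the concat.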

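
The `FutureWarning` and `PerformanceWarning` lines in the log come from filling `out_mztab_MTD` one cell at a time with `.loc`, which adds a new column on every assignment and writes strings into columns pandas first created as `float64`. They do not cause the failure, but with several thousand assays they are noisy and slow. A minimal sketch of an alternative, assuming the metadata can be collected up front: gather all cells in a dict and build the frame once with `dtype=object`. The function name `build_mtd_row` and the sample assay references are made up for illustration.

```python
import pandas as pd


def build_mtd_row(study_variables):
    """Collect all metadata cells in a dict, then build a one-row frame once."""
    cells = {}
    for i, assays in enumerate(study_variables, start=1):
        cells[f"study_variable[{i}]-assay_refs"] = ",".join(assays)
        cells[f"study_variable[{i}]-description"] = "no description given"
    # dtype=object keeps the string cells from clashing with a float64 column.
    return pd.DataFrame([cells], dtype=object)


# Made-up assay references, two study variables.
study_variables = [["assay[1]", "assay[2]"], ["assay[3]"]]
mtd = build_mtd_row(study_variables)
print(mtd.T)
```

Building the frame in one call avoids the repeated column inserts that trigger the fragmentation warning, and the explicit dtype avoids the incompatible-dtype assignment.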

### Command used and terminal output

_No response_

### Relevant files

_No response_

### System information

_No response_

I will close this issue in favor of bigbio/quantms.io#31.