mztab exporter failing for big dataset `PXD030304`
ypriverol commented
Description of the bug
nf-core/quantms execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:
Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD030304.sdrf)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD030304.sdrf)` terminated with an error exit status (1)
Command executed:
diann_convert.py convert \
--folder ./ \
--exp_design PXD030304.sdrf_openms_design.tsv \
--diann_version ./version/versions.yml \
--dia_params "40.0;ppm;40.0;ppm;Trypsin;Carbamidomethyl (C);" \
--charge 4 \
--missed_cleavages 1 \
--qvalue_threshold 0.01 \
2>&1 | tee convert_report.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
END_VERSIONS
Command exit status:
1
Command output:
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'assay[6837],assay[6838],assay[6839],assay[6840],assay[6837]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'no description given' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
2023-11-04 18:05:44,988 [mztab_PRH] - Constructing PRH sub-table...
2023-11-04 18:05:44,988 [mztab_PRH] - Input report shape: (240052070, 23), input pg shape: (8008, 6867), input index_ref shape: (6862, 6), input fasta_df shape: (20686, 3)
2023-11-04 18:05:47,789 [mztab_PRH] - Classifying results type ...
2023-11-04 18:05:47,948 [mztab_PRH] - Extracting accession values (keeping first)...
Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
Traceback (most recent call last):
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
cli()
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 144, in convert
diann_directory.convert_to_mztab(
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 310, in convert_to_mztab
PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 598, in mztab_PRH
out_mztab_PRH = pd.concat([out_mztab_PRH, protein_details_df]).reset_index(drop=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 393, in concat
return op.get_result()
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 676, in get_result
indexers[ax] = obj_labels.get_indexer(new_labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3874, in get_indexer
raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Work dir:
/hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/PXD030304/work/85/155baa81b4a6aa41867b31ddec1f9e
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
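For readers who hit the same `InvalidIndexError`: when `pd.concat` stacks frames row-wise and their columns differ, pandas reindexes each frame onto the combined column set, which fails if a frame carries duplicate column labels. The traceback above is consistent with that, but whether `out_mztab_PRH` or `protein_details_df` actually holds duplicated labels is an assumption, not something the log confirms. The sketch below uses made-up frames and a hypothetical helper `dedupe_columns` (not part of `diann_convert.py`) to show the failure mode and one possible workaround.

```python
import pandas as pd


def dedupe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the first occurrence of each column label."""
    return df.loc[:, ~df.columns.duplicated()]


# Toy stand-ins for the two frames being concatenated; the real column
# names in diann_convert.py may differ.
left = pd.DataFrame([[1, 2, 3]], columns=["accession", "score", "score"])
right = pd.DataFrame([[4, 5]], columns=["accession", "description"])

# Concatenating as-is is expected to fail on recent pandas versions,
# because `left` has a duplicated "score" label and must be reindexed.
try:
    pd.concat([left, right])
except Exception as exc:
    print(type(exc).__name__, exc)

# Dropping duplicate labels first lets the concat go through.
combined = pd.concat([dedupe_columns(left), right]).reset_index(drop=True)
print(combined)
```

Dropping duplicates silently discards data, so checking `df.columns[df.columns.duplicated()]` first to see which labels repeat is probably the safer diagnostic step before applying any fix.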
### Command used and terminal output
_No response_
### Relevant files
_No response_
### System information
_No response_
ypriverol commented
I will close this issue in favor of bigbio/quantms.io#31.