Error converting DIA-NN output to mzTab
Closed this issue · 3 comments
ypriverol commented
Description of the bug
nf-core/quantms execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:
Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)` terminated with an error exit status (1)
Command executed:
diann_convert.py convert \
--folder ./ \
--exp_design PXD017052-DIA.sdrf_openms_design.tsv \
--diann_version ./version/versions.yml \
--dia_params "20.0;ppm;4.5;ppm;Trypsin/P;Carbamidomethyl (C);Acetyl (Protein N-term),Oxidation (M)" \
--charge 4 \
--missed_cleavages 1 \
--qvalue_threshold 0.01 \
2>&1 | tee convert_report.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
END_VERSIONS
Command exit status:
1
Command output:
1 2 1 EXP19072_2019v1ms090X2_A.wiff 1 2 EXP19072_2019v1ms090X2_A
2 3 1 EXP19072_2019v1ms090X3_A.wiff 1 3 EXP19072_2019v1ms090X3_A
3 4 1 EXP19072_2019v1ms090X4_A.wiff 1 4 EXP19072_2019v1ms090X4_A
4 5 1 EXP19072_2019v1ms090X5_A.wiff 1 5 EXP19072_2019v1ms090X5_A
2023-12-06 10:09:28,171 [convert] -
s_DataFrame ((3140, 3))>>>
2023-12-06 10:09:28,171 [convert] - Sample MSstats_Condition MSstats_BioReplicate
0 1 Blood Plasma|normal 1
1 2 Blood Plasma|Late NSCLC 2
2 3 Blood Plasma|Co-morbid 3
3 4 Blood Plasma|normal 4
4 5 Blood Plasma|Co-morbid 5
2023-12-06 10:09:28,172 [convert] - Adding Fraction, BioReplicate, Condition columns
Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
Traceback (most recent call last):
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
cli()
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 115, in convert
out_msstats = out_msstats.merge(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 10490, in merge
return merge(
^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 169, in merge
op = _MergeOperation(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 810, in __init__
self._validate_validate_kwd(validate)
File "/usr/local/lib/python3.11/site-packages/pandas/core/reshape/merge.py", line 1635, in _validate_validate_kwd
raise MergeError(
pandas.errors.MergeError: Merge keys are not unique in right dataset; not a many-to-one merge
Work dir:
/hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD017052/work/ee/747a03e90cc15940eaebc4b212f283
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
daichengxin commented
This happens because EXP19072_2019v1ms060X17_B.wiff appears twice in PXD017052-DIA.sdrf, so the merge keys are not unique. One of the two entries should be removed.
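To check a design table for this kind of problem before running the conversion, something like the sketch below may help. It is a minimal illustration, not the code in diann_convert.py: the column names (`Run`, `Sample`, `Intensity`), the run names, and the helper `duplicated_keys` are assumptions made up for the example. It only relies on pandas raising `MergeError` when `validate="many_to_one"` finds repeated keys in the right-hand table.

```python
import pandas as pd

# Hypothetical miniature tables; the real design file has other columns.
design = pd.DataFrame({
    "Run": ["run_A", "run_B", "run_B", "run_C"],  # run_B is listed twice
    "Sample": [1, 2, 2, 3],
})
report = pd.DataFrame({
    "Run": ["run_A", "run_B", "run_C"],
    "Intensity": [10.0, 20.0, 30.0],
})


def duplicated_keys(df, key):
    """Return the sorted key values that occur more than once in df[key]."""
    mask = df[key].duplicated(keep=False)
    return sorted(df.loc[mask, key].unique().tolist())


dups = duplicated_keys(design, "Run")
print("Duplicated runs:", dups)

# With duplicates present, a many-to-one merge is rejected by pandas.
try:
    report.merge(design, on="Run", validate="many_to_one")
    merge_failed = False
except pd.errors.MergeError as err:
    merge_failed = True
    print("Merge rejected:", err)

# After dropping the repeated rows, the same merge goes through.
deduped = design.drop_duplicates(subset="Run", keep="first")
merged = report.merge(deduped, on="Run", validate="many_to_one")
```

Dropping duplicates in code is only for demonstration; in this case the cleaner fix is to correct the SDRF itself.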
ypriverol commented
After the changes, there is a new error:
nf-core/quantms execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:
Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD017052-DIA.sdrf)` terminated with an error exit status (1)
Command executed:
diann_convert.py convert \
--folder ./ \
--exp_design PXD017052-DIA.sdrf_openms_design.tsv \
--diann_version ./version/versions.yml \
--dia_params "20.0;ppm;4.5;ppm;Trypsin/P;Carbamidomethyl (C);Acetyl (Protein N-term),Oxidation (M)" \
--charge 4 \
--missed_cleavages 1 \
--qvalue_threshold 0.01 \
2>&1 | tee convert_report.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
END_VERSIONS
Command exit status:
1
Command output:
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:512: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'assay[3139]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:513: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'no description given' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
2023-12-15 11:27:18,392 [mztab_PRH] - Constructing PRH sub-table...
2023-12-15 11:27:18,392 [mztab_PRH] - Input report shape: (43403290, 23), input pg shape: (3005, 3144), input index_ref shape: (3139, 6), input fasta_df shape: (41137, 3)
2023-12-15 11:27:19,207 [mztab_PRH] - Classifying results type ...
2023-12-15 11:27:19,235 [mztab_PRH] - Extracting accession values (keeping first)...
2023-12-15 11:27:19,721 [mztab_PRH] - Calculating protein coverage (bottleneck)...
Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
Traceback (most recent call last):
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1333, in <module>
cli()
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 144, in convert
diann_directory.convert_to_mztab(
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 310, in convert_to_mztab
PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 603, in mztab_PRH
out_mztab_PRH.loc[:, "protein_coverage"] = calculate_protein_coverages(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1322, in calculate_protein_coverages
cov = calculate_coverage(fasta_id_to_seqs[f_id], ids_to_seqs[acc_to_ids[acc]])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 1261, in calculate_coverage
starts, lengths = zip(*sorted(zip(starts, lengths)))
^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 0)
Work dir:
/hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD017052/work/23/fe1d4b592f0797d99fe67ac3b9efd1
Tip: you can try to figure
ypriverol commented
Solved.
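For anyone hitting the same ValueError: the failing line in the second traceback, `starts, lengths = zip(*sorted(zip(starts, lengths)))`, raises exactly this error when both lists are empty, which would be the case if no peptide is located in the protein sequence. The sketch below reproduces that and shows one possible guard. It is an assumption-laden illustration, not the fix applied in quantms: `calculate_coverage_safe` is a hypothetical helper, and the way peptides are located (`str.find`, first match only) is a guess at the intent rather than the actual implementation.

```python
# Reproduce the error: unpacking the result of zip(*[]) into two names.
try:
    starts, lengths = zip(*sorted(zip([], [])))
    unpack_failed = False
except ValueError as err:
    unpack_failed = True
    print("Reproduced:", err)


def calculate_coverage_safe(ref_sequence, sequences):
    """Fraction of ref_sequence covered by the given peptides (hypothetical helper).

    Returns 0.0 when no peptide can be located instead of raising.
    """
    starts, lengths = [], []
    for seq in sequences:
        pos = ref_sequence.find(seq)  # first occurrence only, for simplicity
        if pos >= 0:
            starts.append(pos)
            lengths.append(len(seq))

    # Guard: nothing matched (or empty reference), so coverage is zero.
    if not starts or not ref_sequence:
        return 0.0

    covered = [False] * len(ref_sequence)
    for start, length in sorted(zip(starts, lengths)):
        for i in range(start, start + length):
            covered[i] = True
    return sum(covered) / len(ref_sequence)


print(calculate_coverage_safe("ABCDEFGHIJ", ["ABC", "CDE"]))
print(calculate_coverage_safe("ABCDEFGHIJ", ["XYZ"]))
```

Whether returning zero coverage is the right behaviour, or whether an unmatched peptide points to a FASTA/report mismatch that should be reported, depends on the data.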