vdemichev/DiaNN

Missing columns in parquet, DIA-NN GUI suggestions, and timsTOF diaPASEF data

bd-mass-spec opened this issue · 2 comments

The parquet file seems to be missing a couple of columns from the report tsv including "File.Name" and "First.Protein.Description". Could these be added to the parquet file?

Could there be an update in the future that would allow saving and loading settings (kind of like FragPipe workflows) into the GUI?

I noticed in the log file it recommends generating in silico spectral libraries and then running the analysis as separate workflows, but I couldn't find any explanation on the DIA-NN GitHub page (besides obvious speed benefits if processing a lot of similar experiments from the same preparation).

For timsTOF diaPASEF data, do you have any specific commands or options that you explicitly recommend? For mass and MS1 accuracies I set both to 12.0 and scan window to 8 after doing initial analysis across many experiments. Attached is my log file (spectral library generated from using same parameters as in this analysis). Do you recommend changing any of these parameters?

Finally, I have been having various issues with Skyline. I installed the Skyline daily administrator version and DIA-NN recognizes and finds Skyline (24.0.9.184). I click the Skyline button after the analysis. The attached commands run but then I press any key to continue and nothing happens.
DIANN_v1_report.log.txt
skyline_commands

Hi,

Yes, it's intentional that those columns are not in the .parquet report; the primary reason is to reduce RAM consumption when loading it in R.

If you need File.Name, you can use the .tsv report.
For First.Protein.Description, you can use the .protein_description.tsv output file that is generated when matrix output is activated (--matrices).
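
For anyone who wants File.Name alongside the .parquet rows, one option is to build a run-to-file lookup from the .tsv report and attach it afterwards. The sketch below is a minimal illustration using only the Python standard library. It assumes both reports share a `Run` column and that the .parquet rows have already been loaded as dictionaries (for example via pandas or pyarrow); the helper names `run_to_file_name` and `attach_file_names` and the sample data are hypothetical.

```python
import csv
import io


def run_to_file_name(tsv_handle):
    """Build a {Run: File.Name} lookup from a DIA-NN-style .tsv report.

    Assumes the report has 'Run' and 'File.Name' columns.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    lookup = {}
    for row in reader:
        # Every precursor row repeats the same pair, so keep the first one seen.
        lookup.setdefault(row["Run"], row["File.Name"])
    return lookup


def attach_file_names(rows, lookup):
    """Return copies of the rows with a 'File.Name' key added (None if the run is unknown)."""
    return [{**row, "File.Name": lookup.get(row["Run"])} for row in rows]


# Tiny made-up example standing in for real reports.
example_tsv = io.StringIO(
    "File.Name\tRun\tPrecursor.Id\n"
    "D:\\data\\sample_A.d\tsample_A\tPEPTIDEK2\n"
    "D:\\data\\sample_A.d\tsample_A\tANOTHERR2\n"
    "D:\\data\\sample_B.d\tsample_B\tPEPTIDEK2\n"
)
lookup = run_to_file_name(example_tsv)

# Rows as they might look after loading the .parquet report (assumed layout).
parquet_rows = [
    {"Run": "sample_A", "Precursor.Id": "PEPTIDEK2"},
    {"Run": "sample_B", "Precursor.Id": "PEPTIDEK2"},
]
annotated = attach_file_names(parquet_rows, lookup)
```

Keeping the lookup separate from the main table means the long path strings are stored once per run rather than once per precursor row, which fits the RAM concern mentioned above.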

> Could there be an update in the future that would allow saving and loading settings (kind of like FragPipe workflows) into the GUI?

This is already supported, please also see the pipeline functionality in DIA-NN. Btw, you can always put certain settings into a config file and reference multiple config files with --cfg, if you are using DIA-NN under Linux.
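
As a rough illustration of the config-file approach, the sketch below writes shared settings to a file and assembles a command line that references it with `--cfg`. The executable name `diann`, the file names, the one-option-per-line layout, and the helpers `write_config` and `build_command` are all placeholders or assumptions; the values are the ones discussed in this thread, and the exact `--cfg` syntax should be checked against the DIA-NN documentation for your version.

```python
import os
import tempfile


def write_config(path, options):
    """Write one option per line (the file layout is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(options) + "\n")


def build_command(executable, config_paths, extra_args=()):
    """Assemble an argument list with one --cfg entry per config file."""
    command = [executable]
    for path in config_paths:
        command += ["--cfg", path]
    command += list(extra_args)
    return command


# Shared settings kept in one place so several analyses can reuse them.
workdir = tempfile.mkdtemp()
cfg_path = os.path.join(workdir, "timstof_settings.cfg")
write_config(cfg_path, ["--mass-acc 12", "--mass-acc-ms1 12", "--window 8", "--matrices"])

# Per-run arguments stay on the command line; input and output names are placeholders.
command = build_command("diann", [cfg_path], ["--f", "sample_A.d", "--out", "report.tsv"])
print(" ".join(command))
```

The command is only printed here, not executed, so it can be inspected before being passed to a shell or to `subprocess.run`.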

> I noticed in the log file it recommends generating in silico spectral libraries and then the analysis in separate workflows but I couldn't find any explanation in the DIA-NN github page (besides obvious speed benefits if processing a lot of similar experiments from the same preparation).

There is just no benefit in doing otherwise.

> For timsTOF diaPASEF data, do you have any specific commands or options that you explicitly recommend?

Just the mass accuracies in the range 10-15.

> scan window to 8

Depends on the number of data points per peak (dp/peak) in your data.
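
A rough way to estimate the number of data points per peak is to divide a typical chromatographic peak width by the DIA cycle time. The sketch below only does that arithmetic. The input values are made up, the function name is hypothetical, and how closely the scan window should track the result is not stated in this thread, so treat the number as something to compare against your current setting rather than a rule.

```python
def data_points_per_peak(peak_width_s, cycle_time_s):
    """Estimate data points per peak as peak width / cycle time (both in seconds)."""
    if peak_width_s <= 0 or cycle_time_s <= 0:
        raise ValueError("peak width and cycle time must be positive")
    return peak_width_s / cycle_time_s


# Made-up example: a 12 s wide elution peak sampled with a 1.5 s cycle.
estimate = data_points_per_peak(peak_width_s=12.0, cycle_time_s=1.5)
print(round(estimate, 1))
```

Both inputs can usually be read off the raw data: the cycle time from the acquisition method and the peak width from a few extracted chromatograms.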

Skyline tries to automatically determine the main .tsv report file name based on the speclib name. If you use the automatically generated spectral library name (report-lib.parquet), then it should work fine. We are working with the Skyline team and will address this caveat in future versions.

Best,
Vadim

Re: missing columns, it seems fine to discard descriptions, but the parquet file is also missing columns like MS2.Scan and Fragment.Quant.Raw, which are conceivably necessary for downstream analysis. Is this also intended?