vdemichev/diann-rpackage

Is there any way to get Protein Names and Gene Names along with the Protein.Group ID?

Opened this issue · 6 comments

Hi Dr. Vdemichev,
I was able to analyze the DIA-NN output using your R package. However, the result only contains the UniProt ID and the corresponding value for each sample. I would also like the corresponding protein names and gene names. Is there any way to get them using your package?
Thank you.
Best Regards,
Saleh

Hi Saleh,

This likely means DIA-NN was run without a UniProt FASTA supplied.
However, you can use another R package (not the diann R package) to load a FASTA and match protein and gene names to UniProt IDs.

Best,
Vadim
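
The FASTA route mentioned above can be sketched in base R, with no extra package. This is only an illustration under stated assumptions: the helper name parse_uniprot_headers is hypothetical (it is not part of the diann package), the headers are assumed to follow the standard UniProt layout (>db|Accession|EntryName Description OS=... OX=... GN=... PE=... SV=...), the sequences are truncated toy data, and each Protein.Group is assumed to hold a single accession (groups listing several IDs separated by ';' would need splitting first).

```r
# Hypothetical helper: build an accession -> entry name / gene lookup table
# from UniProt-style FASTA header lines. Not part of the diann package.
parse_uniprot_headers <- function(lines) {
  headers <- lines[startsWith(lines, ">")]
  # Split ">db|Accession|EntryName Description ..." on the pipe characters
  parts <- strsplit(sub("^>", "", headers), "|", fixed = TRUE)
  accession <- vapply(parts, function(p) p[2], character(1))
  rest <- vapply(parts, function(p) p[3], character(1))
  # The entry name is everything before the first space
  entry_name <- sub(" .*$", "", rest)
  # GN= is optional in UniProt headers, so use NA when it is absent
  gene <- ifelse(grepl(" GN=", rest, fixed = TRUE),
                 sub("^.* GN=([^ ]+).*$", "\\1", rest),
                 NA_character_)
  data.frame(Accession = accession, Entry.Name = entry_name, Gene = gene,
             stringsAsFactors = FALSE)
}

# Toy input; in practice this would be readLines() on the FASTA given to DIA-NN
fasta_lines <- c(
  ">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4",
  "MEEPQSDPSV",
  ">sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens OX=9606 GN=HBB PE=1 SV=2",
  "MVHLTPEEKS"
)
lookup <- parse_uniprot_headers(fasta_lines)

# Protein groups as they might appear in the row names of a quantity matrix
protein_groups <- c("P68871", "P04637")
idx <- match(protein_groups, lookup$Accession)
annotated <- data.frame(Protein.Group = protein_groups,
                        Protein.Names = lookup$Entry.Name[idx],
                        Genes = lookup$Gene[idx],
                        stringsAsFactors = FALSE)
print(annotated)
```

Accessions that are missing from the FASTA come back as NA rather than raising an error, so it is worth checking anyNA(annotated$Genes) afterwards.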

Thank you, Dr. Vdemichev, for the prompt response. I first generated a library-free spectral library using the UniProt FASTA and then used it to analyze the data. The main report does contain protein names and gene names, but your R package only gives a matrix of UniProt IDs and values. So are you recommending using the FASTA file instead of the spectral library? Or should I use both? Thank you.

Oh, I see. You need to change the column name used for the data frame -> matrix transformation. You can see the documentation of the diann package functions by typing '?' before their names, like:
?diann_matrix
?diann_maxlfq

Hi Vadim,

I just wanted to follow up on this. From my understanding, you can change the group.header argument to change how the column names are generated, e.g. Protein.Group or Protein.Names. If I were to use Protein.Group as the column header, for example, is there any way to generate an extra column in the matrix which contains the Protein.Names as well?

All the best,
Tess

Hi Tess,

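This is easy indeed. Suppose you generated a matrix 'mat' from the data frame 'df', and the row names of 'mat' correspond to the Protein.Group column. Convert 'mat' to a data frame first (assigning with '$' to a plain matrix does not add a column), then just do
mat = as.data.frame(mat)
mat$Names = df$Protein.Names[match(rownames(mat), df$Protein.Group)]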

Best,
Vadim
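
For anyone wanting to try the match() lookup above without a real report, here is a small self-contained sketch. The data are made up: 'df' only mimics a few columns of the DIA-NN main report, and 'mat' stands in for a quantity matrix with protein groups as row names. The result is built as a data frame because a character column added to a numeric matrix would turn every value into text.

```r
# Toy long-format report: column names follow the DIA-NN main report,
# the rows themselves are invented for illustration
df <- data.frame(
  Protein.Group = c("P04637", "P04637", "P68871"),
  Protein.Names = c("P53_HUMAN", "P53_HUMAN", "HBB_HUMAN"),
  Genes = c("TP53", "TP53", "HBB"),
  stringsAsFactors = FALSE
)

# Toy quantity matrix: one row per protein group, one column per run
mat <- matrix(c(10.5, 20.1, 11.2, 19.8), nrow = 2,
              dimnames = list(c("P68871", "P04637"), c("run1", "run2")))

# For each row name, match() returns the index of its first occurrence in df
idx <- match(rownames(mat), df$Protein.Group)

# Combine the annotation columns with the quantities in a data frame,
# keeping the protein groups as row names
annotated <- data.frame(Protein.Names = df$Protein.Names[idx],
                        Genes = df$Genes[idx],
                        mat,
                        row.names = rownames(mat),
                        check.names = FALSE,
                        stringsAsFactors = FALSE)
print(annotated)
```

The original 'mat' stays untouched and numeric, so it can still be passed to functions that expect a matrix, while 'annotated' is the version to export or inspect.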

Hi Vadim,

That worked perfectly, thank you very much!

All the best,
Tess