PayneLab/cptac

Why do your cptac proteome values for a gene differ from the tmt10.tsv on the official CPTAC website?

Closed this issue · 6 comments

Hello, I have a question: why do the values for a gene in your cptac proteome data differ from those in the tmt10.tsv file on the official CPTAC website?

I’m guessing you got the data set from the data coordinating center, a website hosted at Georgetown. The values differ because the data coordinating center uses a slightly different version of the data than the one used in the publication. My Python API distributes the data as published. The differences are very minor, just some extra data cleaning.

Thanks for your work. I downloaded the data from the official CPTAC website (https://proteomic.datacommons.cancer.gov/pdc/) and found that the value of a gene in a given sample differs between your data and the CPTAC data.

I see that the CPTAC data has both 'Log Ratio' and 'Unshared Log Ratio' columns, and both values differ from your data for the same gene and sample.

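[image: https://user-images.githubusercontent.com/58220227/173553117-8dd911e0-19dd-4c9d-b6c6-9fd16f16c083.png]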

'Log Ratio' includes shared peptides; 'Unshared Log Ratio' does not. It looks like you retrieved this from the PDC. The data analysis pipeline there is different from the one used to generate the data hosted by the PayneLab. The former is used for continuity, while the latter is specific to the publication and was processed according to the lab that published the experiment.

Paul A. Rudnick, Ph.D., President & Co-Founder, Spectragen Informatics LLC
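To make the shared versus unshared distinction concrete, here is a toy sketch of how leaving out shared peptides can shift a protein-level value. Everything in it is hypothetical: the peptide values are made up, and the median roll-up is an assumption chosen for illustration, not a description of how the PDC pipeline actually aggregates peptides.

```python
from statistics import median

# Hypothetical peptide-level log2 ratios for one protein in one sample.
# Each entry is (log2 ratio, is_shared), where is_shared means the peptide
# maps to more than one protein. These numbers are invented for illustration.
peptides = [
    (0.40, False),
    (0.50, False),
    (0.60, False),
    (1.20, True),
    (1.40, True),
]


def protein_log_ratio(peptides, include_shared):
    """Summarize peptide log2 ratios into a single protein-level value.

    The median roll-up is an assumption for this sketch only; the real
    pipeline may aggregate peptides differently.
    """
    values = [ratio for ratio, shared in peptides if include_shared or not shared]
    return median(values)


log_ratio = protein_log_ratio(peptides, include_shared=True)
unshared_log_ratio = protein_log_ratio(peptides, include_shared=False)
print(log_ratio, unshared_log_ratio)
```

With these made-up numbers the two summaries differ, which is the same kind of gap you would see between the two PDC columns whenever a protein has shared peptides.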

Thanks. Yes, I first downloaded the proteome data set from the CPTAC PDC and used the 'Unshared Log Ratio' for the proteomics data, but the 'Unshared Log Ratio' values differ from your data, so I'm a little confused. I'm not sure which data is better.

I really hesitate to say one is 'better'. There are many different data analysis pipelines, each with pros and cons. However, the PDC (which Paul hosts) and my Python package are indeed different. I am hosting the final dataset used in the publication. Paul is hosting a dataset that has gone through a uniform processing pipeline, which is consistent across cancers.
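If you want to see how far apart the two sources are for a particular gene, a small comparison like the one below may help. It is only a sketch: the sample IDs and values are hypothetical, and it assumes you have already pulled the per-sample values for one gene out of each source into a plain dictionary.

```python
import math


def compare_sources(published, pdc):
    """Compare per-sample values for one gene from two sources.

    Both arguments map sample ID -> log ratio. Only samples that appear in
    both sources with non-missing values are compared.
    """
    shared = sorted(
        s for s in published.keys() & pdc.keys()
        if not (math.isnan(published[s]) or math.isnan(pdc[s]))
    )
    diffs = [published[s] - pdc[s] for s in shared]
    return {
        "n_samples": len(shared),
        "mean_abs_diff": (
            sum(abs(d) for d in diffs) / len(diffs) if diffs else float("nan")
        ),
        "max_abs_diff": max((abs(d) for d in diffs), default=float("nan")),
    }


# Hypothetical values for one gene; real values would come from the
# published dataset and from the PDC download respectively.
published = {"S001": 0.50, "S002": -0.25, "S003": 1.00, "S004": float("nan")}
pdc = {"S001": 0.75, "S002": -0.25, "S003": 0.50, "S005": 0.10}

summary = compare_sources(published, pdc)
print(summary)
```

Small, consistent offsets would fit the picture of two pipelines processing the same experiment, while large or erratic differences might be worth checking for a sample or gene identifier mismatch.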