PayneLab/cptac

discrepancy between downloaded datasets and issue downloading Colon data through R markdown...

Closed this issue · 3 comments

I have been downloading transcriptomics and proteomics data for all the available cancers and have noticed a few issues (I am running these Python commands in R Markdown):

  1. The transcriptomics datasets downloaded for different cancer types can be drastically different. For example, the Brca transcriptomics dataset has negative numbers and NaN values, while the Ccrcc transcriptomics dataset has all positive numbers and no NaN values. Here is my code; I don't believe I am doing anything wrong. I am also assuming that the data is log2(RSEM+1) (please correct me if I am wrong):

import cptac

# Brca data
Brca = cptac.Brca()
Brca_proteomics = Brca.get_proteomics()
Brca_transcriptomics = Brca.get_transcriptomics()

# Ccrcc data
Ccrcc = cptac.Ccrcc()
Ccrcc_proteomics = Ccrcc.get_proteomics()
Ccrcc_transcriptomics = Ccrcc.get_transcriptomics()

  2. The Colon data cannot be loaded; I am running into this error:

Colon = cptac.Colon()

Traceback (most recent call last):
  File "", line 1, in
  File "/home/cara/.local/share/r-miniconda/envs/cptac/lib/python3.9/site-packages/cptac/colon.py", line 140, in __init__
    prot_combined = prot_tumor.append(prot_normal)
  File "/home/cara/.local/share/r-miniconda/envs/cptac/lib/python3.9/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'

We are confused about the different transcriptomic data types from cancer type to cancer type and if there was any normalization applied. Any information would be greatly appreciated, thanks!
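One way to put numbers on the difference is to summarize each table's value range. The sketch below is only an illustration: `summarize_values` is a hypothetical helper, and the two small tables are made-up stand-ins, not real CPTAC data. Since log2(x+1) of a non-negative value is never negative, any negative entries would suggest that some additional step (for example centering) was applied, although this check cannot say which.

```python
import numpy as np
import pandas as pd


def summarize_values(df: pd.DataFrame) -> dict:
    """Return basic range statistics for a numeric samples-by-genes table."""
    values = df.to_numpy(dtype=float)
    return {
        "min": float(np.nanmin(values)),           # smallest non-missing value
        "max": float(np.nanmax(values)),           # largest non-missing value
        "n_negative": int((values < 0).sum()),     # NaN compares as False here
        "n_missing": int(np.isnan(values).sum()),  # count of NaN entries
    }


# Made-up toy tables (NOT real CPTAC data), used only to show the output shape.
centered_like = pd.DataFrame(
    {"GENE_A": [-1.2, 0.4, np.nan], "GENE_B": [0.0, 2.1, -0.3]},
    index=["S1", "S2", "S3"],
)
log_like = pd.DataFrame(
    {"GENE_A": [0.0, 3.5, 7.2], "GENE_B": [1.1, 0.0, 9.8]},
    index=["S1", "S2", "S3"],
)

print(summarize_values(centered_like))
print(summarize_values(log_like))
```

With the real data, the same helper could be called on the results of `Brca.get_transcriptomics()` and `Ccrcc.get_transcriptomics()`, assuming both return purely numeric tables.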

Hello @cabecunas !

  1. I've tried to replicate the error, but calling Brca.get_transcriptomics() gives me a dataframe with no negative values. Perhaps you are using an older version?
  2. Calling cptac.Colon() has been deprecated. The way to access the colorectal data is cptac.Coad(), which stands for "Colon Adenocarcinoma". A list of the cancer abbreviations with their full meanings can be found by calling cptac.get_cancer_info().
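For context on the traceback above: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so any code that still calls it raises this `AttributeError` on a recent pandas. The general replacement is `pd.concat`. The snippet below is a minimal sketch of that pandas change using made-up tables that borrow the variable names from the traceback; it is not the actual cptac code or its fix.

```python
import pandas as pd

# Hypothetical stand-ins for the tumor and normal tables named in the traceback.
prot_tumor = pd.DataFrame({"GENE_A": [1.0, 2.0]}, index=["T1", "T2"])
prot_normal = pd.DataFrame({"GENE_A": [0.5]}, index=["N1"])

# pandas < 2.0 accepted: prot_tumor.append(prot_normal)
# pd.concat stacks the rows in the same order and works on old and new pandas.
prot_combined = pd.concat([prot_tumor, prot_normal])

print(prot_combined)
```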

Does that answer your questions?

bm600 commented

What versions of pandas and cptac are you using? The newest version (v.1.5) has just been uploaded to PyPI and GitHub. Let us know if that fixes the errors, and sorry for taking a while to get back to you; we were just wrapping up the new version release.
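A quick way to report the installed versions is the standard-library `importlib.metadata` module (Python 3.8+). This is a minimal sketch; `installed_versions` is a hypothetical helper name, and it should be run in the same Python environment that R Markdown uses, otherwise it may report a different installation.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


print(installed_versions(["cptac", "pandas"]))
```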