waldronlab/TCGAutils

UUIDtoBarcode

SerifatAdebola opened this issue · 8 comments

When I run UUIDtoBarcode:
ISSUE 1: with file_id, I end up with twice the data frame size, i.e. two barcodes per file ID.
ISSUE 2: with case_id, the column for case_id return

Hi @SerifatAdebola

Can you provide a minimal reproducible example?
Can you explain what you mean by issue 2?
Thanks.

Best regards,
Marcel

Hi, attached are the text files that have the necessary information. Sorry, I had a typo in Issue 2: when I run UUIDtoBarcode with case_id, the column for case_id returns .

fileUUIDresult.txt - UUIDtoBarcode with file ID result
caseID.txt - Case ID
fileID.txt - File ID
barcodesUUIDresult.txt - UUIDtoBarcode with case ID result

Please provide the R code with a minimally reproducible example.

https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Best,
Marcel

Hi Marcel,
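Here is a minimal reproducible example.
Code (R version 4.0.2 (2020-06-22)):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("TCGAutils")

library(TCGAutils)
# Read the file UUIDs; with no header, read.table names the first column V1
file_ids <- read.table("fileidtest.txt", sep = "\t")
# Pass the UUIDs as a character vector rather than the whole data frame
data2 <- UUIDtoBarcode(file_ids$V1, from_type = "file_id")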
fileidtest.txt

Here's a more minimal reproducible example:

library(TCGAutils)
UUIDtoBarcode("01ef8a08-1de5-4ceb-be51-979418465f1a", from_type = "file_id")
#>                                file_id associated_entities.entity_submitter_id
#> 1 01ef8a08-1de5-4ceb-be51-979418465f1a            TCGA-EL-A4JX-11A-11D-A259-01
#> 2 01ef8a08-1de5-4ceb-be51-979418465f1a            TCGA-EL-A4JX-01A-12D-A256-01

Created on 2021-07-26 by the reprex package (v2.0.0)

In this example it looks like the UUID is associated with a patient (TCGA-EL-A4JX) for which there are two types of specimens (01A and 11A). See https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/.
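To make the difference between the two rows concrete, here is a minimal base R sketch that splits the two barcodes from the output above into their fields. The field layout follows the GDC barcode page linked above; the helper `split_barcode` is hypothetical and not part of TCGAutils, and it assumes full aliquot-level barcodes with seven dash-separated parts.

```r
# Hypothetical helper (not part of TCGAutils): split aliquot-level TCGA
# barcodes into their documented fields. Assumes seven dash-separated parts.
split_barcode <- function(barcodes) {
  parts <- do.call(rbind, strsplit(barcodes, "-", fixed = TRUE))
  data.frame(
    # Project, tissue source site and participant together identify the patient
    patient = apply(parts[, 1:3, drop = FALSE], 1, paste, collapse = "-"),
    sample  = substr(parts[, 4], 1, 2),  # sample type code, e.g. 01 or 11
    vial    = substr(parts[, 4], 3, 3),
    portion = substr(parts[, 5], 1, 2),
    analyte = substr(parts[, 5], 3, 3),
    plate   = parts[, 6],
    center  = parts[, 7],
    stringsAsFactors = FALSE
  )
}

# The two barcodes returned for the single file UUID in the example above
barcodes <- c("TCGA-EL-A4JX-11A-11D-A259-01", "TCGA-EL-A4JX-01A-12D-A256-01")
fields <- split_barcode(barcodes)
print(fields)

# Same patient in both rows, but different sample type codes
unique(fields$patient)
fields$sample
```

Both rows share the patient part and differ in the sample type code (01 is a primary solid tumor, 11 is solid tissue normal), which is consistent with one file being linked to a tumor/normal pair.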

UUIDtoBarcode just calls the GDC API (https://docs.gdc.cancer.gov/API/Users_Guide/Search_and_Retrieval/), so the GDC help would be better able to answer questions about how TCGA assigned UUIDs to aliquots, specimens, patients, etc. (it seems complicated and I don't totally understand it myself!).

Thanks Levi! @lwaldron
I've also fixed a bug where the file_ids were not in the right order.
66f15d5
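For anyone checking the ordering on their own results, here is a minimal base R sketch that puts a result table back into the order of the input IDs while keeping one-to-many matches together. It does not call the GDC and is not the code from the commit above. The IDs and barcodes are placeholder strings, and only the column name `file_id` is taken from the output shown earlier.

```r
# Placeholder input IDs, in the order the user supplied them
ids <- c("uuid-b", "uuid-a")

# Placeholder result table: out of input order, with two rows for one ID
res <- data.frame(
  file_id = c("uuid-a", "uuid-a", "uuid-b"),
  barcode = c("bc-1", "bc-2", "bc-3"),
  stringsAsFactors = FALSE
)

# match() gives each row's position in the input; order() sorts by that
# position and keeps rows with the same ID in their existing relative order
ordered <- res[order(match(res$file_id, ids)), ]
rownames(ordered) <- NULL
print(ordered)
```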