UUIDtoBarcode
SerifatAdebola opened this issue · 8 comments
When I run UUIDtoBarcode:
ISSUE 1: with file_id, I end up with twice the data frame size, i.e. two barcodes per file ID.
ISSUE 2 with case_id the column for case_id return
Can you provide a minimally reproducible example?
Can you explain what you mean with issue 2?
Thanks.
Best regards,
Marcel
Hi, attached are the text files with the necessary information. Sorry, I had a typo in Issue 2: when I run UUIDtoBarcode with case_id, the column for case_id returns .
fileUUIDresult.txt
caseID.txt
fileID.txt
barcodesUUIDresult.txt
fileUUIDresult.txt - UUIDtoBarcode with file ID result
caseID.txt - Case ID
fileID.txt - File ID
barcodesUUIDresult.txt - UUIDtoBarcode with case ID result
Please provide the R code with a minimally reproducible example.
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
Best,
Marcel
Hi Marcel,
Here is a minimally reproducible sample.
Code :
R version 4.0.2 (2020-06-22)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TCGAutils")
library(TCGAutils)
file = read.table("fileidtest.txt", sep = "\t")
data2 = UUIDtoBarcode(file, from_type = "file_id")
fileidtest.txt
Here's a more minimal reproducible example:
library(TCGAutils)
UUIDtoBarcode("01ef8a08-1de5-4ceb-be51-979418465f1a", from_type = "file_id")
#>                                file_id associated_entities.entity_submitter_id
#> 1 01ef8a08-1de5-4ceb-be51-979418465f1a            TCGA-EL-A4JX-11A-11D-A259-01
#> 2 01ef8a08-1de5-4ceb-be51-979418465f1a            TCGA-EL-A4JX-01A-12D-A256-01
Created on 2021-07-26 by the reprex package (v2.0.0)
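In this example, the file UUID appears to be associated with one patient (TCGA-EL-A4JX) who has two types of specimens: 01A (sample type 01, primary solid tumor) and 11A (sample type 11, solid tissue normal). That would explain why two barcodes come back for a single file_id. See https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/.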
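To make the barcode structure a bit more concrete, here is a rough base R sketch that splits the two barcodes from the output above into their fields. It does not call TCGAutils or the GDC API; the variable names and the small sample_types lookup are my own, and the lookup only covers the two codes that show up here.

```r
# Barcodes copied by hand from the reprex output above
barcodes <- c("TCGA-EL-A4JX-11A-11D-A259-01",
              "TCGA-EL-A4JX-01A-12D-A256-01")

# A TCGA barcode is a set of "-"-separated fields
fields <- strsplit(barcodes, "-", fixed = TRUE)

# Fields 1-3 (project, tissue source site, participant) identify the patient
participant <- vapply(fields, function(x) paste(x[1:3], collapse = "-"),
                      character(1))

# Field 4 is the two-digit sample type code followed by a vial letter
sample_code <- vapply(fields, function(x) substr(x[4], 1, 2), character(1))

# Illustrative lookup (assumption: only the two codes seen in this thread)
sample_types <- c("01" = "Primary Solid Tumor",
                  "11" = "Solid Tissue Normal")

parsed <- data.frame(
  barcode     = barcodes,
  participant = participant,
  sample_code = sample_code,
  sample_type = unname(sample_types[sample_code]),
  stringsAsFactors = FALSE
)
print(parsed)
```

Both rows share the same participant but differ in sample type, which is consistent with one file covering a tumor/normal pair.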
UUIDtoBarcode just calls the GDC API (https://docs.gdc.cancer.gov/API/Users_Guide/Search_and_Retrieval/), so the GDC help would be better able to answer questions about how TCGA assigned UUIDs to aliquots, specimens, patients, etc. (It seems complicated and I don't totally understand it myself!)
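If the goal is one row per file_id, one possible workaround (not something UUIDtoBarcode does for you, as far as I can tell) is to check which files map to more than one barcode and then collapse them. The sketch below rebuilds the result from the reprex by hand so it runs without network access; in practice res would be the data frame returned by UUIDtoBarcode, and the ";" separator is an arbitrary choice.

```r
# Hand-built stand-in for the UUIDtoBarcode() result shown in the reprex
res <- data.frame(
  file_id = rep("01ef8a08-1de5-4ceb-be51-979418465f1a", 2),
  associated_entities.entity_submitter_id =
    c("TCGA-EL-A4JX-11A-11D-A259-01", "TCGA-EL-A4JX-01A-12D-A256-01"),
  stringsAsFactors = FALSE
)

# Count how many barcodes each file_id maps to
n_per_file <- table(res$file_id)

# file_ids associated with more than one barcode
multi <- names(n_per_file)[n_per_file > 1]

# Collapse to one row per file_id, joining its barcodes with ";"
collapsed <- aggregate(
  associated_entities.entity_submitter_id ~ file_id,
  data = res,
  FUN = paste, collapse = ";"
)
print(collapsed)
```

Whether collapsing is appropriate depends on the analysis; keeping both rows may be the right thing when a file really does describe both samples.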