15 studies encompassing 24 different datasets (matrices) from GEO breast cancer gene expression experiments. Treatment information and outcomes data are also curated for each patient (datasets were chosen based on whether I could determine each sample's/patient's treatment regimen and whether the samples had at least one outcomes variable measured). In total, this covers over 2700 samples from large-scale studies and clinical trials. These datasets are available as an R package, curatedBreastData.

curatedBreastCancer

NOTE: as of 2016, I have an updated script for the gene expression processing. Please see breastProcessAndGeneFeatures_script.R in the Coincide repo for the updated processing function. The same functions are also in the curatedBreastData repo, but not yet in the Bioconductor package, as I am in the process of updating that package submission. I plan to remove the processing functions from the curatedBreastData package and point users to the Coincide package instead, as Bioconductor encourages developers to keep code out of database packages precisely to avoid the problem of code needing quicker updates than the data.

I released slightly updated processing code in Bioconductor 3.4 in the fall of 2016. The main difference is a fix for a very small bug that crashed on one dataset (so it did not produce incorrect output, it just stopped), caused by a minor ,drop=FALSE indexing issue. I am in the process of fixing the DFS_months_or_MIN months variables for GSE16446, as they are in days, not months (you can divide the days by 28 to get months, but this is only needed for this one specific study).
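For readers unfamiliar with the two points in the note above, here is a minimal base-R sketch. It is not the package's code: the toy matrix, the made-up day values, and the helper name `days_to_months` are all assumptions for illustration. The first part shows the general kind of problem that a missing `drop = FALSE` causes; the second applies the divide-by-28 conversion mentioned above.

```r
# Toy expression matrix: 3 probes (rows) x 2 samples (columns).
# Names and values are invented for illustration only.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("probeA", "probeB", "probeC"),
                               c("sample1", "sample2")))

# Selecting a single row without drop = FALSE collapses the matrix
# to a plain named vector, so it no longer has dimensions.
dropped <- expr[1, ]
is.null(dim(dropped))  # TRUE

# Code that expects a matrix then fails, e.g. apply() stops with an error.
result <- tryCatch(apply(dropped, 1, mean),
                   error = function(e) "error")

# With drop = FALSE the result stays a 1 x 2 matrix.
kept <- expr[1, , drop = FALSE]
dim(kept)

# Hypothetical helper: convert survival times recorded in days to months,
# using the 28-day month suggested in the note above.
days_to_months <- function(days, days_per_month = 28) {
  days / days_per_month
}

# Made-up disease-free survival times in days (NA = missing).
dfs_days <- c(56, 280, NA, 1400)
dfs_months <- days_to_months(dfs_days)
```

The conversion should only be applied to the GSE16446 samples; the other studies already report these variables in months.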

The GSE.... R files are mainly very early records of how I processed these files and are not intended to be re-run directly. However, I do have notes on the individual datasets if you need more specifics than are provided in the corresponding publications, https://www.ncbi.nlm.nih.gov/pubmed/24303324 and https://www.ncbi.nlm.nih.gov/pubmed/26961683 (the latter has more details in its supplementary methods section).