Inquiry about Dataset
bryanwong17 opened this issue · 4 comments
Hi Richard,
Could you please let me know how to download the same datasets as yours? Additionally, I tried git cloning your repository, but I couldn't unzip all zip folders in HIPT/2-Weakly-Supervised-Subtyping/dataset_csv. What is the procedure for unzipping it? Thank you
Hi @bryanwong17 - I perhaps saved the CSV files in an unconventional way (saved previously for storing the count matrices / RNA-seq abundances for all 20K-ish genes). You can open the dataset CSV files via pandas.
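For readers hitting the same problem, here is a minimal sketch of opening a zipped dataset CSV with pandas, without unzipping it first. The helper name `load_dataset_csv` is hypothetical and not part of the repository, and it assumes each archive holds exactly one CSV file.

```python
import pandas as pd


def load_dataset_csv(path):
    """Read a dataset CSV that may be stored inside a .zip archive.

    Hypothetical helper for illustration; not part of the HIPT repository.
    """
    # compression="infer" is the pandas default: a path ending in ".zip" is
    # decompressed on the fly. pandas requires the archive to contain exactly
    # one file, which is assumed here.
    return pd.read_csv(path, compression="infer")
```

Calling `load_dataset_csv` on one of the zip files under `HIPT/2-Weakly-Supervised-Subtyping/dataset_csv` should then return a DataFrame whose `slide_id` column can be inspected directly.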
Hi @Richarizardd,
Thanks for your fast reply. I was able to open the dataset CSV files and see all slide_ids. Moreover, I found CSV files for the 10 splits. If I want to download WSI files, should I manually download them one by one? Is there a better way?
I think the best way to download is via https://portal.gdc.cancer.gov. I would query by the cancer type, get a "manifest" of all diagnostic WSIs for that cancer type, and then download slides from the manifest. Then, afterwards, one can align the downloaded WSIs with the slide_id column.
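The alignment step could look roughly like the sketch below. It assumes the GDC manifest has been loaded as a table with `id` and `filename` columns (for example with `pd.read_csv(manifest_path, sep="\t")`), and that `slide_id` matches the slide file name with or without the `.svs` extension. The function name and the `slide_key` / `in_manifest` columns are hypothetical.

```python
import pandas as pd


def align_manifest_with_slides(manifest, dataset):
    """Left-join dataset rows to manifest rows by slide file name.

    Hypothetical helper; assumes manifest columns "id" and "filename" and a
    dataset column "slide_id".
    """

    def key(name):
        # Strip only a trailing ".svs"; TCGA slide names contain other dots,
        # so a generic extension split could remove too much.
        name = str(name)
        return name[:-4] if name.lower().endswith(".svs") else name

    manifest = manifest.assign(slide_key=manifest["filename"].map(key))
    dataset = dataset.assign(slide_key=dataset["slide_id"].map(key))

    merged = dataset.merge(
        manifest[["slide_key", "id", "filename"]],
        on="slide_key",
        how="left",
        indicator=True,
    )
    # True where the slide listed in the dataset CSV also appears in the manifest.
    merged["in_manifest"] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")
```

Rows with `in_manifest` set to `False` are slides listed in the dataset CSV that the manifest does not cover, which is a quick way to spot missing downloads.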
Hello, after downloading the BRCA dataset from GDC, what should I do to obtain the mapping between IDC and ILC if the CSV table you provided does not apply? Thanks.