Inquiry about Dataset

Question

bryanwong17 opened this issue 2 years ago · 4 comments

Hi Richard,

Could you please let me know how to download the same datasets you used? I also tried cloning your repository with git, but I couldn't unzip the zip files in HIPT/2-Weakly-Supervised-Subtyping/dataset_csv. What is the procedure for unzipping them? Thank you.

Answer 1 · 2022-11-23T01:34:30.000Z

Hi @bryanwong17 - I may have saved the CSV files in an unconventional way (a format I previously used for storing count matrices / RNA-seq abundances for all ~20K genes). You can open the dataset CSV files directly with pandas, without unzipping them first.
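For anyone hitting the same problem, here is a minimal sketch of what opening the files via pandas might look like. It assumes each file in dataset_csv is a zip archive containing exactly one CSV and that its name ends in `.zip`. The helper name `load_dataset_csv` and the example path are hypothetical, not taken from the repository.

```python
import pandas as pd


def load_dataset_csv(path):
    """Read a (possibly zip-compressed) dataset CSV into a DataFrame.

    Assumption: the archive holds a single CSV file. pandas infers the
    compression from the file suffix (e.g. ".zip"), so no manual
    unzipping is needed.
    """
    return pd.read_csv(path, compression="infer")


# Hypothetical usage; replace the path with a real file from dataset_csv:
# df = load_dataset_csv("dataset_csv/example_dataset.csv.zip")
# print(df["slide_id"].head())
```

If pandas complains about multiple files inside the archive, that assumption does not hold for the file in question and it would need to be extracted another way.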

Answer 2 · 2022-11-23T01:47:37.000Z

Hi @Richarizardd,

Thanks for your fast reply. I was able to open the dataset CSV files and see all slide_ids. Moreover, I found CSV files for the 10 splits. If I want to download WSI files, should I manually download them one by one? Is there a better way?

Answer 3 · 2022-11-23T02:29:35.000Z

I think the best way to download is via https://portal.gdc.cancer.gov. I would query by cancer type, get a "manifest" of all diagnostic WSIs for that cancer type, and then download the slides listed in the manifest. Afterwards, one can align the downloaded WSIs with the slide_id column.
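The alignment step could be sketched roughly as follows. This is only an illustration under stated assumptions: the GDC manifest is a tab-separated file with `id` and `filename` columns, and `slide_id` matches the manifest `filename` once a trailing `.svs` is ignored. The function name `align_manifest` and the `gdc_id` / `gdc_filename` column names are made up for this example.

```python
import pandas as pd


def align_manifest(manifest, dataset):
    """Left-join dataset rows to GDC manifest rows by slide file name.

    Assumption: slide_id equals the manifest filename, with or without
    a trailing ".svs" extension. Rows with no match are kept and marked
    "left_only" in the "_merge" column.
    """
    m = manifest.rename(columns={"id": "gdc_id", "filename": "gdc_filename"})
    d = dataset.copy()
    # Build a common join key by stripping the ".svs" extension on both sides.
    m["_key"] = m["gdc_filename"].astype(str).str.replace(r"\.svs$", "", regex=True)
    d["_key"] = d["slide_id"].astype(str).str.replace(r"\.svs$", "", regex=True)
    merged = d.merge(
        m[["_key", "gdc_id", "gdc_filename"]],
        on="_key",
        how="left",
        indicator=True,
    )
    return merged.drop(columns="_key")


# Hypothetical usage with a manifest downloaded from the GDC portal:
# manifest = pd.read_csv("gdc_manifest.txt", sep="\t")
# dataset = pd.read_csv("dataset_csv/example_dataset.csv.zip")
# aligned = align_manifest(manifest, dataset)
# print(aligned["_merge"].value_counts())
```

Checking the `_merge` counts shows how many slide_ids have no corresponding file in the manifest, which is a quick sanity check before training on the splits.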

Answer 4 · 2024-03-19T12:37:20.000Z

Hello, after downloading the BRCA dataset from GDC, what should I do to obtain the mapping between IDC and ILC if the CSV table you provided does not apply? Thanks.