Questions about input data

Question

Opened this issue 5 years ago · 1 comment

Hi Martyna,

I am currently doing research on miRNA mutations in lung cancer, and I just found your scripts super useful, but I have a few questions:

How do I get the 'Coordinates' and 'Cancer exons' files used as input data? Do they come from TCGA? If so, could you provide an example TCGA link to the file you used?
In the final analysis step (run_mirnaome_analysis.py), what should the input data directory contain? (For example, the dir in your example is '~/dane/HNC/DATA_HNC') If you use .vcf files, could you tell me why you did not use the aggregated .maf files from TCGA?

In addition, if you could briefly list the various data you used in your lung cancer project, I would be very grateful.

Thanks in advance!!

Answer 1 · 2021-04-06T20:32:34.000Z

@pkglimmer I'm so sorry, I totally missed your issue :(. I hope you found the scripts useful anyway.

The coordinates and cancer exons files were prepared in our lab and are not available from TCGA; they are very project-specific.
We didn't use the .maf files because they contain highly filtered mutations; we wanted to start with rawer data and apply filtering appropriate for miRNA genes on our end. You are correct, we used "vcf.gz" files.
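
As a rough illustration of the general idea (start from "vcf.gz" files and keep only variants that fall inside regions of interest), here is a minimal sketch. It is not the code from the repository: the function names, the `(chromosome, start, end)` region format, and the 1-based inclusive coordinates are all assumptions made for the example, and the real coordinates file may be laid out differently.

```python
import gzip
from typing import Iterator, List, Tuple

# Hypothetical region format (an assumption, not the lab's actual file layout):
# (chromosome, start, end), 1-based and inclusive.
Region = Tuple[str, int, int]
Variant = Tuple[str, int, str, str, str]  # chrom, pos, ref, alt, filter


def iter_vcf_records(path: str) -> Iterator[Variant]:
    """Yield (chrom, pos, ref, alt, filter) from a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # skip malformed lines (VCF has 8 fixed columns)
            chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
            yield chrom, int(pos), ref, alt, filt


def in_any_region(chrom: str, pos: int, regions: List[Region]) -> bool:
    """Return True if the position lies inside at least one region."""
    return any(c == chrom and s <= pos <= e for c, s, e in regions)


def variants_in_regions(path: str, regions: List[Region]) -> List[Variant]:
    """Collect all variants from the VCF that overlap the given regions."""
    return [
        rec for rec in iter_vcf_records(path)
        if in_any_region(rec[0], rec[1], regions)
    ]
```

The FILTER column is kept in the output on purpose, so that any project-specific filtering can be decided afterwards rather than inherited from an upstream pipeline. The linear scan over regions is fine for a short list; an interval index would be the usual choice for larger region sets.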

For the analysis we used data as described in the manuscript mentioned in the README file.
We also analysed pan-cancer data in a similar way: https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(20)30427-8/fulltext
That analysis used a slightly updated script: https://github.com/martynaut/pancancer_mirnome

Sorry again.