################# tcga_query documentation author: Nolan Kuza (nkuza8@gmail.com) date: 1-17-19 ################# Parse vcf files by deleting metadata and keeping only selected columns. usage: tcga_query.py [-h] -i INPUT -t {mRNA,miRNA,both} [-q] [-m MANIFEST] [-d METADATA] required arguments: -i INPUT, --input INPUT The name of the input file containing a list of TCGA ids. Each id can be quoted or unquoted. Only the first 12 characters of each id will be used. -t {mRNA,miRNA,both}, --type {mRNA,miRNA,both} Type of data to export. Option 'both' specifies both mRNA AND miRNA. optional arguments: -h, --help show the help message and exit -q, --query If this option is included, a TCGA QUERY will be outputed to stdout and a manifest will not be downloaded. -m MANIFEST, --manifest MANIFEST The name of the file to download the manifest to (will overwrite file if it already exists). If this argument is not specified (and neither is -q), manifest will be outputed to stdout. -d METADATA, --metadata METADATA The name of the file to download the JSON metadata to (will overwrite file if it already exists). If this argument is not specified (and neither is -q), metadata will be outputed to stdout. example: We want to query mRNA with TCGA id in input.txt and then download the manifest to mani.txt and the metadata to meta.json. input.txt: TCGA-05-4250 "TCGA-05-4405" TCGA-05-4415-0192 "TCGA-05-4417-0164" Then run with: python tcga_query.py -i input.txt -t miRNA -m mani.txt -d meta.json The program will write the following into mani.txt: id filename md5 size state 10b6df33-8f32-4ed9-b1b1-8867be485a2f 6e98ba8f-e00d-470c-832f-adc3dc6956a6.FPKM.txt.gz 61e93edfed714da7280bba70c08d0f41 505452 released a07c86e4-b157-49f7-b541-7e1c0b3cf27b 4ed68c20-e7f2-4b40-97f1-4f8a3ddae0a9.FPKM.txt.gz 895c980876df6ecde9fdb681edec209e 514950 released 46ea8adf-e1bd-4e3b-8cea-307ca36073d0 6e98ba8f-e00d-470c-832f-adc3dc6956a6.FPKM-UQ.txt.gz 7ca897843589d989fcd429af53b3cad0 505835 released 3653dfea-049c-4ae8-8aca-cf6799797dd7 dc76cdde-f77f-4604-94c4-0b150b9a56b4.htseq.counts.gz 71bcd107585e69b96e909076cc9c1b61 249745 released 156acd96-fb20-4083-ace0-db465133670c d011c9fc-3598-4f0b-b059-85d63de31a9f.FPKM.txt.gz 5970f4595f6d210bfd1f9b65b045a972 501941 released 0f5f287b-ee07-47b1-9d71-03cacffbdc21 dc76cdde-f77f-4604-94c4-0b150b9a56b4.FPKM-UQ.txt.gz 03d16f7df1a9a79f7b764bb2e5ee0005 512590 released 545b3be7-9189-4e1e-85d6-af4a7540ec97 4ed68c20-e7f2-4b40-97f1-4f8a3ddae0a9.htseq.counts.gz 2098e89ef4ad8b42d27bd05b38d933b4 253840 released 3471e070-167d-4af2-b9c6-38fbd4e86add 4ed68c20-e7f2-4b40-97f1-4f8a3ddae0a9.FPKM-UQ.txt.gz cac01aaa0b0c89bfcb765b9cf606f26c 516514 released 9c740043-d02a-412a-851b-cfcb625e04c4 6e98ba8f-e00d-470c-832f-adc3dc6956a6.htseq.counts.gz 74b0373a0ccc28097775e4557fedfb00 249824 released 1fae764e-3807-4565-a78d-5ea852d327a7 d011c9fc-3598-4f0b-b059-85d63de31a9f.htseq.counts.gz 7207f6f35a1b2044035b700eb4e68aa9 248552 released 999f701a-2bd2-4e9d-94af-08d0f1be769c dc76cdde-f77f-4604-94c4-0b150b9a56b4.FPKM.txt.gz 80e7f3092027e21e879633c74596aa37 509899 released 96332d27-a064-4ca5-bf5d-f9c357556c39 d011c9fc-3598-4f0b-b059-85d63de31a9f.FPKM-UQ.txt.gz 9aadc0fd9352dccf65e509ecf186c8a7 503624 released Metadata will be outputed to meta.json
Nolan1324/TCGA-Query-Generator-Pipeline
A python command line version of my TCGAQueryGenerator C# project. Useful for pipelining and/or running on other Operating Systems.
Python