Series of scripts for aggregating exome and genome reports from CCM's CRE pipeline.
Run the scripts below from the directory where you want the output saved, e.g. python3 get_all_report_paths.py
followed by python3 copy_reports.py --report_paths=./all_reports-2022-04-21/all-report-paths-2022-04-21.csv
get_all_report_paths.py
- traverses several known directories containing exome and exome-like reports and writes the results to two flat files
- known directories:
current_exome: /hpf/largeprojects/ccm_dccforge/dccforge/results
old_exome: /hpf/largeprojects/ccmbio/naumenko/project_cheo/DCC_Samples_part1
current_genome: /hpf/largeprojects/ccmbio/ccmmarvin_shared/genomes
old_genome: /hpf/largeprojects/ccm_dccforge/dccdipg/c4r_wgs/results
in_progress_exome: /hpf/largeprojects/ccmbio/ccmmarvin_shared/exomes/in_progress
- outputs:
./all_reports-yyyy-mm-dd/all-fam-ptp-reports-yyyy-mm-dd.csv
- parsed family and participant codenames and the report they belong to
./all_reports-yyyy-mm-dd/all-report-paths-yyyy-mm-dd.csv
- report paths
- sanity check report counts by type:
df[['report', 'report_type']].dropna().value_counts('report_type')
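The traversal and the report-paths output described above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the script's actual implementation: the function names (find_reports, write_report_paths, count_by_type), the "*.csv" filename pattern, and the two-column layout are all assumptions made for the example, and count_by_type is a plain Counter stand-in for the pandas value_counts check.

```python
import csv
from collections import Counter
from datetime import date
from pathlib import Path


def find_reports(known_dirs, pattern="*.csv"):
    """Return (report_type, report_path) rows for files matching `pattern`.

    `known_dirs` maps a label such as "current_exome" to a root directory.
    The "*.csv" pattern is an assumption; real report names may differ.
    Roots that do not exist are skipped rather than raising.
    """
    rows = []
    for report_type, root in known_dirs.items():
        root = Path(root)
        if not root.is_dir():
            continue
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                rows.append((report_type, str(path)))
    return rows


def write_report_paths(rows, out_root=".", today=None):
    """Write rows to <out_root>/all_reports-yyyy-mm-dd/all-report-paths-yyyy-mm-dd.csv."""
    stamp = (today or date.today()).isoformat()
    out_dir = Path(out_root) / f"all_reports-{stamp}"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"all-report-paths-{stamp}.csv"
    with out_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["report_type", "report"])  # assumed column layout
        writer.writerows(rows)
    return out_file


def count_by_type(rows):
    """Count reports per type (stdlib stand-in for the pandas sanity check)."""
    return Counter(report_type for report_type, _ in rows)
```

Calling find_reports with a dict of the known directories listed above, then write_report_paths(rows), would produce the dated output folder in the current working directory, which is why the scripts are run from the directory where the output should live.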
copy_reports.py
- takes the output of the above script and copies all reports into a single, nested directory
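As a rough illustration of the copy step, the sketch below reads a report-paths CSV and copies each listed file into one destination tree. It is not the actual copy_reports.py: the function name, the assumed "report_type" and "report" columns, and the choice to nest by report type are all hypothetical, since the real nesting scheme is not described here.

```python
import csv
import shutil
from pathlib import Path


def copy_reports(report_paths_csv, dest_root):
    """Copy every report listed in the CSV into dest_root/<report_type>/.

    Assumes the CSV has "report_type" and "report" columns (hypothetical
    layout). Paths that no longer exist are skipped. Reports sharing a
    file name within one report type would overwrite each other.
    """
    copied = []
    with open(report_paths_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            src = Path(row["report"])
            if not src.is_file():
                continue
            target_dir = Path(dest_root) / row["report_type"]
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / src.name
            shutil.copy2(src, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

For example, copy_reports("./all_reports-2022-04-21/all-report-paths-2022-04-21.csv", "./reports") would return the list of copied paths, where "./reports" is an illustrative destination.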