bulik/ldsc

--overlap-annot does not recognize path in --annot


We are trying to keep annotations in a separate folder from the LD score files. Everything works fine until you use --overlap-annot. That flag needs to read the annotation files, but it looks for them in the same location as the LD score files instead of the path given in --annot.

def _read_annot(args, log):
    '''Read annot matrix.'''
    try:
        if args.ref_ld is not None:
            overlap_matrix, M_tot = _read_chr_split_files(args.ref_ld_chr, args.ref_ld, log,
                                                          'annot matrix', ps.annot, frqfile=args.frqfile)
        elif args.ref_ld_chr is not None:
            overlap_matrix, M_tot = _read_chr_split_files(args.ref_ld_chr, args.ref_ld, log,
                                                          'annot matrix', ps.annot, frqfile=args.frqfile_chr)

Why is args.ref_ld_chr used even when args.annot is provided? This looks like a bug, not a feature.
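
One possible direction for a fix, as a minimal sketch rather than a tested patch: choose the annotation prefix from --annot when it is given, and only fall back to the LD score prefix otherwise. The helper name `pick_annot_prefixes` is hypothetical and not part of ldsc. It also assumes that --annot follows the same per-chromosome convention as the reference LD scores, i.e. it is treated as a chromosome-split prefix whenever --ref-ld-chr is used.

```python
from argparse import Namespace


def pick_annot_prefixes(args):
    """Return (chr_split_prefix, whole_genome_prefix) for the .annot files.

    Hypothetical helper, not part of ldsc. Assumption: when --annot is
    given, it uses the same layout (chr-split or not) as the reference
    LD scores passed via --ref-ld-chr / --ref-ld.
    """
    annot = getattr(args, "annot", None)
    if annot is not None:
        if args.ref_ld_chr is not None:
            # Chromosome-split LD scores: treat --annot as chr-split too.
            return annot, None
        return None, annot
    # No --annot: keep the current behaviour and reuse the LD score prefix.
    return args.ref_ld_chr, args.ref_ld


# Illustrative arguments only; the paths are placeholders.
example = Namespace(annot="/annotations/baseline_",
                    ref_ld_chr="/ld_scores/baseline_",
                    ref_ld=None)
chr_prefix, whole_prefix = pick_annot_prefixes(example)
```

The two returned values could then be passed to `_read_chr_split_files` in place of `args.ref_ld_chr` and `args.ref_ld` inside `_read_annot`.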

Here is the stack trace:

+ python /scripts/ldsc/ldsc.py --h2 /fsx/dl_ldsc/temp_sumstats/id_795/id_795.sumstats.gz --ref-ld-chr /fsx/dl_ldsc/ld_scores/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_ --out /fsx/dl_ldsc/heritability_stats/phenotype=tissue/phenotype_id=topics-4-kmeans-5/sumstat_id=795/bed_list=baseline/h2_id_795 --w-ld-chr /fsx/dl_ldsc/ld_scores/phenotype=weight/phenotype_id=ukb/base_ --frqfile-chr /fsx/dl_ldsc/ldsc_refs/frqfile/ukb_ref_freq_ --print-coefficients --overlap-annot --annot /fsx/dl_ldsc/annotations/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_

  | 2023-08-02T14:38:04.788-04:00 | *********************************************************************
  | 2023-08-02T14:38:04.788-04:00 | * LD Score Regression (LDSC)
  | 2023-08-02T14:38:04.788-04:00 | * Version 1.0.1
  | 2023-08-02T14:38:04.788-04:00 | * (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
  | 2023-08-02T14:38:04.788-04:00 | * Broad Institute of MIT and Harvard / MIT Department of Mathematics
  | 2023-08-02T14:38:04.788-04:00 | * GNU General Public License v3
  | 2023-08-02T14:38:04.788-04:00 | *********************************************************************
  | 2023-08-02T14:38:04.788-04:00 | Call:
  | 2023-08-02T14:38:04.788-04:00 | ./ldsc.py \
  | 2023-08-02T14:38:04.788-04:00 | --out /fsx/dl_ldsc/heritability_stats/phenotype=tissue/phenotype_id=topics-4-kmeans-5/sumstat_id=795/bed_list=baseline/h2_id_795 \
  | 2023-08-02T14:38:04.788-04:00 | --annot /fsx/dl_ldsc/annotations/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_ \
  | 2023-08-02T14:38:04.788-04:00 | --h2 /fsx/dl_ldsc/temp_sumstats/id_795/id_795.sumstats.gz \
  | 2023-08-02T14:38:04.788-04:00 | --ref-ld-chr /fsx/dl_ldsc/ld_scores/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_ \
  | 2023-08-02T14:38:04.788-04:00 | --w-ld-chr /fsx/dl_ldsc/ld_scores/phenotype=weight/phenotype_id=ukb/base_ \
  | 2023-08-02T14:38:04.788-04:00 | --overlap-annot \
  | 2023-08-02T14:38:04.788-04:00 | --print-coefficients \
  | 2023-08-02T14:38:04.788-04:00 | --frqfile-chr /fsx/dl_ldsc/ldsc_refs/frqfile/ukb_ref_freq_
  | 2023-08-02T14:38:04.788-04:00 | Beginning analysis at Wed Aug 2 18:37:10 2023
  | 2023-08-02T14:38:04.788-04:00 | Reading summary statistics from /fsx/dl_ldsc/temp_sumstats/id_795/id_795.sumstats.gz ...
  | 2023-08-02T14:38:04.788-04:00 | Read summary statistics for 9169554 SNPs.
  | 2023-08-02T14:38:04.788-04:00 | Dropped 10 SNPs with duplicated rs numbers.
  | 2023-08-02T14:38:04.788-04:00 | Reading reference panel LD Score from /fsx/dl_ldsc/ld_scores/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_[1-22] ... (ldscore_fromlist)
  | 2023-08-02T14:38:04.788-04:00 | Read reference panel LD Scores for 879870 SNPs.
  | 2023-08-02T14:38:04.788-04:00 | Removing partitioned LD Scores with zero variance.
  | 2023-08-02T14:38:04.788-04:00 | Reading regression weight LD Score from /fsx/dl_ldsc/ld_scores/phenotype=weight/phenotype_id=ukb/base_[1-22] ... (ldscore_fromlist)
  | 2023-08-02T14:38:04.788-04:00 | Read regression weight LD Scores for 879870 SNPs.
  | 2023-08-02T14:38:04.788-04:00 | After merging with reference panel LD, 849364 SNPs remain.
  | 2023-08-02T14:38:04.788-04:00 | After merging with regression SNP LD, 849364 SNPs remain.
  | 2023-08-02T14:38:04.788-04:00 | Removed 323 SNPs with chi^2 > 80 (849041 SNPs remain)
  | 2023-08-02T14:38:04.788-04:00 | Total Observed scale h2: 0.3837 (0.0388)
  | 2023-08-02T14:38:04.788-04:00 | Categories: <list of annotation files>
  | 2023-08-02T14:38:04.788-04:00 | Lambda GC: 1.2729
  | 2023-08-02T14:38:04.788-04:00 | Mean Chi^2: 1.4287
  | 2023-08-02T14:38:04.788-04:00 | Intercept: 1.1011 (0.0172)
  | 2023-08-02T14:38:04.788-04:00 | Ratio: 0.2357 (0.0402)
  | 2023-08-02T14:38:04.788-04:00 | Reading annot matrix from /fsx/dl_ldsc/ld_scores/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_[1-22] ... (annot)
  | 2023-08-02T14:38:04.788-04:00 | Error parsing .annot file.
  | 2023-08-02T14:38:04.788-04:00 | Analysis finished at Wed Aug 2 18:38:04 2023
  | 2023-08-02T14:38:04.788-04:00 | Total time elapsed: 54.51s
  | 2023-08-02T14:38:04.789-04:00 | Traceback (most recent call last):
  | 2023-08-02T14:38:04.789-04:00 | File "/scripts/ldsc/ldsc.py", line 651, in <module>
  | 2023-08-02T14:38:04.789-04:00 | sumstats.estimate_h2(args, log)
  | 2023-08-02T14:38:04.789-04:00 | File "/scripts/ldsc/ldscore/sumstats.py", line 372, in estimate_h2
  | 2023-08-02T14:38:04.790-04:00 | overlap_matrix, M_tot = _read_annot(args, log)
  | 2023-08-02T14:38:04.790-04:00 | File "/scripts/ldsc/ldscore/sumstats.py", line 96, in _read_annot
  | 2023-08-02T14:38:04.790-04:00 | overlap_matrix, M_tot = _read_chr_split_files(args.ref_ld_chr, args.ref_ld, log,
  | 2023-08-02T14:38:04.790-04:00 | File "/scripts/ldsc/ldscore/sumstats.py", line 153, in _read_chr_split_files
  | 2023-08-02T14:38:04.791-04:00 | out = parsefunc(_splitp(chr_arg), _N_CHR, **kwargs)
  | 2023-08-02T14:38:04.791-04:00 | File "/scripts/ldsc/ldscore/parse.py", line 196, in annot
  | 2023-08-02T14:38:04.791-04:00 | annot_s, annot_comp_single = which_compression(first_fh)
  | 2023-08-02T14:38:04.791-04:00 | File "/scripts/ldsc/ldscore/parse.py", line 53, in which_compression
  | 2023-08-02T14:38:04.791-04:00 | raise IOError('Could not open {F}[./gz/bz2]'.format(F=fh))
  | 2023-08-02T14:38:04.792-04:00 | OSError: Could not open /fsx/dl_ldsc/ld_scores/phenotype=baseline/phenotype_id=ld_v2_2_orig_hg19_hg38/baseline_1.annot[./gz/bz2]
  | 2023-08-02T14:38:04.792-04:00 | During handling of the above exception, another exception occurred:
  | 2023-08-02T14:38:04.792-04:00 | Traceback (most recent call last):
  | 2023-08-02T14:38:04.792-04:00 | File "/scripts/ldsc/ldsc.py", line 662, in <module>
  | 2023-08-02T14:38:04.792-04:00 | log.log( traceback.format_exc(ex) )
  | 2023-08-02T14:38:04.793-04:00 | File "/usr/local/lib/python3.10/traceback.py", line 183, in format_exc
  | 2023-08-02T14:38:04.793-04:00 | return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  | 2023-08-02T14:38:04.793-04:00 | File "/usr/local/lib/python3.10/traceback.py", line 135, in format_exception
  | 2023-08-02T14:38:04.793-04:00 | te = TracebackException(type(value), value, tb, limit=limit, compact=True)
  | 2023-08-02T14:38:04.793-04:00 | File "/usr/local/lib/python3.10/traceback.py", line 502, in __init__
  | 2023-08-02T14:38:04.794-04:00 | self.stack = StackSummary.extract(
  | 2023-08-02T14:38:04.794-04:00 | File "/usr/local/lib/python3.10/traceback.py", line 357, in extract
  | 2023-08-02T14:38:04.794-04:00 | if limit >= 0:
  | 2023-08-02T14:38:04.795-04:00 | TypeError: '>=' not supported between instances of 'OSError' and 'int'
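
Side note on the second traceback: it looks unrelated to the annotation path. The trace shows ldsc.py line 662 calling `traceback.format_exc(ex)`. In Python 3 the first positional parameter of `traceback.format_exc` is `limit`, so the exception object ends up being compared with an integer, which would explain the TypeError. A small sketch of the call without the positional argument follows; the function name and the error message are placeholders, not code from ldsc.

```python
import traceback


def format_current_exception():
    """Format the exception currently being handled.

    traceback.format_exc() reads the active exception itself, so no
    exception object is passed in (the first positional argument is
    `limit`, not the exception).
    """
    return traceback.format_exc()


try:
    # Placeholder error standing in for the failed .annot read.
    raise OSError("Could not open example_1.annot[./gz/bz2]")
except OSError:
    formatted = format_current_exception()
```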