Investigate UTA 20161024 anomalies
reece opened this issue · 2 comments
reece commented
Reported by Geoff Nilsen:
- Some alignments have null cigar strings
- Missing splign alignment for NM_145045.4 ~ NC_000019.9 (the alignment to NC_000019.10 exists)
reece commented
Re 1:
Confirmed:
reece@[local]/uta_dev=> select hgnc,tx_ac,alt_ac,alt_aln_method,ord,cigar
from uta_20161024.tx_exon_aln_v
where tx_ac = 'NM_001038633.3' and alt_ac = 'NC_000001.10'
order by alt_aln_method, ord;
┌───────┬────────────────┬──────────────┬────────────────┬─────┬───────┐
│ hgnc │ tx_ac │ alt_ac │ alt_aln_method │ ord │ cigar │
├───────┼────────────────┼──────────────┼────────────────┼─────┼───────┤
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 0 │ 358= │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 1 │ 67= │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 2 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 3 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 4 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 5 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 6 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ blat │ 7 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 0 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 1 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 2 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 3 │ 382= │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 4 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 5 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 6 │ ¤ │
│ RSPO1 │ NM_001038633.3 │ NC_000001.10 │ splign │ 7 │ ¤ │
└───────┴────────────────┴──────────────┴────────────────┴─────┴───────┘
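To check how far this pattern extends beyond RSPO1, the null cigars could be counted per alignment. The following is a minimal sketch, not the query used here: it loads the 16 rows above into an in-memory SQLite table (a stand-in for uta_20161024.tx_exon_aln_v; the helper name null_cigar_summary is hypothetical) and reports the exon count and null-cigar count for each (tx_ac, alt_ac, alt_aln_method).

```python
import sqlite3


def null_cigar_summary(conn):
    """Return (tx_ac, alt_ac, alt_aln_method, n_exons, n_null) for every
    alignment that has at least one exon with a null cigar.

    Hypothetical helper: assumes a table or view named tx_exon_aln_v with
    the columns used in the queries in this thread.
    """
    sql = """
        SELECT tx_ac, alt_ac, alt_aln_method,
               COUNT(*) AS n_exons,
               SUM(CASE WHEN cigar IS NULL THEN 1 ELSE 0 END) AS n_null
        FROM tx_exon_aln_v
        GROUP BY tx_ac, alt_ac, alt_aln_method
        HAVING SUM(CASE WHEN cigar IS NULL THEN 1 ELSE 0 END) > 0
        ORDER BY tx_ac, alt_ac, alt_aln_method
    """
    return conn.execute(sql).fetchall()


# Stand-in for the real view, populated with the RSPO1 rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx_exon_aln_v ("
    "tx_ac TEXT, alt_ac TEXT, alt_aln_method TEXT, ord INTEGER, cigar TEXT)"
)
cigars_by_method = {
    "blat": ["358=", "67="] + [None] * 6,
    "splign": [None] * 3 + ["382="] + [None] * 4,
}
for method, cigars in cigars_by_method.items():
    conn.executemany(
        "INSERT INTO tx_exon_aln_v VALUES (?, ?, ?, ?, ?)",
        [
            ("NM_001038633.3", "NC_000001.10", method, ord_, cigar)
            for ord_, cigar in enumerate(cigars)
        ],
    )

summary = null_cigar_summary(conn)
for row in summary:
    print(row)
```

The query itself is standard SQL (GROUP BY with a HAVING filter), so it should also run in psql against the real view once the schema name is added.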
Re 2:
I can't reproduce this yet (both NC_000019.9 and NC_000019.10 have splign alignments in the result below), but I will keep looking.
reece@[local]/uta_dev=> select distinct hgnc,tx_ac,alt_ac,alt_aln_method from uta_20161024.tx_exon_aln_v
where alt_aln_method='splign' and alt_ac~'^NC_0000' and tx_ac = 'NM_145045.4';
┌─────────┬─────────────┬──────────────┬────────────────┐
│ hgnc │ tx_ac │ alt_ac │ alt_aln_method │
├─────────┼─────────────┼──────────────┼────────────────┤
│ CCDC151 │ NM_145045.4 │ NC_000019.9 │ splign │
│ CCDC151 │ NM_145045.4 │ NC_000019.10 │ splign │
└─────────┴─────────────┴──────────────┴────────────────┘
reece commented
The primary bug was that align-exons had two optimizations that didn't play nicely together, resulting in a deterministic pattern of computing but NOT committing alignments. That was fixed.
I also backfilled nearly all of the missing alignments. These are the only null cigars that remain:
reece@[local]/uta_dev=> select distinct hgnc,tx_ac,alt_ac,alt_aln_method
from uta_1_1.tx_exon_aln_v
where cigar is null;
┌────────┬───────────────────────┬─────────────┬────────────────┐
│ hgnc │ tx_ac │ alt_ac │ alt_aln_method │
├────────┼───────────────────────┼─────────────┼────────────────┤
│ KCNJ16 │ NM_170741.2/465..1722 │ NC_018928.2 │ splign │
│ RIBC2 │ NM_015653.4/211..1345 │ NC_018933.2 │ splign │
│ ERN2 │ NM_033266.3/169..3094 │ NC_018927.2 │ splign │
│ KCNJ16 │ NM_018658.2/546..1803 │ NC_018928.2 │ splign │
└────────┴───────────────────────┴─────────────┴────────────────┘
These four transcripts are all cases where the CDS start and end changed (in which case UTA renames them for archival purposes).
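For downstream code that needs to recognize these archived records, a small parser may help. This is a sketch under an assumption inferred only from the four rows above: that the renamed accession has the form accession/start..end, with start and end being the old CDS coordinates. The function name split_archived_ac is hypothetical and is not part of UTA.

```python
import re

# Assumed pattern, inferred from rows such as 'NM_170741.2/465..1722'.
_ARCHIVED_AC = re.compile(r"^(?P<ac>[^/]+)/(?P<start>\d+)\.\.(?P<end>\d+)$")


def split_archived_ac(tx_ac):
    """Split 'ACCESSION/START..END' into (accession, start, end).

    Accessions without the suffix are returned as (tx_ac, None, None).
    """
    m = _ARCHIVED_AC.match(tx_ac)
    if m is None:
        return tx_ac, None, None
    return m.group("ac"), int(m.group("start")), int(m.group("end"))


print(split_archived_ac("NM_170741.2/465..1722"))
print(split_archived_ac("NM_145045.4"))
```

No coordinate convention (0-based or 1-based) is assumed here; the numbers are returned exactly as they appear in tx_ac.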