Explanation of txClassDescriptions and error message for large bam file.
apsteinberg opened this issue · 24 comments
Hi there,
Thank you again for helping me the other day with the issues I was having. I've been able to run bambu successfully on some of my samples, and I have a couple of follow-up questions (I hope you don't mind me asking here, happy to correspond via email if this is easier):
1. I would like to now tap into bambu's really awesome isoform analysis tools. I saw that you have in the output a "txClassDescription", which describes what is novel about the isoform. I wanted to do an analysis of full-length isoforms for my samples as your team has done in Figures 5a, d, and e of the following paper: https://www.biorxiv.org/content/10.1101/2021.04.21.440736v1
I was wondering how the txClassDescription codes relate to some of the events you've described there, if at all. For example, does a txClassDescription code of "new First Junction" correspond to what you describe as an "alternative 5' end" in the paper? And further, are exon skipping events encompassed in these class codes? If not, it would be wonderful if you could point me to where I may be able to classify my transcripts in this way.
2. I am encountering an issue with running one of my bam files through bambu, and I am not entirely sure why. I am getting the following error:
```
--- Start generating read class files ---
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'intersect': error in evaluating the argument 'x' in selecting a method for function 'ranges': Subsetting operation on CompressedGRangesList object 'x' produces a result that is too big to be represented as a CompressedList object. Please try to coerce 'x' to a SimpleList object first (with 'as(x, "SimpleList")').
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
```
For #2, any clues as to how to resolve this?
Thanks in advance for your time and help!
Cheers,
Asher
Hi Asher,
I am having the same issue with a subset of my files.
I had a feeling it had something to do with the size of the bam files.
Can I ask if the bam files you're inputting are large (>100 GB)?
Hi there,
Gotcha, yeah mine isn't that large (it's 55 GB), so perhaps it is related to something else? The reason I thought it had something to do with the size, though, was this part of the error message:
```
CompressedGRangesList object 'x' produces a result that is too big to be represented as a CompressedList object.
```
I also see on the README that it says version 3.25 resolved "issues with large files" -- I'm wondering if maybe this might resolve the issue. I'm trying to update to this version now (my Bioconductor version I believe is 3.24).
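For reference, the update I'm attempting looks something like this (a sketch; BiocManager installs whichever bambu version matches the installed Bioconductor release):

```r
# Update bambu via Bioconductor and check which version was installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("bambu")
packageVersion("bambu")
```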
Please let me know if you figure it out on your end, and I'll let you know if I make progress as well.
Cheers,
Asher
Hi both,
Let's start with 2 first. So that I can understand the issue, let me confirm some details: this issue only happens for some of your bam files, and others run successfully? Is there anything different between the samples that work and the ones that do not? Are they all aligned to the same genome?
Could you comment with the code you used to run bambu (set verbose = TRUE), and after it fails, run warnings() and provide that output here too.
Regarding the first question, I assume the link you posted is the SG-NEx preprint (unfortunately bioRxiv is down at the moment so I couldn't check).
You can find the description of the different txClassDescriptions here: https://github.com/GoekeLab/bambu#output-description. "new First Junction" does not necessarily refer to an alternative 5' end, depending on how you define that; the transcript could still share a transcription start site with other transcripts. It does, however, mean a new first exon, as the first junction is at a novel position.
We do not output descriptions of alternative splicing events, as they are relative to what you are comparing them to. We do have a function compareTranscripts(granges1, granges2), which is unfortunately not yet documented, but you could play around with it and see if the output is useful for you.
It takes two GRangesList objects, which must be the same length. Each annotation in the first list is compared to the corresponding annotation in the second list, and a table indicating the splicing differences is returned (i.e., the first entry in list one is compared to the first entry in list two, then the second entries are compared against each other, and so on). For example, you could set the first list to be all the novel transcripts and the second to be the major isoform (most expressed) of the corresponding gene. Because we have not documented this function and do not officially support it yet, I won't be able to help you much further with its use; nevertheless, I hope it will be of use.
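If it helps, a minimal sketch of how a call might look (the input objects here, novelTx and majorIsoforms, are hypothetical names for lists you would build yourself):

```r
library(bambu)

# Hypothetical inputs: novelTx and majorIsoforms are GRangesList objects of
# equal length, where entry i of majorIsoforms is the major isoform of the
# gene that novelTx[i] belongs to.
comparison <- compareTranscripts(novelTx, majorIsoforms)

# The result is a table of pairwise splicing differences; inspect it with:
head(comparison)
```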
Kind Regards,
Andre Sim
Hi Andre,
Thanks for your help on this issue.
Yes, it appears some BAM files generate this error while others do not. The BAMs were all generated using the same minimap2 command and mapped to the same reference file. There is nothing 'special' about these files as far as I can tell.
I ran the following command:

```r
bambuAnnotations <- bambu::prepareAnnotations(annotation)

bambu_out <- withr::with_package("GenomeInfoDb",
  bambu::bambu(
    reads = c("C3_Day80_matched_reads_dedup_align2genome.bam"),
    annotations = bambuAnnotations,
    genome = "/data/scratch/users/yairp/old_discoAnt/discoAnt/ref_hg38/GRCh38.primary_assembly.genome.fa",
    quant = FALSE,
    discovery = TRUE,
    lowMemory = TRUE,
    verbose = TRUE,
    ncore = 4)
)

bambu::writeBambuOutput(bambu_out, path = "/out/all_together")
```
In verbose mode bambu generates a lot of messages, but here are the final few lines:

```
reads count for all annotated junctions: 0 (0%)
reads count for all annotated junctions after correction to reference junction: 0 (0%)
Finished correcting junction based on set of high confidence junctions in 0 mins.
Finished creating transcript models (read classes) for reads with spliced junctions in 0 mins.
Finished creating junction list with splice motif in 0 mins.
before strand correction, annotated introns: NA (NA%)
Junction correction with not enough data, precalculated model is used
Model to predict true splice sites built in 0 mins.
reads count for all annotated junctions: 0 (0%)
reads count for all annotated junctions after correction to reference junction: 0 (0%)
Finished correcting junction based on set of high confidence junctions in 0 mins.
Finished creating transcript models (read classes) for reads with spliced junctions in 0 mins.
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'intersect': error in evaluating the argument 'x' in selecting a method for function 'ranges': Subsetting operation on CompressedGRangesList object 'x' produces a result that is too big to be represented as a CompressedList object. Please try to coerce 'x' to a SimpleList object first (with 'as(x, "SimpleList")').
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
```
The version I am running is 3.2.6.
Hope this helps get to the bottom of this.
Thanks for your help
Sefi
Hi Sefi,
Thanks for sharing this. This output is concerning, as it says there are no reads for any of the annotated junctions, which should not normally occur for a human sample. How many reads does this bam file have? Do you expect there to be only unspliced reads? Do you still have the minimap2 command you used, and could you share it? A common error is that minimap2 was not run in splice-aware mode.
Another cause could be that the annotations do not line up with the genome, for example if the genome and bam have scaffolds named "chr1" and the annotations have "1". Could you also share the output of head(bambuAnnotations)? There have been cases where certain gff3 formats have issues.
Let me know these first, and if they seem normal we can delve deeper.
Kind Regards,
Andre Sim
Hi Andre,
Yes, that does seem very strange. I expect both spliced and unspliced reads.
This bam file contains ~92M primary alignments.
I have successful runs with some bam files using the same gtf and genome files, so I doubt it has to do with the reference files I am using.
My minimap2 command:

```
minimap2 -ax splice -t 16 -k14 --secondary=no --seed 2023 --junc-bed out/D80/tmp_splice_anno.bed12 --junc-bonus 1 -o out/D80/C3_Day80_tmp_align.sam ref_hg38/GRCh38.primary_assembly.genome.fa out/D80/C3_Day80_matched_reads.fastq
```
```
head(bambuAnnotations)
GRangesList object of length 6:
$ENST00000000233.10
GRanges object with 6 ranges and 2 metadata columns:
      seqnames              ranges strand | exon_rank exon_endRank
         <Rle>           <IRanges>  <Rle> | <integer>    <integer>
  [1]     chr7 127588411-127588565      + |         1            6
  [2]     chr7 127589083-127589163      + |         2            5
  [3]     chr7 127589485-127589594      + |         3            4
  [4]     chr7 127590066-127590137      + |         4            3
  [5]     chr7 127590963-127591088      + |         5            2
  [6]     chr7 127591213-127591700      + |         6            1
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths

$ENST00000000412.8
GRanges object with 7 ranges and 2 metadata columns:
      seqnames          ranges strand | exon_rank exon_endRank
         <Rle>       <IRanges>  <Rle> | <integer>    <integer>
  [1]    chr12 8940361-8941940      - |         7            1
  [2]    chr12 8942416-8942542      - |         6            2
  [3]    chr12 8943405-8943535      - |         5            3
  [4]    chr12 8943801-8943910      - |         4            4
  [5]    chr12 8945418-8945584      - |         3            5
  [6]    chr12 8946229-8946405      - |         2            6
  [7]    chr12 8949488-8949645      - |         1            7
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths

$ENST00000000442.11
GRanges object with 7 ranges and 2 metadata columns:
      seqnames            ranges strand | exon_rank exon_endRank
         <Rle>         <IRanges>  <Rle> | <integer>    <integer>
  [1]    chr11 64305524-64305736      + |         1            7
  [2]    chr11 64307168-64307504      + |         2            6
  [3]    chr11 64313951-64314067      + |         3            5
  [4]    chr11 64314239-64314367      + |         4            4
  [5]    chr11 64314741-64314911      + |         5            3
  [6]    chr11 64315001-64315270      + |         6            2
  [7]    chr11 64315707-64316743      + |         7            1
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths
```
Hope this helps narrow down the issue.
Hi Sefi,
This all looks good, and it definitely narrows down where the issue may be occurring. Do you happen to remember how many reads were listed for the samples that did work, at the "reads count for all annotated junctions" line in the verbose output? Assuming the samples are of similar size, I would expect around ~40M depending on how degraded the sample is.
I doubt this is the cause, as you said you used the same reference files for the samples that worked, but as one last sanity check: what is the output of (in bash) head ref_hg38/GRCh38.primary_assembly.genome.fa? Also, have you tried looking at this bam file in a genome browser like IGV to confirm there are reads matching an annotated junction?
Assuming the above look normal, then the issue must lie in the early parts of the code where bambu is reading in the bam file. Are you able to subsample this bam file (perhaps to only a region of the chromosome, for example the area you looked at in the genome browser) and, if the issue still occurs, attach that sampled bam file here along with the annotations and genome so I can investigate how bambu is handling these reads? With a minimal reproducible input set it will be a lot easier for me to try to solve this on my side :) .
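For example, one way to subset to a region from R (a sketch; the region and file names are hypothetical, and samtools view would work just as well):

```r
library(Rsamtools)
library(GenomicRanges)

# Hypothetical region and file names; use the area you inspected in IGV.
region <- GRanges("chr1", IRanges(1000000, 2000000))
param  <- ScanBamParam(which = region)

# filterBam() needs an index (.bai) alongside the input bam.
filterBam("input.bam", "subset_region.bam", param = param)
```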
Kind Regards,
Andre Sim
Hi Andre,
Yes, I agree an example would be useful.
I have subset the bam based on a single gene I know is in my data. I have viewed this in IGV, and it looks clear to me that there are reads that map to known junctions of this gene. The mapping is messier than I expected, but I think this should still be fine for bambu isoform discovery. Here is what I can see.
Running this subset generates this error:
```
--- Start generating read class files ---
GRIA2
Finished creating junction list with splice motif in 0 mins.
before strand correction, annotated introns: 35 (0.00982042648709315%)
[17:29:25] WARNING: ../..//src/learner.cc:553: If you are loading a serialized model (like pickle in Python, RDS in R) generated by older XGBoost, please export the model by calling `Booster.save_model` from that version first, then load it back in current version. See: https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html for more details about differences between saving model and serializing.
[... the same XGBoost warning is printed three more times ...]
Junction correction with not enough data, precalculated model is used
Model to predict true splice sites built in 0 mins.
reads count for all annotated junctions: 84917 (0.941200594090133%)
reads count for all annotated junctions after correction to reference junction: 84919 (0.941222761632418%)
Finished correcting junction based on set of high confidence junctions in 0 mins.
Finished creating transcript models (read classes) for reads with spliced junctions in 0 mins.
Finished create single exon transcript models (read classes) in 0 mins.
[17:29:37] WARNING: ../..//src/learner.cc:553: [... the same XGBoost warning, printed twice ...]
Sample does not have more than 50 of both read class labels
Transcript model not trained. Using pre-trained models
Finished generating scores for read classes in 0.2 mins.
--- per sample warnings during read class construction ---
Warnings for: GRIA2
not all chromosomes present in reference annotations, annotations might be incomplete. Please compare objects on the same reference
Bambu was unable to train a model on this sample, and is using a pretrained model
--- Start extending annotations ---
combing spliced feature tibble objects across all samples in 0 mins.
extract new unspliced ranges object for all samples in 0 mins.
reduce new unspliced ranges object across all samples in 0 mins.
combine new unspliced tibble object across all samples in 0 mins.
combining transcripts in 0 mins.
extended annotations for spliced reads in 0 mins.
extended annotations for unspliced reads in 0 mins.
WARNING - Less than 50 TRUE or FALSE read classes for NDR precision stabilization.
NDR will be approximated as: (1 - Transcript Model Prediction Score)
-- Predicting annotation completeness to determine NDR threshold --
Recommended NDR for baseline FDR of 0.1 = 1
A high NDR threshold is being recommended by Bambu indicating high levels of novel transcripts, limiting the performance of the trained model
We recommend training a new model on similiar but well annotated dataset if available (https://github.com/GoekeLab/bambu/tree/master#Training-a-model-on-another-speciesdataset-and-applying-it), or alternatively running Bambu with opt.discovery=list(fitReadClassModel=FALSE)
Using a novel discovery rate (NDR) of: 1
transcript filtering in 0.5 mins.
extend annotations in 0.5 mins.
Warning messages:
1: In bambu.processReadsByFile(bam.file = reads[bamFileName], genomeSequence = genomeSequence, :
  not all chromosomes present in reference annotations, annotations might be incomplete. Please compare objects on the same reference
2: There was 1 warning in `summarise()`.
   ℹ In argument: `intersectWidth = max(intersectWidth)`.
   Caused by warning in `max()`:
   ! no non-missing arguments to max; returning -Inf
3: Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
   ℹ Please use `reframe()` instead.
   ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
   ℹ The deprecated feature was likely used in the bambu package.
     Please report the issue to the authors.
4: There was 1 warning in `summarise()`.
   ℹ In argument: `intersectWidth = max(intersectWidth)`.
   Caused by warning in `max()`:
   ! no non-missing arguments to max; returning -Inf
5: In scoreReadClasses(se, genomeSequence, annotations, defaultModels = defaultModels, :
  Bambu was unable to train a model on this sample, and is using a pretrained model
6: There was 1 warning in `filter()`.
   ℹ In argument: `newGeneId == min(newGeneId)`.
   Caused by warning in `min()`:
   ! no non-missing arguments to min; returning Inf
7: In predict.lm(lm(NDRscores ~ poly(score, 3, raw = TRUE)), newdata = data.frame(score = baseline)) :
  prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
Error in MatrixGenerics:::.load_next_suggested_package_to_search(x) :
  Failed to find a rowRanges() method for CompressedGRangesList objects.
Calls: ... eval -> eval -> rowRanges -> rowRanges ->
Execution halted
```
The bam file is a little big to send here. Is there another way I can send it to you, or should I subset it further?
Thanks
Hi,
This seems to be a different error than the first case; here bambu appears to have worked and finished transcript discovery. Try print(bambu_out) to confirm this. As you don't have quant = TRUE, you need to use writeToGTF(bambu_out) to output the result as a gtf file, since it is only the extended annotations; writeBambuOutput() is used when the bam has also been quantified. This is what I believe caused this new error.
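A minimal sketch of that discovery-only path, assuming the bambu_out object from the command you posted earlier (the output file name is just an example):

```r
library(bambu)

# With quant = FALSE, bambu() returns extended annotations (a GRangesList),
# not a quantified SummarizedExperiment, so writeBambuOutput() will fail.
print(bambu_out)

# Write the extended annotations to a GTF instead:
writeToGTF(bambu_out, file = "extended_annotations.gtf")
```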
In this case it now says "reads count for all annotated junctions: 84917 (0.941200594090133%)" instead of 0 like it did previously. Was this subset from the same bam file that caused the error before? If so, then it seems some alignments might be causing the issue, or the bam file is formatted in a way bambu doesn't expect. I notice the file name contains "dedup"; is it possible the deduplication step changed something? You could try splitting the bam file by chromosome to divide and conquer and find the problematic alignments that replicate the original error message. My suspicion is that an accessory scaffold is causing the issue.
Let me know if you are able to replicate it and then we can organize a way to transfer the bam file so I can have a closer look.
Kind Regards,
Andre Sim
Hi Andre,
It looks like I have been able to get bambu to run on these problematic bams.
As you suggested, I filtered the bams to retain only alignments mapped to chr1-22, chrX, chrY and chrM. I also removed supplementary alignments.
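In case it's useful to others, a sketch of this filtering done from R (the file names are hypothetical; the calls assume an indexed bam, and a samtools one-liner works just as well):

```r
library(Rsamtools)
library(GenomicRanges)

# Standard chromosomes only; the wide range covers each whole chromosome
# (2^29 is the maximum coordinate supported by BAM indexing).
std  <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")
keep <- GRanges(std, IRanges(1, 536870912))

# Drop secondary and supplementary alignments while subsetting.
param <- ScanBamParam(
  which = keep,
  flag  = scanBamFlag(isSecondaryAlignment = FALSE,
                      isSupplementaryAlignment = FALSE))

filterBam("problematic.bam", "filtered.bam", param = param)
```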
I'm glad I got it working.
Thanks for your help.
Hi,
I am glad to hear that worked. I will have to think about why Bambu would crash in the presence of other alignments, because theoretically it shouldn't happen; hopefully I can find the cause and solve it, or at least return an informative error message. Thanks for providing all that information!
@apsteinberg were you able to resolve your issue as well?
Hi Andre,
Unfortunately I haven't been able to resolve the issue on my end yet, but I also haven't had adequate time to try out some of the things you suggested. I was, however, able to run bambu in verbose mode as you suggested, and received a similar error to @Sefi196. Bambu was also telling me that I had no read counts. I did check my bam file, and there are in fact reads there, and the header is normal. I aligned the reads using minimap2 within the nanoseq pipeline, and I need to check whether there were any differences in the log messages from nanoseq for this particular bam vs. some of the others.
Also, thank you for the details about the transcript descriptions! This is very helpful.
I intend to work on this more later this week, but if you'd like to close out the issue for now, I could re-open it once I've had a chance to home in on what the issues in my bam file are.
Thank you for your help!
Best,
Asher
Hi Asher,
Thanks for getting back. I will keep this issue open for now, as I want to either solve the issue, in case it's spurious alignments causing bambu to crash, or at least provide a graceful failure with an informative error message.
If you make headway and removing non-standard chromosomes solves the issue, I would appreciate it if you could let me know how many alignments matched these spurious chromosomes.
Kind Regards,
Andre Sim
Hi Andre,
Sounds good! I will keep you posted as I progress on this. Thanks again for all your help.
Cheers,
Asher
Hi @andredsim
I've encountered this issue in the past few days as well. Upon trying to debug it, I found that the problem lies within:
```r
generanges[subjectHits(ov)[multiHits]]
```

```
Error in METHOD(x, i) :
  Subsetting operation on CompressedGRangesList object 'x' produces a result that is too big to be represented as a CompressedList object. Please try to coerce 'x' to a SimpleList object first (with 'as(x, "SimpleList")').
```
It seems this is due to the excessively large total number of GRanges elements produced here (after unlisting):

```r
a <- as(generanges, "SimpleList")[subjectHits(ov)[multiHits]]
sum(lengths(a))
# [1] 4212479769
```

I suspect this number is too large to be stored as a CompressedList object: the partitioning of a CompressedList is stored as integer end positions, so the unlisted length cannot exceed .Machine$integer.max (2,147,483,647), and 4,212,479,769 is well beyond that.
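A quick sanity check of that limit in plain R:

```r
# CompressedList partitions are stored as integer end positions, so the
# total unlisted length must stay within .Machine$integer.max.
sum_len <- 4212479769
sum_len > .Machine$integer.max  # TRUE: 4212479769 > 2147483647
```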
I have resolved this issue with the following code, which processes the multi-hit overlaps in chunks so that no single subsetting operation exceeds the CompressedList limit:

```r
split_intersection <- function(grl, geneRanges, ov) {
  # Reads (query ranges) that overlap more than one gene
  multiHits <- which(queryHits(ov) %in% which(countQueryHits(ov) > 1))
  # Process the multi-hits in 100 chunks so each intersect() call stays small
  dfs <- parallel::mclapply(split(multiHits, cut(1:length(multiHits), 100)),
    function(multiHit) {
      rangeIntersect <- GenomicRanges::intersect(
        grl[queryHits(ov)[multiHit]],
        geneRanges[subjectHits(ov)[multiHit]])
      # For each read, keep the gene with the largest overlap width
      data.frame(queryHits = queryHits(ov)[multiHit],
                 intersectWidth = sum(width(rangeIntersect)),
                 subjectHits = subjectHits(ov)[multiHit]) %>%
        group_by(queryHits) %>%
        summarise(subjectHits = subjectHits[which.max(intersectWidth)],
                  intersectWidth = max(intersectWidth))
    }, mc.cores = 20)
  dfs <- do.call("rbind", dfs)
}

filteredMultiHits <- split_intersection(grl, geneRanges, ov)
```
This problem also arises in another section, during quantification: calculateDistToAnnotation -> findspliceoverlapbyDist -> subject[subjectHits(olap)].
I used a similar approach to split olap and merge the results, and now the code runs smoothly.
I also attempted to convert the object to a SimpleList, but that becomes very slow. Hope this information helps :-)
Hi @andredsim,
Apologies for the delay on my end. I finally got around to trying what @Sefi196 did successfully here:

> Hi Andre, It looks like I have been able to get bambu to run on these problematic bams. As you suggested, I filtered the bams to retain only alignments mapped to chr1-22, chrX, chrY and chrM. I also removed supplementary alignments.
> I'm glad I got it working.
> Thanks for your help.
And it worked for me as well, which is very exciting! To answer your question about how many reads were aligned to scaffolds and accessory chromosomes: it was about 29 million reads, or ~30% of the aligned reads in my bam. However, I still have a few questions. It is still unclear to me why these reads would cause Bambu to fail. The reads were aligned to a fasta file that included the scaffolds, and then Bambu was given this same reference fasta and gtf. Is the implication that these reads were poorly aligned? It would be good for me to understand, both to control for this source of error in the future and to offer an explanation to collaborators. I can see if it's possible to transfer a partial bam if it's useful. Thanks again for your time and help!
Cheers,
Asher
Hi Asher, I am glad you were able to get it to run.
As to why it causes Bambu to fail, I cannot say for certain: as long as those scaffolds were in the fasta during alignment it shouldn't cause a problem, at least as we intend it. From @HongYhong's amazing work, it could be that after removing 30% of the reads you moved below the threshold that triggers the GRangesList bug he mentioned; however, I have tested Bambu with far more reads than your particular file has, so it is hard to say. (By the way, Hong Yhong, we are looking at implementing this fix, we just need to find the time to test it!) If you could send me a partial bam containing the reads mapped to these scaffolds that causes this issue, or host the full bam somewhere so I can try to replicate your bug, I will gladly look into it and try to find an explanation for you.
Kind Regards,
Andre Sim
Hi Andre,
Thanks! Got it, this makes sense. Thank you for offering to troubleshoot the bam file further. I spoke with my manager, and I think unfortunately it would be difficult for us to share these data at this stage due to some stringent confidentiality policies at our institution (we work at a hospital). However, maybe what I can do is split the bam file into bams for each accessory scaffold individually and try to determine which one yields the error. I will send further updates if I am able to pinpoint the source of the issue (in case it's useful for further development of Bambu). I will also see if I can try to implement @HongYhong's fix. I suspect there is something else going on as well, though, since originally, along with the CompressedGRangesList error, I was also getting the same message as @Sefi196: reads count for all annotated junctions: 0 (0%) ...
Thank you again for all your help.
Cheers,
Asher
Hi All,
I'm back with the same error, unfortunately. The input bam file is from a different sample, and I have tried removing secondary and supplementary alignments. Mapping was performed against hg38.analysisSet.fa, so no unnecessary scaffolds were included.
```
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'intersect': error in evaluating the argument 'x' in selecting a method for function 'ranges': Subsetting operation on CompressedGRangesList object 'x' produces a result that is too big to be represented as a CompressedList object. Please try to coerce 'x' to a SimpleList object first (with 'as(x, "SimpleList")').
```
Is there any update on how to fix this issue?
Thanks Bambu team
Hi,
Sorry for the delay in replying to this one.
Would you please be able to try this branch? It has the changes to the code suggested above:
https://github.com/GoekeLab/bambu/tree/HongYhong_fix
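In case it helps, one way to install that branch for testing (a sketch assuming the remotes package is available):

```r
# Install the fix branch directly from GitHub, then check the loaded version
remotes::install_github("GoekeLab/bambu", ref = "HongYhong_fix")
packageVersion("bambu")
```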
I also wanted to know whether, on these new files, you were still getting "reads count for all annotated junctions: 0 (0%)" in your output.
If so, could you please post the full error output? And if you can share the bam with me, that would be really helpful for debugging on my end.
Please let me know how this goes!
Kind Regards,
Andre Sim
Hi Andre,
Thanks for the suggestion re @HongYhong's fix.
This seems to have worked!
I am not getting the "reads count for all annotated junctions: 0 (0%)" message, probably because I am only mapping to the correct chromosomes this time.
I am happy to share the bam, but it is >100GB. If you are interested in taking a look at it, let me know how I could send it over to you.
Thanks again for all the help.
Sefi
Hi Sefi,
Glad to hear this works. I really need to find the time to update this branch further and get it through a pull request, then.
@apsteinberg I am not sure if you have tried this yet, but if you haven't, please give it a go.
Yes, I would really like to give your bam file a go. Could you write to me at andre_sim[at]gis.a-star.edu.sg and I can reply with details on where you can upload the large file.
Kind Regards,
Andre Sim