bahlolab/exSTRa

errors when object is null

Closed this issue · 8 comments

Hi Rick,
I am using exSTRa for detecting tandem repeat expansion with WES data. I communicated with you before on GitHub regarding the error “Multiple mates found for read” (Perl module) which was resolved as per your suggestion where I used the first read found.

I just finished the second part of exSTRa software (R module). I came across a few errors at 2-3 different places. They mostly occurred due to the object being null/empty. I temporarily added if condition to check those objects and returned NULL. This made me possible to move forward and perform a significance test. However, I am not sure the actions I have taken to deal with those errors are fine and do not have any consequences on the downstream analysis.
Can you please take a look at those lines where I updated/added code to handle the errors and suggest how to resolve them?
Additionally, I would like to know how I can use this software with thousands of samples where I will be testing one sample against all other samples (instead of case-control analysis) for the significance test.

1. @ tsum_test.R After Line 269

qmt <- qm$y.mat

added following code since qmt was null

if(is.null(dim(qmt))) {
# stop("No samples left after trimming")
print("qmt object empty")
return(NULL)
}

2. @ tsum_test.R before Line 296 added following code

if(is.null(dim(qmt_bac))) {
print("No samples left after trimming")
return(NULL)
}

3. @ tsum_test.R at Line 296 modified following code (# verify qmt is not too small: )
If I want to continue processing next loci instead of stopping the execution

if(nrow(qmt_bac) < 2) {
print("Too few samples (", nrow(qmt_bac), ") left after trimming")
return(NULL)
}

Thanks,
Bharati

Hi Bharati,

sorry to only just get back to you now. How many samples are in your analysis? This might be due to just having a very small number of samples (<4)? Otherwise there may be a bug. Which version are you using?

To use the software with thousands of samples, you should use the groups.regex and groups.samples options in the read_score function, allowing you to choose cases and controls; the groups.regex may be easier to use in your setup. Set the one sample you want to test as the case, and all others as controls. Use tsum_test(..., case_control = TRUE), and hopefully this will do what you want.

Cheers,
Rick

Hi Rick,
I am using exSTRa 0.88.6 version, with 20 samples for testing the STRs catalog provided with the software.
I do not have cases and controls, so I would like to check each sample against all other remaining samples for outlier detection. For now, I am using groups.regex option in the read_score() but not sure if its the correct way for the analysis I want to do.
Thanks,
Bharati

Hi Bharati,

if you don't have any distinct cases or controls, you should use tsum_test(..., case_control = FALSE). This trims the distributions in order to estimate the mean and variance, so that we can perform outlier detection.

As for your NULL object errors, I'm unsure as to why this is happening. As I understand, you are getting results for some loci? There could be a lack of coverage in some loci, but I have successfully run this on 100,000s of repeat loci in the genome so this seems unlikely. Is your data aligned to hg19?

Are you able to successfully run through the example script? https://github.com/bahlolab/exSTRa/blob/master/examples/exSTRa_score_analysis.R

Regards,
Rick

Hi Rick,
Thanks for your prompt response. I was successfully able to run through the example script.
I am actually using data aligned to hg38. I have changed the code in Perl script exSTRa_score.pl and Perl module exSTRa.pm to work with hg38. Similarly, I also changed a few lines in R script for reading the catalog in hg38 format and I have no issues with read_score().

The qmt and qmt_bac NULL issues start with tsum_tes(). However, after fixing those error I was able to get some output. Our ultimate goal to run simple repeats (in hg38 build) with exSTRa for the outlier detection.
Can I email those files (Perl and R code) to you for your review? It will be great if exSTRa can work with hg38.

Thanks,
Bharati

Hi Bharati,

did you change the positions of the known expansion loci? It's important their new location is correct for hg38.

We are working on providing a hg38 table for this month, but I can try to make this a priority for the next week.

Cheers,
Rick

Hi Rick,
I am using hg38 coordinates for those loci plus some additional loci. Similarly, I have also created simple repeats catalog to use with exSTRa as per the guide line suggested on GitHub but for hg38 build.
Since you are already working on hg38 catalog, does that mean that exSTRa new version will be available soon to work with hg38 data? That will be super helpful for our project.
Thanks,
Bharati

Hi @bharatij,

sorry for the time taken to answer. We have a hg38 table for known repeats at https://github.com/bahlolab/exSTRa/blob/devel_tankard/inst/extdata/repeat_expansion_disorders_hg38.txt Note this still has hg19_start in columns since I foolishly hard coded these values (we are working to be able to use a more generic column name to reduce confusion.

@mfbennett has recently added more loci for hg37, and I believe he will soon also add a longer list for hg38.

Cheers,
Rick

I think this was sorted out.