FowlerLab/Enrich2

Error when starting analysis

diabatem opened this issue · 13 comments

Hello, I am attempting to use Enrich2 for barcode counting, and every time I start the analysis I get the same error message: "Enrich2 encountered an error: 'No object named /main/barcodes/counts in the file'". I do not know how to eliminate this error. I have also tried updating the program, but that did not fix the problem.

Make sure you're not performing the analysis in a directory that already has Enrich2 output files in it, such as from an analysis attempt that failed. The program tries to read from existing HDF5 files if present, and if those don't contain the required information (such as the /main/barcodes/counts data frame) you would see an error like this.
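As a quick way to rule this out, the output directory can be scanned for leftover HDF5 files before starting a run. This is only a minimal sketch: the helper name find_existing_hdf5 is hypothetical and not part of Enrich2, and it assumes that Enrich2 output files can be recognized by the .h5 extension.

```python
from pathlib import Path


def find_existing_hdf5(output_dir):
    """Return a sorted list of .h5 files found anywhere under output_dir.

    Hypothetical helper, not part of Enrich2. Returns an empty list if the
    directory does not exist.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    # rglob searches the directory and all of its subdirectories.
    return sorted(root.rglob("*.h5"))
```

Calling find_existing_hdf5 on the output directory and getting an empty list back suggests that no earlier analysis attempt left files behind.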

The output directory does not contain any output files. I also switched to a completely different output directory and still received the same error message.

It sounds like there is some mismatch between your configuration and the FASTQ files you are processing. Can you post your config file and a sample of your data?

Here is the json. The fastq files are too large to upload in this comment.
ctermBRCA1 copy.zip

I see that your minimum barcode count is set to 4000, which is pretty high. It's possible that you are unintentionally discarding all of your barcodes because you don't have sufficient sequencing depth. Try setting the minimum to something much smaller and re-running to see if that fixes the problem.
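To illustrate why a high minimum can empty the data set, here is a toy sketch of a minimum-count filter. The barcode sequences and counts are made up, the function names are hypothetical, and treating the minimum as inclusive is an assumption rather than a statement about how Enrich2 applies it.

```python
def barcodes_passing(counts, min_count):
    """Keep only barcodes whose count is at least min_count (assumed inclusive)."""
    return {bc: n for bc, n in counts.items() if n >= min_count}


def survival_by_threshold(counts, thresholds):
    """Map each threshold to the number of barcodes that survive it."""
    return {t: len(barcodes_passing(counts, t)) for t in thresholds}


# Made-up counts for illustration only.
toy_counts = {"ACGTACGT": 5200, "TTGACCAA": 310, "GGCATCGA": 48, "CATGCATG": 3}
summary = survival_by_threshold(toy_counts, [4000, 50, 1])
print(summary)
```

With shallow sequencing, most barcodes fall well below a threshold in the thousands, so comparing a few thresholds like this shows quickly whether the filter is the culprit.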

I reduced the minimum barcode count to 50 and the error still persists.

It's possible that the barcodes are being counted but that there's an issue with trimming or otherwise mapping them to your barcode map.

What is the name of the file being processed when Enrich2 throws the error? Does it end with _lib.h5?

If so, please try running the following code, either as a standalone script or in an interactive shell after substituting the correct file path:

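import pandas as pd

# Open the SeqLib HDF5 file read-only so the analysis output is not modified.
with pd.HDFStore("/path/to/seqlib/hdf5/file.h5", mode="r") as store:
    raw_counts = store["/raw/barcodes/counts"]

print("Counted {} unique barcodes.".format(len(raw_counts)))
print("")
# The first ten rows are the most abundant barcodes only if the table
# is stored sorted by count in descending order.
print("Ten most abundant barcodes:")
print(raw_counts[:10])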

The program creates all the normal .h5 files, and this is what I see before the error appears.
screen shot 2018-10-16 at 12 03 37 pm

When I ran the standalone script, this is the error I was given.
screen shot 2018-10-16 at 12 17 55 pm

Please run the standalone script again on the HDF5 file ending with _lib.h5 and let me know what happens.

I ran the code on the _lib.h5 files and got the same error.
screen shot 2018-10-17 at 12 25 00 pm

Let me explain what is happening in this log that you posted:

The program creates all the normal .h5 files, and this is what I see before the error appears.
screen shot 2018-10-16 at 12 03 37 pm

Enrich2 first creates empty HDF5 files for each part of the analysis (SeqLib, Selection, and Experiment). Then, it starts processing them one by one. It starts with the first Selection, b_rep1, and processes its first time point, 4-2brca1-.

We can see from the log that the program successfully counts the raw barcodes for the 4-brca1- time point, finding 2866779 total barcodes and storing them in /raw/barcodes/counts.

Then, it filters these barcodes, keeping only those that are present in the barcode-variant map file, and stores the result in /main/barcodes/counts. This step is unsuccessful, so the /main/barcodes/counts entry is not created, and the program crashes when trying to access it.
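One way to confirm this state is to compare the tables present in the _lib.h5 file against the two table names mentioned above. The sketch below is hedged: missing_tables is a hypothetical helper, and the example key list simply mimics what the log suggests the crashed file contains.

```python
REQUIRED_TABLES = ("/raw/barcodes/counts", "/main/barcodes/counts")


def missing_tables(keys, required=REQUIRED_TABLES):
    """Return the required table names that are absent from keys."""
    present = set(keys)
    return [name for name in required if name not in present]


# With a real file, the keys could be read with pandas, for example:
#     with pd.HDFStore("/path/to/seqlib/hdf5/file.h5", mode="r") as store:
#         keys = store.keys()
# Here a hand-written list stands in for a file where only the raw
# counting step succeeded.
keys_after_crash = ["/raw/barcodes/counts"]
print(missing_tables(keys_after_crash))
```

If only /main/barcodes/counts is reported missing, counting worked and the problem lies in the filtering step.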

The output you provided comes from running the standalone script on one of the other _lib.h5 files. That file is empty because the program crashed before processing it. Please run the script again on the file that contains counts according to the log and send that output.

It is my suspicion that your barcode-variant map does not actually contain the same barcode sequences that Enrich2 is counting. This is usually caused by incorrect read trimming, so please double check that.
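A rough way to test this suspicion is to compare a handful of the most abundant counted barcodes against the barcodes in the map. The following is a sketch with made-up sequences and hypothetical function names. It checks for exact matches, reverse-complement matches, and cases where one sequence is contained in the other, which would hint at under- or over-trimming. The substring checks compare every pair, so this is meant for a small sample rather than millions of barcodes.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, in upper case."""
    return seq.translate(_COMPLEMENT)[::-1].upper()


def diagnose(counted, mapped):
    """Summarize how counted barcodes relate to the barcodes in the map."""
    counted = {b.upper() for b in counted}
    mapped = {b.upper() for b in mapped}
    exact = counted & mapped
    unmatched = counted - exact
    return {
        "exact": len(exact),
        # Counted barcode matches a map barcode on the opposite strand.
        "reverse_complement": sum(
            reverse_complement(b) in mapped for b in unmatched
        ),
        # Counted barcode is longer and contains a map barcode
        # (possible under-trimming).
        "contains_map_barcode": sum(
            any(m in b for m in mapped) for b in unmatched
        ),
        # Counted barcode is shorter and sits inside a map barcode
        # (possible over-trimming).
        "inside_map_barcode": sum(
            any(b in m for m in mapped) for b in unmatched
        ),
    }


# Made-up example: reads carry two extra bases after each barcode.
report = diagnose(["ACGTACGG", "TTGACCAT"], ["ACGTAC", "TTGACC"])
print(report)
```

Zero exact matches combined with a high contains_map_barcode count, as in this toy case, is the pattern one would expect when the trimming settings leave extra bases on the reads.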

This is the error for the first rep. Your suspicion is correct and the barcode-variant map does not have the same barcode sequences that Enrich2 is counting.
screen shot 2018-10-18 at 11 20 19 am

I think that is the wrong screenshot, but regardless I'm glad that we were able to get to the bottom of this. I've added a new enhancement issue (#19) to improve the program's behavior in this situation.