Calculating Fitnesses from TSV files

Question

Closed this issue 5 years ago · 1 comments

I have successfully run Enrich2 on some of my sequencing data, but I want to add a few filters/normalizations to the counting step. I was hoping to provide Enrich2 with a set of TSV files for input counts and use those to calculate fitnesses.

As a base case, I tried feeding back the TSV files that Enrich2 writes when it counts from the FASTQ reads (just to check the syntax, etc.). It seems to recognize the data, but the run stops with the error on the last line of this log:

2019-02-20 19:51:12,661 [Test] Creating new HDF5 data store "~/[directory]/Test_exp.h5"
2019-02-20 19:51:12,663 [Test] No existing calculated values in file
2019-02-20 19:51:12,663 [SL1SL2_1] Creating new HDF5 data store "~/[directory]/SL1SL2_1_sel.h5"
2019-02-20 19:51:12,664 [SL1SL2_1] No existing calculated values in file
2019-02-20 19:51:12,665 [r1_t0] Creating new HDF5 data store "~/[directory]/r1_t0_lib.h5"
2019-02-20 19:51:12,666 [r1_t0] No existing calculated values in file
2019-02-20 19:51:12,666 [r1_t2] Creating new HDF5 data store "~/[directory]/r1_t2_lib.h5"
2019-02-20 19:51:12,668 [r1_t2] No existing calculated values in file
2019-02-20 19:51:12,668 [r1_t4] Creating new HDF5 data store "~/[directory]/r1_t4_lib.h5"
2019-02-20 19:51:12,669 [r1_t4] No existing calculated values in file
2019-02-20 19:51:12,669 [r1_t6] Creating new HDF5 data store "~/[directory]/r1_t6_lib.h5"
2019-02-20 19:51:12,671 [r1_t6] No existing calculated values in file
2019-02-20 19:51:12,671 [r1_t8] Creating new HDF5 data store "~/[directory]/r1_t8_lib.h5"
2019-02-20 19:51:12,672 [r1_t8] No existing calculated values in file
2019-02-20 19:51:12,673 [r1_t12] Creating new HDF5 data store "~/[directory]/r1_t12_lib.h5"
2019-02-20 19:51:12,675 [r1_t12] No existing calculated values in file
2019-02-20 19:51:12,675 [r1_t16] Creating new HDF5 data store "~/[directory]/r1_t16_lib.h5"
2019-02-20 19:51:12,676 [r1_t16] No existing calculated values in file
2019-02-20 19:51:12,677 [r1_t18] Creating new HDF5 data store "~/[directory]/r1_t18_lib.h5"
2019-02-20 19:51:12,678 [r1_t18] No existing calculated values in file
2019-02-20 19:51:12,680 [SL1SL2_1] Counting for each time point (variants)
2019-02-20 19:51:15,221 [r1_t0] Converting raw variants counts to main counts
2019-02-20 19:51:16,893 [r1_t0] Counted 1319436 variants (219842 unique) after query
2019-02-20 19:51:16,895 [r1_t0] Counting synonymous variants
2019-02-20 19:51:34,243 [r1_t0] Counted 1319436 synonymous (153176 unique)
2019-02-20 19:51:35,192 [r1_t2] Converting raw variants counts to main counts
2019-02-20 19:51:35,803 [r1_t2] Counted 934148 variants (111982 unique) after query
2019-02-20 19:51:35,805 [r1_t2] Counting synonymous variants
2019-02-20 19:51:35,829 [Test] Invalid coding variant string.

Any idea what format I should use? Would the following work, where each row label is the actual coding sequence from the SeqLib input with the mutation applied?

     Count
_wt    ##
_sy    ##
[nucleotide sequence 1]    ##
[nucleotide sequence 2]    ##
[nucleotide sequence 3]    ##
...

Answer 1 · 2019-02-21T04:22:30.000Z

I fixed the problem by changing the SeqLib type from Basic to Identifiers Only.
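
For the original goal of filtering counts before scoring, here is a minimal sketch of preparing such a TSV outside Enrich2. It is not part of Enrich2: the function names, the example identifiers, and the `min_count` threshold are hypothetical, and the table layout (an empty first header cell followed by a `count` column) is an assumption that should be checked against a counts file Enrich2 itself wrote. With an Identifiers Only SeqLib the row labels are presumably treated as opaque identifiers rather than parsed as variant strings, so any consistent label should do.

```python
import csv
import io


def filter_counts(rows, min_count=1):
    """Keep (identifier, count) pairs whose count is at least min_count."""
    return [(ident, int(n)) for ident, n in rows if int(n) >= min_count]


def write_counts_tsv(rows, handle):
    """Write identifier/count pairs as a tab-separated table.

    Assumed layout: empty first header cell, then a "count" column.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["", "count"])
    writer.writerows(rows)


# Hypothetical counts for one time point.
raw = [("id_A", 120), ("id_B", 3), ("id_C", 45)]
kept = filter_counts(raw, min_count=10)

buf = io.StringIO()  # swap for open("r1_t0.tsv", "w", newline="") to write a file
write_counts_tsv(kept, buf)
tsv_text = buf.getvalue()
```

The same filter should be applied to every time point with the same rule, otherwise identifiers can drop in and out between time points for reasons unrelated to selection.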