Phelimb/atlas

Change threshold for outputting paths in `walk`

Opened this issue · 3 comments

        if v["len_dna"] == N + 1 and v["dna"][-1] == "*":
            keep_paths[k] = v

is too strict a criterion. It will not output any path with a single-base deletion (frameshift).
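One way to relax the check could be sketched as below. This is only an illustration, not the actual `walk` code: `filter_paths` and `max_len_diff` are hypothetical names, `N` is assumed to be the expected sequence length used in the condition above, and the trailing `*` is assumed to still be present on a path that contains a deletion.

```python
def filter_paths(paths, N, max_len_diff=0):
    """Keep paths that end in '*' and whose length is close to N + 1.

    With max_len_diff=0 this is the same test as the current condition.
    A larger value also lets through paths shortened (or lengthened) by
    a small indel. Both the function name and max_len_diff are
    hypothetical and not part of atlas.
    """
    keep_paths = {}
    for k, v in paths.items():
        length_ok = abs(v["len_dna"] - (N + 1)) <= max_len_diff
        if length_ok and v["dna"][-1] == "*":
            keep_paths[k] = v
    return keep_paths


# Toy data: a full-length path and the same path with one base deleted.
N = 9
paths = {
    "full": {"dna": "ATGAAATAA*", "len_dna": 10},
    "del1": {"dna": "ATGAATAA*", "len_dna": 9},
}

strict = filter_paths(paths, N)                    # current behaviour
relaxed = filter_paths(paths, N, max_len_diff=1)   # tolerates a 1 bp indel
```

With the strict setting only `full` is kept. With `max_len_diff=1` the path carrying the single-base deletion is kept as well, at the cost of also admitting other paths that happen to be one base off.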

Inserting a stop codon often returns nothing. When `walk` does output a path, the translation is broken (as expected).

@ronald-jaepel could you possibly point me to an example where this works and one where it doesn't (or several)?

Inserting a 10 bp deletion rarely results in a full assembly of a gene in the database. @ronald-jaepel could you also point me to an example of this?

Data is in /data1/projects/ronald_jaepel/atlas_test/Simulation_Products/simulated_reads/$TIMESTAMP/
and /data1/projects/ronald_jaepel/atlas_test/Simulation_Products/generated_genomes/$TIMESTAMP/

JSONs from atlas are in /data1/projects/ronald_jaepel/atlas_test/JSONS/$TIMESTAMP/

For the following experiments:
TIMESTAMP="2016_08_22_1446/" #first deletion of 10 with all families still included
TIMESTAMP="2016_08_23_1156/" #last deletion of 20 without families cml - sul - aac - aad - B
TIMESTAMP="2016_08_23_1500/" #insertion of stop-codon

In each case ecoli_all_families was the initial simulation with one allele of each family and 1 000 000 reads simulated. All others are subsamples of those reads (10 000 - 840 000).
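For convenience, the three directories for a given experiment can be derived from its timestamp as in the rough sketch below. The base path and timestamps are copied from the comment above; the helper name `experiment_dirs` and the dictionary keys are made up for illustration.

```python
import posixpath

BASE = "/data1/projects/ronald_jaepel/atlas_test"

# Timestamps and descriptions as listed above.
EXPERIMENTS = {
    "2016_08_22_1446/": "first deletion of 10 with all families still included",
    "2016_08_23_1156/": "last deletion of 20 without families cml - sul - aac - aad - B",
    "2016_08_23_1500/": "insertion of stop-codon",
}


def experiment_dirs(timestamp):
    """Return the read, genome and JSON directories for one timestamp.

    Hypothetical helper. The trailing slash on the timestamp is stripped
    so the resulting paths do not contain a double slash.
    """
    ts = timestamp.strip("/")
    products = posixpath.join(BASE, "Simulation_Products")
    return {
        "reads": posixpath.join(products, "simulated_reads", ts) + "/",
        "genomes": posixpath.join(products, "generated_genomes", ts) + "/",
        "jsons": posixpath.join(BASE, "JSONS", ts) + "/",
    }


for timestamp, description in EXPERIMENTS.items():
    print(description, experiment_dirs(timestamp)["jsons"])
```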