niemasd/FAVITES

Enquiry about backward-in-time models

Hc1023 opened this issue · 5 comments

FAVITES Version

1.2.7

Environment

Singularity on HPC

Configuration File

{
"ContactNetwork": "NetworkX",
"ContactNetworkGenerator": "File",
"Driver": "Default",
"EndCriteria": "TransmissionFile",
"Logging": "FileSTDOUT",
"NodeAvailability": "Perfect",
"NodeEvolution": "VirusTreeSimulator",
"NumBranchSample": "Single",
"NumTimeSample": "Once",
"PostValidation": "Dummy",
"SeedSelection": "TransmissionFile",
"SeedSequence": "NoSeqs",
"SequenceEvolution": "NoSeqs",
"Sequencing": "NoSeqs",
"SourceSample": "Random",
"TimeSample": "Uniform",
"TransmissionNodeSample": "TransmissionFile",
"TransmissionTimeSample": "TransmissionFile",
"TreeNode": "Simple",
"TreeUnit": "Same",
"contact_network_file": "/share/home/jianglab/huangsisi/usr/test/out134/tree/contact_network.txt.gz",
"transmission_network_file": "/share/home/jianglab/huangsisi/usr/test/out134/tree/transmission_network.txt.gz",
"out_dir": "/share/home/jianglab/huangsisi/usr/test/out134/tree",
"gemf_path": "GEMF",
"grinder_path": "grinder",
"hmmemit_path": "hmmemit",
"java_path": "/usr/bin/java",
"nw_rename_path": "nw_rename",

"vts_growthRate": 0,
"vts_max_attempts": 1000,
"vts_model": "constant",
"vts_n0": 1,
"vts_t50": -99999,
"tree_mutation_rate": 0.0008,
"end_time": 100/365

}

Unexpected Behavior

I have successfully simulated the contact network and transmission network using FAVITES. However, I do not know how to run backward-in-time models of evolution to generate the viral time-based phylogeny using VirusTreeSimulator. I placed contact_network.txt.gz and transmission_network.txt.gz in the corresponding output directory /share/home/jianglab/huangsisi/usr/test/out134/tree/ and ran FAVITES. The run failed and produced a FAVITES.log that looked like this:

image

It stopped at Sampling patients in time. I suppose I do not understand exactly how the pipeline works. How can I generate the viral time-based phylogeny using VirusTreeSimulator (preferably with an example)?

Can you also attach the contact network and transmission network files you are using?

Yes.
contact_network.txt.gz
transmission_network.txt.gz

In fact, they were also generated by FAVITES in a previous successful simulation.
CONFIG_b.json.txt

Can you clarify what you mean by "It stopped at Sampling patients in time"? I just tested out the CONFIG you shared in your original post using the contact_network.txt.gz and transmission_network.txt.gz files you attached, and FAVITES ran successfully to completion. It did indeed take a while to finish the Sampling patients in time step (5-10 minutes or so), but it eventually finished. Here is the output folder:

output.zip

Thank you so much for your kind help!
It did indeed stop that day: I submitted the job to the server and the run failed, with the log file ending at Sampling patients in time. Today I submitted the same job and it succeeded, and the log file was complete. I guess there were elves living on the server! Sorry to bother you!

image
image

I also have a few other questions about things I do not understand.

  • So to simulate any viral infectious process, the same pHMM was used to generate the seed viral sequences?
  • And what exactly happens during a backward simulation? All (currently) infected nodes of the infectious network were assigned a simulated evolved viral sequence accounting for the time of evolution?
  • And the infected individuals remained for labeling internal nodes were those removed (no longer carry a virus)?

Could you help explain them briefly or give me some clues?
Many thanks!

So to simulate any viral infectious process, the same pHMM was used to generate the seed viral sequences?

I'm not sure I understand the question. The only place FAVITES potentially uses a profile HMM is in some SeedSequence module implementations, which randomly generate a seed viral genome that "looks like" a database of known virus genomes represented as a profile HMM. Profile HMMs are not used in the transmission simulation.
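To illustrate what "generating a seed genome that looks like a profile" means, here is a deliberately simplified sketch. It is not how FAVITES or `hmmemit` work internally: a real profile HMM also has insert and delete states, while this toy keeps only per-column match emissions. The function name `emit_from_profile` and the example profile are hypothetical.

```python
import random

def emit_from_profile(profile, seed=None):
    """Draw one sequence from a toy profile (match states only).

    `profile` is a list of columns; each column maps a base to its
    emission probability at that position. This is an illustrative
    simplification, not the HMMER profile HMM format.
    """
    rng = random.Random(seed)
    sequence = []
    for column in profile:
        bases = list(column)
        weights = [column[base] for base in bases]
        # Sample one base for this column according to its emission weights.
        sequence.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(sequence)

# Hypothetical 3-column profile: position 2 is always G.
toy_profile = [{"A": 0.9, "C": 0.1}, {"G": 1.0}, {"T": 0.5, "A": 0.5}]
print(emit_from_profile(toy_profile, seed=0))
```

Each call yields a different sequence that still follows the column-wise preferences of the profile, which is the sense in which a seed sequence "looks like" the database the profile was built from.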

And what exactly happens during a backward simulation? All (currently) infected nodes of the infectious network were assigned a simulated evolved viral sequence accounting for the time of evolution?

The "backward-in-time" simulation refers specifically to tree evolution (i.e., simulating a viral evolutionary tree along the transmission network), not sequence evolution. I would suggest reading up on coalescent models of evolution. There are some good textbooks on the subject, or you can look for lecture slides/notes online.
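As a rough illustration of what "backward in time" means, the following is a minimal sketch of the standard constant-size (Kingman) coalescent for lineages sampled at the same time. It is not VirusTreeSimulator's implementation, which additionally constrains the tree to the transmission network and handles different sampling times. The function name `kingman_coalescent` and the parameter `n_eff` (effective population size, in the same time units as the branch lengths) are assumptions made for this example.

```python
import random

def kingman_coalescent(leaves, n_eff=1.0, seed=None):
    """Simulate a tree backward in time under a constant-size coalescent.

    Starts from the sampled lineages at time 0 and repeatedly merges a
    random pair until one lineage (the root) remains.
    Returns (newick_string, time_to_most_recent_common_ancestor).
    """
    rng = random.Random(seed)
    # Each active lineage is (newick_subtree, height_of_its_top_node).
    active = [(name, 0.0) for name in leaves]
    t = 0.0  # time elapsed going backward from the sampling time
    while len(active) > 1:
        k = len(active)
        # With k lineages, pairs coalesce at total rate C(k, 2) / n_eff.
        rate = k * (k - 1) / 2.0 / n_eff
        t += rng.expovariate(rate)
        i, j = rng.sample(range(k), 2)
        (a, height_a), (b, height_b) = active[i], active[j]
        merged = "(%s:%.6f,%s:%.6f)" % (a, t - height_a, b, t - height_b)
        active = [lin for idx, lin in enumerate(active) if idx not in (i, j)]
        active.append((merged, t))
    return active[0][0] + ";", t

tree, tmrca = kingman_coalescent(["A", "B", "C", "D"], n_eff=1.0, seed=1)
print(tree)
```

The key point is the direction: the simulation begins with the sampled individuals (the leaves) and works toward the root, so only the ancestry of sampled lineages is ever generated. Sequences, if any, are evolved afterwards along the resulting tree.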

And the infected individuals remained for labeling internal nodes were those removed (no longer carry a virus)?

I'm not sure what you mean by "remained for labeling internal nodes". In general, however, there is nothing special about individuals who were removed (e.g., because they recovered or died), aside from the fact that, if they are sampled (i.e., they appear as leaves in the phylogeny and thus as sequences), their sampling times must fall within the window of time during which they were alive and infected.
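The sampling constraint can be sketched as a small check. This example assumes transmission events are given as `(u, v, t)` tuples sorted by time, where `u` infected `v` at time `t`, a seed infection has `u` set to `None`, and `u == v` marks the removal of that individual. That encoding, the function names, and the example data are assumptions for illustration; reinfection is ignored.

```python
def infection_windows(transmissions):
    """Map each individual to (infection_time, removal_time).

    `transmissions` is an iterable of (u, v, t) tuples sorted by t.
    Assumed encoding: u infects v at time t; u == v means removal.
    Individuals never removed get an open-ended window.
    """
    windows = {}
    for u, v, t in transmissions:
        if u == v:
            # Removal event: close the window if the individual was infected.
            if v in windows:
                windows[v] = (windows[v][0], t)
        else:
            # Infection event: keep only the first infection time.
            windows.setdefault(v, (t, float("inf")))
    return windows

def invalid_samples(windows, sample_times):
    """Return individuals whose sampling time lies outside their window."""
    bad = []
    for node, t in sample_times.items():
        start, end = windows.get(node, (None, None))
        if start is None or not (start <= t <= end):
            bad.append(node)
    return bad

# Hypothetical events: A is a seed, infects B, and is removed at t = 0.2.
events = [(None, "A", 0.0), ("A", "B", 0.1), ("A", "A", 0.2), ("B", "C", 0.25)]
windows = infection_windows(events)
print(invalid_samples(windows, {"A": 0.15, "B": 0.05, "C": 0.3}))
```

Here A is a valid sample even though it is later removed, because 0.15 falls inside its window, while B is flagged because 0.05 precedes its infection.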