Interpreting parallel SINGER Output: Time Units and Mapping Nodes to VCF IDs
santiago1234 opened this issue · 2 comments
santiago1234 commented
Hi @YunDeng98 ,
I have successfully executed parallel SINGER and have a few questions regarding the interpretation and further processing of the output files.
I used the following command.
parallel_singer -vcf {chr_22.vcf} \
-Ne 1e4 \
-m 1.2e-8 \
-output results/trees/mex_chr22 \
-n 1000 \
-thin 20
Question:
- The execution generated multiple tree sequence files (chr22_0.trees to chr22_39.trees). Does each tree sequence file represent a sample from the posterior distribution?
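The thread does not confirm whether each file is a separate posterior sample. If they are (an assumption here), the usual pattern is to compute a statistic per file and then summarize it across files. The sketch below collects files named `<prefix>_<index>.trees` in numeric index order, because a plain `sorted()` would put `chr22_10` before `chr22_2`. It then reports the mean and a simple nearest-rank interval. The helper names `posterior_tree_files` and `summarize_posterior` are hypothetical.

```python
import re
import statistics
from pathlib import Path


def posterior_tree_files(directory, prefix):
    """Return files named '<prefix>_<index>.trees', ordered by integer index."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.trees$")
    matches = []
    for path in Path(directory).iterdir():
        m = pattern.match(path.name)
        if m:
            matches.append((int(m.group(1)), path))
    # Sort on the numeric index, not the file name string.
    return [path for _, path in sorted(matches)]


def summarize_posterior(values, lower=0.025, upper=0.975):
    """Mean and nearest-rank interval of one statistic across posterior samples.

    Assumes each value comes from one posterior sample (one .trees file).
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to summarize")
    last = len(ordered) - 1
    return (
        statistics.fmean(ordered),
        ordered[round(lower * last)],
        ordered[round(upper * last)],
    )
```

With tskit installed, one value per file could come from something like `tskit.load(path).diversity()` for each path returned by `posterior_tree_files("results/trees", "chr22")`. Those values would then be passed to `summarize_posterior`.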
Interpreting Tree Sequence Summary:
Inspecting one of the tree sequence files (e.g., chr22_36.trees) with tskit gives the following summary:
import tskit
ts = tskit.load('results/trees/chr22_36.trees')
ts
Tree Sequence Summary
---------------------
Trees: 197,329
Sequence Length: 51,000,000.0
Time Units: Unknown
Sample Nodes: 474
Total Size: 44.3 MiB
Table Details
-------------
Table       | Rows    | Size     | Has Metadata
------------|---------|----------|-------------
Edges       | 782,759 | 23.9 MiB | No
Individuals | 0       | 24 Bytes | No
Migrations  | 0       | 8 Bytes  | No
Mutations   | 154,161 | 5.4 MiB  | No
Nodes       | 211,505 | 5.6 MiB  | No
Populations | 0       | 8 Bytes  | No
Provenances | 0       | 16 Bytes | No
Sites       | 141,527 | 3.4 MiB  | No
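"Time Units: Unknown" means the file does not record its time units. The reply in this thread states that SINGER node times are in generations. In a tree sequence, node times are ages measured backward from the present, and present-day sample nodes typically sit at time 0. To express those ages in years you need a generation time. That value is not part of the SINGER command or the .trees file, so it is an assumption you must supply. The sketch below assumes one; the helper name `generations_to_years` is hypothetical.

```python
def generations_to_years(times_in_generations, generation_time):
    """Convert node ages from generations to years.

    `generation_time` (years per generation) is a user-supplied assumption;
    SINGER does not store it in the output.
    """
    if generation_time <= 0:
        raise ValueError("generation_time must be positive")
    return [float(t) * generation_time for t in times_in_generations]
```

For example, `generations_to_years(ts.tables.nodes.time, 28.0)` converts every node age, where 28 years per generation is only an illustrative value. To record the units in the file itself, one option in tskit (0.4.0 or later) is to copy the tables with `tables = ts.dump_tables()`, set `tables.time_units = "generations"`, and rebuild with `tables.tree_sequence()`.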
- What are the units of time used in the tree sequence files? Are they in generations?
- Mapping Nodes to Individuals: How can I map the nodes in the tree sequence to the original individual IDs from the input VCF file? Specifically, does the order of the nodes returned by ts.samples() correspond to the order of sample IDs in the VCF file?
Thanks again for your help :)
YunDeng98 commented
Hi @santiago1234, to your questions: (1) the units of time are generations; (2) the nodes appear in the same order as in the VCF file, that is, leaf nodes 0 and 1 are the first individual in the VCF, leaf nodes 2 and 3 are the second individual, and so on.
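The layout described in the reply above can be turned into a lookup table. The sketch below reads the sample IDs from the VCF `#CHROM` header line, where the sample columns follow the nine fixed columns. It then maps each sample node to its VCF sample with `divmod(node, 2)`, assuming diploid samples. Under that assumption, the 474 sample nodes in the summary correspond to 237 VCF samples. The thread does not say which node of a pair carries the first allele of the `GT` field, so the second element of each entry is only the node's position within the pair. The function names are hypothetical.

```python
import gzip
from pathlib import Path


def read_vcf_sample_ids(vcf_path):
    """Return sample IDs from the '#CHROM' header line of a VCF (plain or gzipped)."""
    path = Path(vcf_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed (CHROM ... FORMAT); sample IDs follow.
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data lines without finding the header
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def map_sample_nodes_to_ids(sample_nodes, vcf_ids, ploidy=2):
    """Map each sample node ID to (VCF sample ID, position within the individual).

    Assumes the layout from the reply: node IDs follow VCF sample order with
    `ploidy` consecutive nodes per individual (nodes 0 and 1 -> first sample,
    nodes 2 and 3 -> second sample, ...). Which position matches which GT
    allele is NOT established here.
    """
    nodes = [int(node) for node in sample_nodes]
    if len(nodes) != ploidy * len(vcf_ids):
        raise ValueError(
            f"{len(nodes)} sample nodes but {len(vcf_ids)} VCF samples at ploidy {ploidy}"
        )
    mapping = {}
    for node in nodes:
        individual, position = divmod(node, ploidy)
        if individual >= len(vcf_ids):
            raise ValueError(f"node {node} has no matching VCF sample")
        mapping[node] = (vcf_ids[individual], position)
    return mapping
```

A call could look like `map_sample_nodes_to_ids(ts.samples(), read_vcf_sample_ids(path_to_vcf))`, where `path_to_vcf` stands for the VCF passed to `-vcf`. The length check fails early if the VCF and the tree sequence do not describe the same samples.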
santiago1234 commented
thanks!