popgenmethods/SINGER

Interpreting parallel SINGER Output: Time Units and Mapping Nodes to VCF IDs

santiago1234 opened this issue · 2 comments

Hi @YunDeng98 ,

I have successfully executed parallel SINGER and have a few questions regarding the interpretation and further processing of the output files.

I used the following command:

parallel_singer -vcf {chr_22.vcf} \
           -Ne 1e4 \
           -m 1.2e-8 \
           -output results/trees/mex_chr22 \
           -n 1000 \
           -thin 20

Question:

  • The execution generated multiple tree sequence files (chr22_0.trees to chr22_39.trees). Does each of these tree sequence files represent one sample from the posterior distribution?

Interpreting Tree Sequence Summary:

Upon inspecting one of the tree sequence files (e.g., chr22_36.trees) with tskit, the summary is as follows:

import tskit
ts = tskit.load('results/trees/chr22_36.trees')
ts
Tree Sequence Summary
---------------------
Trees:           197,329
Sequence Length: 51,000,000.0
Time Units:      Unknown
Sample Nodes:    474
Total Size:      44.3 MiB

Table Details
-------------
Table       | Rows   | Size     | Has Metadata
------------|--------|----------|-------------
Edges       | 782,759| 23.9 MiB | No
Individuals | 0      | 24 Bytes | No
Migrations  | 0      | 8 Bytes  | No
Mutations   | 154,161| 5.4 MiB  | No
Nodes       | 211,505| 5.6 MiB  | No
Populations | 0      | 8 Bytes  | No
Provenances | 0      | 16 Bytes | No
Sites       | 141,527| 3.4 MiB  | No
  • What are the units of time used in the tree sequence files? Are they in generations?
  • Mapping Nodes to Individuals: How can I map the nodes within the tree sequence to the original individual IDs from the input VCF file? Specifically, does the order of the nodes returned by ts.samples() correspond to the order of sample IDs within the VCF file?

Thanks again for your help :)

Hi @santiago1234, to answer your questions: (1) The units of time are generations. (2) The sample nodes appear in the same order as the samples in the VCF file: leaf nodes 0 and 1 belong to the first individual in the VCF, leaf nodes 2 and 3 to the second individual, and so on.
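
For anyone landing here later, the ordering described above could be turned into an explicit lookup along these lines. This is only a sketch, not part of SINGER: the helper names `vcf_sample_ids` and `map_nodes_to_vcf_ids` and the IDs `ind_A` and `ind_B` are made up for illustration, and it assumes diploid samples whose two haplotypes occupy consecutive sample nodes.

```python
def vcf_sample_ids(header_line):
    """Return the sample IDs from a VCF '#CHROM' header line.

    The first nine tab-separated columns are fixed (CHROM ... FORMAT);
    everything after them is a sample ID.
    """
    return header_line.rstrip("\n").split("\t")[9:]


def map_nodes_to_vcf_ids(sample_nodes, vcf_ids, ploidy=2):
    """Map each sample node to (VCF sample ID, haplotype index).

    Assumes the sample nodes follow the VCF column order, with the
    `ploidy` haplotypes of one individual stored consecutively.
    """
    sample_nodes = [int(node) for node in sample_nodes]
    if len(sample_nodes) != ploidy * len(vcf_ids):
        raise ValueError("number of sample nodes does not match ploidy * number of VCF samples")
    return {
        node: (vcf_ids[i // ploidy], i % ploidy)
        for i, node in enumerate(sample_nodes)
    }


# Toy example with two hypothetical diploid individuals (four sample nodes).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind_A\tind_B"
ids = vcf_sample_ids(header)
mapping = map_nodes_to_vcf_ids(range(4), ids)
for node, (sample_id, haplotype) in mapping.items():
    print(node, sample_id, haplotype)
```

With a real tree sequence, the first argument would be `ts.samples()` and the header line would come from the input VCF. For the file summarized above, 474 sample nodes would correspond to 237 diploid individuals, so the length check is a cheap way to catch a mismatched VCF.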

thanks!