meyer-lab/tHMM

Censorship discussion can be in results and much more succinct

Closed this issue · 3 comments

We can cover this literature, but censorship is a minor point in our overall goal.

There are two difficulties associated with single-cell lineage data. In each lineage tree, an edge represents the time a particular cell takes to divide. Clades containing faster-dividing cells are more heavily represented in the population, and these overrepresented colonies introduce survivorship bias into the sample data. If this bias is disregarded, the model fails to converge to the correct answer. Some studies address survivorship bias by adopting different strategies for sampling from the lineage trees: forward sampling, which starts from the root cell and tracks cells over generations by randomly selecting one daughter cell at each division; retrospective sampling, which traverses the tree from a leaf cell to the root cell; and tree sampling, in which leaf cells are removed [@doi:10.1101/488981; @doi:10.3389/fphy.2018.00064]. The other difficulty is time censorship. Cells that are still alive at the final time point of an experiment will die or divide at some point after the cutoff, which inevitably results in missing information about their lifetime and fate. Conversely, some cells were born at an unknown time before the beginning of the experiment. In these two cases, because of the finite experiment duration, the data are right- and left-censored, respectively. Time censorship needs to be handled properly, so that censored values are treated differently from uncensored values. In the process of fitting, we intend to find the most likely source distribution of the random values, i.e., the measured phenotypes. To address the time censorship issue, Kuchen and colleagues truncated the lineages after the last generations [@doi:10.7554/eLife.51002]. We employed the survival distribution function specifically for the censored samples in the data and fitted the original probability distributions to the uncensored samples, which resulted in accurate convergence of the model.
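To make the last sentence concrete, here is a minimal sketch of a censored likelihood: uncensored lifetimes contribute the log density, and right-censored lifetimes contribute the log survival function. It assumes exponentially distributed lifetimes purely because that gives a closed-form estimate; this is not the distribution or the code used in the model, and the function names are hypothetical. Left-censoring is not handled here.

```python
import math


def censored_log_likelihood(rate, durations, observed):
    """Log-likelihood of exponential lifetimes with right-censoring.

    durations: measured time for each cell.
    observed:  True if the cell's fate was seen (uncensored),
               False if the experiment ended first (right-censored).
    """
    ll = 0.0
    for t, seen in zip(durations, observed):
        if seen:
            # Uncensored: log of the density, log(rate * exp(-rate * t)).
            ll += math.log(rate) - rate * t
        else:
            # Right-censored: log of the survival function, log(exp(-rate * t)).
            ll += -rate * t
    return ll


def fit_censored_exponential(durations, observed):
    """Closed-form maximizer of the likelihood above: events / total time."""
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    n_events = sum(1 for seen in observed if seen)
    total_time = sum(durations)
    if n_events == 0 or total_time <= 0:
        raise ValueError("need at least one observed event and positive total time")
    return n_events / total_time


# Hypothetical data: two cells divided during the experiment, two were cut off.
durations = [2.0, 4.0, 6.0, 8.0]
observed = [True, True, False, False]
rate = fit_censored_exponential(durations, observed)
```

Dropping the two censored cells instead would estimate the rate from the two short lifetimes alone, giving a larger rate and therefore a shorter mean lifetime, which is the bias the survival-function term is meant to avoid.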

Two challenges associated with single-cell data are survivorship bias and time censorship. The first arises when faster-growing cells outgrow slower-growing cells and become over-represented in the culture. To handle this challenge, Nakashima et al. adopted different strategies in which they sample from the lineage trees in a forward and a retrospective manner and, as a final approach, remove leaf cells and keep only the fully observed cells [@doi:10.1101/488981; @doi:10.3389/fphy.2018.00064]. The second difficulty is handling time censorship in finite-time experiments, where information is missing about what happened before the experiment began and after it ended. To address this, Kuchen and colleagues truncated the lineages after the last generations [@doi:10.7554/eLife.51002] and simply removed unfinished cells. We employed the survival distribution function specifically for the censored samples in the data and fitted the original probability distributions to the uncensored samples, which resulted in accurate convergence of the model.

@aarmey is it still long?

Much better.

There is a paragraph in the methods that also explains this. I should shorten one of them to avoid repetition.