Comments for chapters 5 and 6
pfh opened this issue · 2 comments
Hi Stuart,
Sorry I've been a bit slow about this.
I have two main comments:
- You use a mixture of "I" and "we". Since this is your work, I think "I" is most appropriate, unless the chapter is from a paper with multiple authors. Maybe check what Di thinks. Relatedly, Liminal is cited as "Lee and Cook". A reviewer seeing this will want to know how much of it is your own work.
- Section 5.4.1 is a very, very high-level description of what is going on. I think you need a lot more detail here, and a few diagrams. A reader would not necessarily know that there's a division of labour between the web-browser client and the R server, or what the Shiny programming model is. What communication happens between client and server? What states can the client and server be in? What is the actual sequence of events driving a "reactive stream"?
Some further items (you might have already fixed some of these):
5.1
You could bridge from chapter 4 by saying something like:
"In chapter 4, we looked at coverage of different regions of a genome. One typical thing to do is to estimate the coverage of genes in a genome by RNA-Seq reads. If this is done for multiple biological samples, the result is a high-dimensional dataset, but with a limited number of observations. Modern techniques also allow the many individual cells in a biological sample to be quantified in this way, producing datasets that are both high-dimensional and have many observations."
5.1
"as a animated"
a -> an
5.1
"the tour has been used previously by Wickham, Cook, and Hofmann (2015) and exploring statistical model fits"
and -> for ?
5.1
"there has been relatively few tools"
has -> have
5.2
"a class of DR methods"
methods -> method
5.2
"Cauchy kernel"
Could say that this is the t-distribution with one degree of freedom that was mentioned earlier.
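For reference, the identity behind this suggestion, written with t standing for the distance between two embedded points (the notation is mine, not the chapter's): the Student t density with ν degrees of freedom reduces to the standard Cauchy density at ν = 1, using Γ(1) = 1 and Γ(1/2) = √π.

```latex
f_\nu(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
           \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}},
\qquad
f_1(t) = \frac{\Gamma(1)}{\sqrt{\pi}\,\Gamma\!\left(\frac{1}{2}\right)} \left(1 + t^2\right)^{-1}
       = \frac{1}{\pi\left(1 + t^2\right)}.
```

Up to the normalising constant 1/π, this is the (1 + t²)^{-1} kernel used for the low-dimensional similarities.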
5.2.1
"Like, when using other DR techniques"
Extra comma.
5.3.3
"Whether, points are near or far"
Extra comma.
5.4
Liminal is cited as "Lee and Cook". A reviewer seeing this will want to know how much of it is your own work.
5.4.1
"Generally, the user would set d=2 the tour is visualised as a animated scatter plot."
Missing "and".
6.1
"be come" -> "become"
6.2
"However, it is unclear whether the semantics of our grammar can be extended to data that can not be easily reshaped into long form tidy representations."
Is this to do with efficiency? In general, any data can have a tidy representation.
I like this section. Disk-based representation and efficient processing is a very interesting, gnarly problem, but also somewhat orthogonal to semantics and grammar. Since semantics and grammar were the focus of your thesis, it's logical not to explore this and to leave it to the future.
Some more notes for chapter 5
5.5
"steps through case studies cases"
5.5.3
"Next explore some simulated"
-> "Next I/we explore..."
5.5.3
Maybe worth describing the type of biological data this simulated data is mimicking (and that PHATE was developed to work with), to motivate this example. In the single-cell data PHATE was designed for, we're seeing branching trajectories of cell differentiation. If cells in the sample are mature, we see only the tips of the branches, which looks like a hierarchical pattern of clustering. (Maybe other data types have different stories that lead to similar patterns in data, or maybe they more resemble "manifolds" and "horseshoes".)
Figure 5.3
"The true data lies 2-d tree like structure consisting of ten branches."
5.5.3
"Figure @Screenshots of the liminal interface applied to tree structured data, a video of the tour animation is available at https://player.vimeo.com/video/439635863. shows that this selection..."
5.5.4
Maybe worth explaining to the reader that this is an example of a real dataset with features similar to case studies 2 and 3.
5.5.4
"For clustering, workflows"
Rogue comma.
5.5.4
"along side"
-> alongside
5.5.4
"we do weighted sample"
5.5.4
"the same subset as before figure 5.7."
-> ... as before (Figure 5.7).
5.5.4
Minor quibble: I notice higher cluster-number points are "on top" of lower cluster-number points. It makes them tend to dominate the tour pane.
Very interesting chapter. I'm full of ideas I want to follow up, but as a single chapter in your wide-ranging thesis this is great.
Hi Paul, thanks so much for your feedback on everything! Regarding your comments on section 5.4.1, I was trying to avoid getting too technical here, as there is a lot going on already in this chapter. The minor quibble is a default in Vega, which I definitely want to amend in future.