
Conditional density-based analysis


So I took a closer look at the DREMI paper, "Conditional density-based analysis of T cell signaling in single-cell data" (Krishnaswamy et al., Science, 2014), and I'd be interested in your feedback on a few things.

Summary

  • Overall I like their approach and way of thinking about the problem, but I'm not completely sold on most of the details past page 2.

Problem setting

  • They view signaling networks as performing computation: each protein computes a stochastic function of the other proteins in the network, and each single-cell measurement provides an "input-output" example of this relation. A main learning task is then to estimate what "function" each protein is computing on the basis of the others.
  • The main challenge is that there may be distinct sub-populations of cells in your dataset, each with different "computational" properties, and some populations may be much rarer than others. If you try to model the full joint distribution, your model will more strongly penalize deviation for the most abundant cell types, and will not sensitively model the properties of rare sub-types.
  • In other words, they want to estimate the functional dependence of protein Y given X from (x,y) examples, and they want to be sensitive in regions of low X density. They can do this by estimating the conditional distribution p(Y | X).
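To make the imbalance concrete, here's a toy numpy example (mine, not from the paper): a rare high-X subpopulation is nearly invisible in the joint density even though its Y behaviour is distinct, which is exactly the regime where conditioning on X should help.

```python
# Toy illustration (mine, not the paper's): a rare high-X subpopulation is
# nearly invisible in the joint density, even though its Y behaviour is distinct.
import numpy as np

rng = np.random.default_rng(0)
# 98% of cells: low X, low Y.  2% of cells: high X, high Y.
x = np.concatenate([rng.normal(1.0, 0.3, 9800), rng.normal(4.0, 0.3, 200)])
y = np.concatenate([rng.normal(1.0, 0.3, 9800), rng.normal(4.0, 0.3, 200)])

joint, xedges, _ = np.histogram2d(x, y, bins=20, density=True)
rare_cols = xedges[:-1] > 3.0                       # X bins covering the rare subpopulation
print("peak joint density, abundant region:", joint[~rare_cols].max())
print("peak joint density, rare region:   ", joint[rare_cols].max())
# The rare region's peak is ~50x smaller here, so a model of the joint barely
# "sees" it; renormalizing within each X slice (i.e. estimating p(Y | X)) puts
# abundant and rare X regions on an equal footing.
```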

Approach

  • Since the main issue they want to solve is insensitivity to rare sub-types, they essentially just normalize for the local number density of X:

[figure from the paper illustrating the normalization]

  • This produces a visualization of the functional dependence of the two proteins and allows them to estimate the mutual information:
    • DREMI - "Density resampled estimate of mutual information"
      • to quantify the strength of influence of one protein on another
    • DREVI - "Conditional-Density rescaled visualization"
      • "to visualize and characterize the edge-response function underlying their molecular interaction"

Major issues

  • Their process involves repeated steps of density estimation, discretization onto a grid, and downsampling data in low- and high-density regions: how much bias and variance do these steps introduce? (A lot, I would imagine.)
  • In Box 2: how do you choose epsilon? How do you choose the discretization scale? Also, they resort to density-dependent downsampling again. But at least here you should be able to repeat this step many times and average over (or at least quantify) the sampling variability.
  • Figure 2a is wrong: their diagram says that entropy measures the "spread" of the data (max - min), but that's clearly not what entropy measures: you could have a bimodal distribution that's sharply peaked at the extreme values of the data and thus have maximum "spread" but near-minimum entropy (quick numerical counterexample after this list). Implication for the method: they always try to fit a single line/curve through their density estimate, but there are cases where the conditionals are not unimodal (i.e. you should fit multiple curves).
  • When they fit functions to their plots, they fit lines, sigmoids, or splines. Uncertainty quantification is very important here, so these should definitely be Bayesian fits, maybe using Gaussian processes?
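On the entropy-vs-spread point, here's the numerical counterexample I mean (my code): a distribution with all its mass piled onto the two extreme bins has maximal spread but close to minimal entropy.

```python
# Counterexample to "entropy = spread (max - min)": a sharply bimodal
# distribution has the same spread as a uniform one but far lower entropy.
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

n_bins = 100
uniform = np.full(n_bins, 1.0 / n_bins)     # mass everywhere: full spread, maximal entropy
bimodal = np.zeros(n_bins)
bimodal[[0, -1]] = 0.5                      # mass only at the two extremes: same spread
print(entropy_bits(uniform))                # ~6.64 bits
print(entropy_bits(bimodal))                # 1.0 bit
```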

Minor issues

  • In Box 1, point (iii), they compute marginals by normalizing w.r.t. the maximum value in each column, but shouldn't this normalizing constant be the sum over the values in the column? The picture in Figure 2b explicitly indicates that they're normalizing by the total number per column. (Toy illustration after this list.)
  • They make a lot of sweeping statements about the performance of their method, even in MATLAB code comments. One such statement in the paper that I didn't understand was that "DREMI works well for data that is well distributed across the range of X and Y" -- I have no idea what "works well" means here, or what happens when the data isn't "well distributed," or even what "well distributed" means. Help?
  • The comparisons in Figure 5 look a bit dubious: their algorithm qualitatively disagrees with every other method they compare it to (adaptive mutual information, maximal information coefficient, Pearson correlation). They put a green box around "the point with the strongest relationship upon visual inspection of the DREVI plot" (!) and then draw red X's on every other method:
    [Figure 5 comparison panel from the paper]
    • Also, the conditional distribution of IkBa | pMAPKAPKII in time-course B is clearly bimodal at late time points, which should be a huge problem for their "fit a single curve" strategy.
      [IkBa | pMAPKAPKII DREVI plots from the paper, time-course B]
  • P. 17 in the supplement made me a bit uncomfortable: the figure basically says they can get very low RMSE values if they just throw out some data?
    [figure from the supplement, p. 17]
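Re: the Box 1(iii) normalization point above, here's a two-line toy illustration of what I mean (my code, not theirs): sum-normalizing each X column yields a proper conditional distribution, whereas max-normalizing only rescales each column so its peak is 1.

```python
# Toy illustration of the Box 1(iii) question: normalizing each X column by its
# SUM gives a proper conditional (columns sum to 1); normalizing by its MAX
# only rescales each column so that its peak equals 1.
import numpy as np

rng = np.random.default_rng(1)
counts, _, _ = np.histogram2d(rng.normal(size=5000), rng.normal(size=5000), bins=8)

by_sum = counts / counts.sum(axis=1, keepdims=True)   # p(y | x) within each X column
by_max = counts / counts.max(axis=1, keepdims=True)   # max-rescaling, as I read Box 1(iii)
print(np.round(by_sum.sum(axis=1), 2))                # all 1.0
print(np.round(by_max.sum(axis=1), 2))                # varies from column to column
```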

Questions for team

  • Can someone help me understand their density estimator on p. 11? They seem to have created something that looks a bit like Gaussian KDE, but based on the heat equation?
  • Compare with better measures of mutual information? I'm not familiar with Adaptive-MI, but the MIC has a lot of known problems. There are two methods in particular that look better suited:
  • Also, can we compare their estimates of conditional density with more direct estimators of the conditional density? This looks the most promising: http://www.cc.gatech.edu/~isbell/papers/isbell-density-2007.pdf
  • Since at the end they just fit curves through the resampled points anyway, why not cast this as a regression problem from the start? The only constraint is that they want the regressed function to give equal weight to regions of high and low X density. You could imagine doing a univariate estimate of the X density and repeatedly fitting curves to density-downsampled data, for example (rough sketch below).
  • Alternatively, instead of doing a density-normalized fit on the full dataset, can you somehow do rare sub-type identification first, and then straightforwardly estimate a separate function f_i(X) = Y for each subtype i?
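To make the "cast it as regression" suggestion concrete, here's roughly what I have in mind (my sketch, nothing from the paper): flatten the X density by importance-downsampling, fit a simple curve, and repeat to get fit-to-fit variability. A cubic polynomial stands in for their line/sigmoid/spline choices; all names are mine.

```python
# Sketch of the "regression-first" alternative above -- mine, not the paper's.
# Flatten the X density by importance-downsampling, fit a simple curve, and
# repeat to quantify fit-to-fit variability.
import numpy as np
from scipy.stats import gaussian_kde

def density_equalized_fits(x, y, n_fits=50, n_keep=1000, degree=3, seed=0):
    rng = np.random.default_rng(seed)
    inv_dens = 1.0 / gaussian_kde(x)(x)             # inverse univariate density of X
    keep_prob = inv_dens / inv_dens.sum()           # sample low-density X regions more often
    coeffs = []
    for _ in range(n_fits):                         # n_keep must be <= len(x)
        idx = rng.choice(len(x), size=n_keep, replace=False, p=keep_prob)
        coeffs.append(np.polyfit(x[idx], y[idx], degree))
    coeffs = np.asarray(coeffs)
    return coeffs.mean(axis=0), coeffs.std(axis=0)  # mean coefficients and their spread
```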