Batch-aware filtering of interactions in lrs_to_views
Is your feature request related to a problem? Please describe.
When using lrs_to_views, one can filter interactions based on a minimum required variance across samples. However, if batches are present in the data, this filter might select mostly batch effects rather than the biological signal we are interested in.
Describe the solution you'd like
I would suggest something like the batch_key approach of the highly variable gene selection used in scanpy. Empty views, or views with only one batch could be dropped simultaneously.
I am still wondering whether it would be better to separate object creation from object filtering. Considering how difficult the MuData object currently is to work with, I would suggest doing the filtering at creation time, but maybe others have suggestions one way or the other.
Hi @demian1,
I agree that doing the filtering independently might make sense. Besides the obvious option of setting every threshold to a value that disables it (e.g. 0 or infinity, depending on the filter), a simpler solution could be that passing None to all filter parameters means no filtering is carried out.
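To make the None idea concrete, here is a minimal sketch. The function name `keep_interaction` and the parameters `min_var` and `min_prop` are hypothetical and are not taken from the actual lrs_to_views signature; the only point is that a threshold left as None is skipped.

```python
def keep_interaction(variance, prop_samples, min_var=None, min_prop=None):
    """Return True if the interaction passes every filter that is enabled.

    All names here are hypothetical. A threshold set to None disables
    the corresponding filter.
    """
    # Variance filter: only applied when a threshold is given.
    if min_var is not None and variance <= min_var:
        return False
    # Sample-proportion filter: only applied when a threshold is given.
    if min_prop is not None and prop_samples < min_prop:
        return False
    return True


# With every threshold left as None, nothing is filtered out.
unfiltered = keep_interaction(variance=0.0, prop_samples=0.0)
# With min_var=0, a zero-variance interaction is dropped.
filtered = keep_interaction(variance=0.0, prop_samples=1.0, min_var=0)
```

With this convention, object creation and filtering can stay in one call while still allowing an unfiltered object to be built.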
Regarding the variance filter, my main motivation when I implemented it was that variables with zero variance were causing MOFA+ to crash, but you are right that if the variance across LRs is driven by batch, then this parameter might be suboptimal. I agree that the scanpy approach is a good way to address this, and it is also quite simple to implement.
However, since we don't rank, it could be just the union or intersection of highly-variable interactions across the batches. Or alternatively the average variance across batches? This I would need to think a bit about, but I will likely go for the latter approach (i.e. mean).
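As a rough illustration of the mean-variance option, the sketch below computes the variance of each interaction within every batch and keeps the interaction only if the mean of those per-batch variances exceeds a threshold. It uses only the Python standard library; the function name, the `min_var` and `min_batches` parameters, and the toy data are assumptions made for this example, not the implementation in the package.

```python
from collections import defaultdict
from statistics import mean, variance


def batch_aware_variance_filter(values, batches, min_var=0.0, min_batches=1):
    """Keep interactions whose mean within-batch variance exceeds min_var.

    values: dict mapping interaction name -> list of per-sample scores
    batches: list of batch labels, one per sample (same order as the scores)
    min_batches: hypothetical parameter, the number of batches in which the
        variance must individually exceed min_var
    """
    kept = {}
    for name, scores in values.items():
        per_batch = defaultdict(list)
        for score, batch in zip(scores, batches):
            per_batch[batch].append(score)
        # Sample variance needs at least two samples, so smaller batches are skipped.
        batch_vars = [variance(v) for v in per_batch.values() if len(v) > 1]
        if not batch_vars:
            continue
        n_passing = sum(v > min_var for v in batch_vars)
        mean_var = mean(batch_vars)
        if mean_var > min_var and n_passing >= min_batches:
            kept[name] = mean_var
    return kept


batches = ["a", "a", "a", "b", "b", "b"]
values = {
    # Varies only between batches: high overall variance, zero within-batch variance.
    "batch_driven": [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],
    # Varies within each batch as well.
    "biological": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0],
}
kept = batch_aware_variance_filter(values, batches, min_var=0.0)
```

In the toy data, "batch_driven" would pass a plain variance filter across all samples, but it is dropped here because it does not vary within either batch.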
> Empty views, or views with only one batch could be dropped simultaneously.
This is also a really good point.
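A minimal sketch of dropping such views, assuming each view can be summarised as the list of samples it still contains after filtering. The function name and both mappings are hypothetical stand-ins, not the MuData structure itself.

```python
def drop_uninformative_views(view_samples, sample_batch, min_batches=2):
    """Drop views that are empty or whose samples span fewer than min_batches batches.

    view_samples: dict mapping view name -> list of sample ids present in the view
    sample_batch: dict mapping sample id -> batch label
    """
    kept = {}
    for view, samples in view_samples.items():
        # An empty view covers zero batches, so it is dropped as well.
        n_batches = len({sample_batch[s] for s in samples})
        if n_batches >= min_batches:
            kept[view] = samples
    return kept


sample_batch = {"s1": "a", "s2": "a", "s3": "b"}
view_samples = {
    "T&B": ["s1", "s2", "s3"],  # spans both batches
    "T&NK": ["s1", "s2"],       # only batch "a"
    "B&NK": [],                 # empty view
}
remaining = drop_uninformative_views(view_samples, sample_batch)
```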
Please let me know your thoughts. I will aim to have the proposed solution implemented in the next update :)
Sounds good!
PS. To implement batch_key wrt mean var + a new parameter for var_min_nbatches. + Add info to .var
implemented in 07c0f05