Guidelines to choose the best parameters for bivariate analysis using Visium HD data
Rafael-Silva-Oliveira opened this issue · 5 comments
Hello again!
I've been trying out the bivariate approach with the Visium HD data, and I've been testing some of the parameters, mainly bandwidth and max_neighbours:
li.ut.spatial_neighbors(
    adata,
    bandwidth=1000,
    cutoff=0.1,
    kernel="gaussian",
    set_diag=False,
    max_neighbours=500,
)
li.mt.bivariate(
    adata,
    layer="lognorm_counts",
    resource_name="consensus",  # NOTE: uses HUMAN gene symbols!
    local_name="cosine",  # Name of the local function
    global_name="morans",  # Name of the global function
    n_perms=75,  # Number of permutations to calculate a p-value
    mask_negatives=False,  # Whether to mask LowLow/NegativeNegative interactions
    add_categories=True,  # Whether to add local categories to the results
    nz_prop=0.01,  # Minimum expr. proportion for ligands/receptors and their subunits
    use_raw=False,
    verbose=True,
)
And the results from the spatial plot look like this:
With max neighbors 500 and bandwidth 1000:
Max neighbors 100 and bandwidth 250:
Both naturally show circular regions because of these settings, but I'd like to ask for some guidelines, given the following:
- Each "bin" or "spot" seen in the picture is an actual cell (processed by Bin2Cell)
- Each spot is assigned a cell type label (not proportion)
Given that these methods can take a while to run, are there any other factors I should adjust so that the results look a bit smoother, like the tutorials in the LIANA+ documentation?
Would it be a good idea to try the Jaccard index, given that the cells carry actual categorical labels? Should I use max_neighbours of 1-2 and a bandwidth of 50-100 instead?
Thanks once again for the support :)
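As a rough way to reason about the circular regions: assuming kernel="gaussian" uses the standard form exp(-d^2 / (2 * bandwidth^2)) and that cutoff drops weights below the threshold (both are assumptions here, not checked against the LIANA+ source), the radius at which a neighbour stops contributing can be sketched as follows. The helper names are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Standard Gaussian kernel weight (assumed form, see the note above)."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def effective_radius(bandwidth, cutoff):
    """Distance at which the Gaussian weight has decayed to `cutoff`."""
    return bandwidth * math.sqrt(-2.0 * math.log(cutoff))


# The two settings tried above, both with cutoff=0.1
for bw in (1000, 250):
    radius = effective_radius(bw, cutoff=0.1)
    print(f"bandwidth={bw}: weight reaches 0.1 at ~{radius:.0f} coordinate units")
```

Under these assumptions the neighbourhood radius is a bit more than twice the bandwidth, in whatever units adata.obsm['spatial'] uses; max_neighbours may truncate it further in dense regions.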
For reference:
Now, I assume the bandwidth is calculated in pixels, i.e. the x,y units stored in adata.obsm['spatial']? Pixels in Visium HD images should correspond to somewhere between 0.5 and 2 microns, I guess. If you can check this, you can calculate how many microns there are per pixel; a commonly used assumption for the diffusion distance is ~100 microns.
Or alternatively, you could set it to 10 or 20 cells.
Both cases are obviously oversimplifications since diffusion depends on the ligand, and you also have membrane-bound interactions. Also, this is expression and not proteins so, to me, the decision is a bit arbitrary and case-to-case dependent.
Hope this helps :)
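A minimal sketch of both conversions, in case it is useful. The function names are hypothetical, and the 0.5 microns-per-pixel value is a placeholder that has to be replaced with the value from your own image metadata or scale factors. The nearest-neighbour helper is brute force, so it is only meant for a small spatial crop of the data, not all cells.

```python
import math
import statistics


def microns_to_coordinate_units(distance_um, microns_per_pixel):
    """Convert a physical distance into the units of the spatial coordinates."""
    return distance_um / microns_per_pixel


def median_nearest_neighbour_distance(coords):
    """Median distance from each point to its closest other point (O(n^2))."""
    nearest = []
    for i, (xi, yi) in enumerate(coords):
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        nearest.append(best)
    return statistics.median(nearest)


# Option 1: ~100 microns of diffusion, with a PLACEHOLDER resolution
bandwidth_from_microns = microns_to_coordinate_units(100, microns_per_pixel=0.5)

# Option 2: "10 cells", using a toy 5x5 grid of cells spaced 20 units apart
toy_cells = [(x * 20.0, y * 20.0) for x in range(5) for y in range(5)]
cell_spacing = median_nearest_neighbour_distance(toy_cells)
bandwidth_from_cells = 10 * cell_spacing

print(bandwidth_from_microns, bandwidth_from_cells)
```

With real data, coords would be a crop of adata.obsm['spatial']; for anything beyond a few thousand points a KD-tree would be the more sensible choice.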
Thank you for the swift reply once again!
Indeed, the original Visium HD dataset would be seen as the coordinates of each bin, but given I've processed with Bin2Cell, these coordinates got "aggregated" in some way, so I'll have to confirm that :)
Just by following your suggestions, I got to these plots, which seem to me a bit more of what we'd like to see with this type of data!
I've also switched from cosine to the Jaccard index, as it might be better suited to categorical data, but I'll try cosine too.
Thanks again!
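To illustrate the difference between the two metrics, here is a small sketch using the plain, unweighted textbook definitions. These are not LIANA+'s spatially weighted local functions, and the helper names are hypothetical; the point is only that Jaccard looks at presence/absence while cosine also uses magnitudes.

```python
import math


def cosine_similarity(x, y):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def jaccard_index(x, y):
    """Plain Jaccard index on presence/absence (value > 0 counts as present)."""
    both = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return both / either if either else 0.0


# One-hot style labels across four cells
label_a = [1, 1, 0, 0]
label_b = [1, 0, 1, 0]
print(cosine_similarity(label_a, label_b), jaccard_index(label_a, label_b))

# Continuous values: present in the same cells, but with opposite magnitudes
expr_a = [2.0, 0.1, 0.0, 0.0]
expr_b = [0.1, 2.0, 0.0, 0.0]
print(cosine_similarity(expr_a, expr_b), jaccard_index(expr_a, expr_b))
```

In the second pair Jaccard is 1.0 because both are non-zero in the same cells, while cosine is low because the magnitudes disagree.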
Hello again! I don't think this would require opening a new issue, but whenever I run this part of the tutorial (the decoupleR component of the bivariate analysis using LIANA+):
# Estimate cosine similarity
li.mt.bivariate(
    mdata,
    x_mod="comps",
    y_mod="tf",
    local_name="cosine",
    interactions=interactions,
    mask_negatives=True,
    add_categories=True,
    x_use_raw=False,
    y_use_raw=False,
    nz_prop=0.01,
    xy_sep="<->",
    x_name="celltype",
    y_name="tf",
)
My terminal gets killed (I'm assuming because of memory errors; there are no other warnings).
I have 1100 interactions, 22 cell types and 520k cells. The cell types were converted from string labels to one-hot encoding: instead of proportions, 1 spot = 1 cell here, so each cell has a 1 for its predicted cell type and 0 for all the others.
I tried reducing to just the top 5 highly variable TFs, but it still crashed.
Thanks again :)
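For scale, a back-of-the-envelope estimate of what arrays of this size would take. This assumes intermediate results are held as dense float64 arrays of shape (interactions, cells) and that the spatial weights are a CSR matrix with float64 values and int32 indices; neither has been checked against the LIANA+ code. The max_neighbours=500 value is taken from the first snippet in this thread and may not match this run, and the helper names are hypothetical.

```python
def dense_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of one dense array (float64 by default)."""
    return n_rows * n_cols * bytes_per_value / 2**30


def sparse_weights_gib(n_cells, neighbours_per_cell, value_bytes=8, index_bytes=4):
    """Approximate size in GiB of a CSR matrix with a fixed number of neighbours per cell."""
    nnz = n_cells * neighbours_per_cell
    total_bytes = nnz * (value_bytes + index_bytes) + (n_cells + 1) * index_bytes
    return total_bytes / 2**30


n_cells = 520_000

# One dense (interactions x cells) array for the full run
print(f"1100 interactions: {dense_gib(1100, n_cells):.2f} GiB per array")

# Reduced run: 5 TFs x 22 cell types, assuming every pair is tested
print(f"110 interactions:  {dense_gib(110, n_cells):.2f} GiB per array")

# Spatial weights, if each cell keeps up to 500 neighbours
print(f"weights matrix:    {sparse_weights_gib(n_cells, 500):.2f} GiB")
```

Several such arrays alive at once (scores, categories, masks) would multiply the first figure. The reduced run still crashing suggests the per-interaction arrays may not be the only cost.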
You could try setting add_categories and mask_negatives to False. Perhaps this is causing the issue. If it is, I could have another look, as there might be a way to make it work on a laptop too.
Daniel
Hey Daniel, I tried that approach and it still crashed. I haven't checked the underlying code yet, but I can also have a look and see where it might be crashing.