How do you handle merging samples across different tissue arrays, leading to redundant coordinates?
Hello, how does CellChat handle redundant coordinates in a combined dataset that spans multiple tissue arrays? For example, I am working with CosMx data. I have specified samples as a string combining the flow cell name and the FOV, so that CellChat treats each FOV and flow cell as a separate sample. When I run CellChat on a single tissue array with a non-redundant coordinate system, I get significant results. However, when I run CellChat on a merged dataset consisting of multiple tissue arrays with a redundant coordinate system, I get zero results. @sqjin
@AmosFong1 Have you solved this issue? I think you should assign different cell barcodes to different samples, and provide the batch labels.
Hi @sqjin, I have not resolved this issue yet. I have already assigned unique barcodes to each sample and provided batch labels in the samples column. I noticed that computeCellDistance() returns all NA values when there are redundant coordinates. Additionally, when I inspect the CellChat object after running with redundant coordinates, all of the stored cell distances are NA.
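A quick way to confirm that a merged table really contains redundant coordinates is to look for (x, y) pairs that occur more than once. The sketch below uses base R only; the flow cell names and pixel values are made up for illustration, and the column names simply follow the ones used in this thread.

```r
# Toy metadata: two flow cells whose pixel coordinate systems overlap.
# All values here are illustrative, not real CosMx data.
meta <- data.frame(
  flow_cell_name = c("TMA1", "TMA1", "TMA2", "TMA2"),
  CenterX_global_px = c(100, 250, 100, 400),
  CenterY_global_px = c(200, 300, 200, 500)
)

# Flag every row whose (x, y) pair occurs more than once in the merged table.
xy <- meta[, c("CenterX_global_px", "CenterY_global_px")]
is_redundant <- duplicated(xy) | duplicated(xy, fromLast = TRUE)

# Rows 1 and 3 share (100, 200) even though they come from different arrays.
redundant_rows <- meta[is_redundant, ]
print(redundant_rows)

# Number of distinct flow cells that appear among the redundant rows.
n_flow_cells <- length(unique(redundant_rows$flow_cell_name))
```

If n_flow_cells is greater than 1, the duplicates come from different arrays rather than from repeated cells within one array.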
I have a hacky solution where I shift the cells from each TMA by (index - 1) * 1e6 pixels, so that coordinates from different TMAs no longer overlap.
Below is my current working implementation:
library(Seurat)
library(CellChat)
library(dplyr)

# run CellChat on a merged CosMx Seurat object and write the inferred communications to CSV
# note: project_dir must be defined in the calling environment
cellchat <- function(seurat_object, name) {
  # get counts
  counts <- GetAssayData(seurat_object, layer = "data", assay = "RNA")
  # get meta data
  meta_data <- data.frame(labels = factor(seurat_object$labels), samples = factor(seurat_object$samples))
  rownames(meta_data) <- colnames(counts)
  # get coordinates and shift each flow cell by (idx - 1) * 1e6 px so flow cells do not overlap
  coordinates <- select(seurat_object@meta.data, CenterX_global_px, CenterY_global_px, flow_cell_name)
  idx_dict <- distinct(coordinates, flow_cell_name)
  idx_dict <- mutate(idx_dict, idx = row_number())
  coordinates <- left_join(coordinates, idx_dict, by = "flow_cell_name")
  coordinates <- mutate(coordinates, CenterX_global_px = CenterX_global_px + (idx - 1) * 1e6)
  coordinates <- mutate(coordinates, CenterY_global_px = CenterY_global_px + (idx - 1) * 1e6)
  coordinates <- select(coordinates, CenterX_global_px, CenterY_global_px)
  # get spatial factors; distances are computed per flow cell on the unshifted coordinates
  ratio <- 0.121
  cell_distances <- list()
  for (i in unique(seurat_object$flow_cell_name)) {
    fc_meta <- filter(seurat_object@meta.data, flow_cell_name == i)
    fc_coords <- select(fc_meta, CenterX_global_px, CenterY_global_px)
    cell_distances[[i]] <- computeCellDistance(fc_coords)
  }
  min_distances <- lapply(cell_distances, min)
  spot_size <- min(unlist(min_distances)) * ratio
  n_samples <- length(unique(seurat_object$samples))
  spatial_factors <- data.frame(ratio = rep(ratio, n_samples), tol = rep(spot_size / 2, n_samples))
  rownames(spatial_factors) <- unique(seurat_object$samples)
  # create cellchat object
  cellchat_object <- createCellChat(
    object = counts, meta = meta_data, group.by = "labels", datatype = "spatial",
    coordinates = coordinates, spatial.factors = spatial_factors
  )
  # add database
  cellchat_object@DB <- subsetDB(CellChatDB.human, search = c("Secreted Signaling", "Cell-Cell Contact"))
  # subset cellchat object
  cellchat_object <- subsetData(cellchat_object)
  # identify over expressed genes
  cellchat_object <- identifyOverExpressedGenes(cellchat_object)
  # identify over expressed interactions
  cellchat_object <- identifyOverExpressedInteractions(cellchat_object)
  # compute communication probability
  cellchat_object <- computeCommunProb(
    cellchat_object, type = "truncatedMean", distance.use = FALSE,
    interaction.range = 250, scale.distance = NULL, contact.range = 100
  )
  # filter communication
  cellchat_object <- filterCommunication(cellchat_object, min.cells = 10)
  # subset communication
  communications <- subsetCommunication(cellchat_object)
  # add evidence (replace commas so they do not break the CSV columns)
  communications <- mutate(communications, evidence = gsub(",", " ", evidence))
  # save communication
  out_name <- paste0("cosmx_", tolower(gsub("[^a-zA-Z]", "", name)), "_communications.csv")
  write.table(communications, file = file.path(project_dir, "data", "cellchat", out_name), quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
}
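The coordinate shift used in the function above can be checked in isolation. The following is a minimal base R sketch on made-up coordinates; shift_by_flow_cell is a hypothetical helper written for this example, not a CellChat function, and the short column names x and y are placeholders.

```r
# Toy coordinates (pixels) for two tissue arrays with overlapping coordinate systems.
coords <- data.frame(
  flow_cell_name = c("TMA1", "TMA1", "TMA2", "TMA2"),
  x = c(100, 250, 100, 400),
  y = c(200, 300, 200, 500)
)

# Hypothetical helper: shift each flow cell by (index - 1) * offset on both axes.
shift_by_flow_cell <- function(df, offset = 1e6) {
  idx <- match(df$flow_cell_name, unique(df$flow_cell_name))
  df$x <- df$x + (idx - 1) * offset
  df$y <- df$y + (idx - 1) * offset
  df
}

shifted <- shift_by_flow_cell(coords)

# Pairwise Euclidean distances before and after the shift.
d_before <- as.matrix(dist(coords[, c("x", "y")]))
d_after <- as.matrix(dist(shifted[, c("x", "y")]))

# Within-array distances are unchanged ...
within_unchanged <- isTRUE(all.equal(d_before[1, 2], d_after[1, 2])) &&
  isTRUE(all.equal(d_before[3, 4], d_after[3, 4]))
# ... while cells 1 and 3, which coincided before the shift, are now far apart.
cross_distance <- d_after[1, 3]
```

For the shift to be safe, the offset has to be larger than the extent of any single array plus the largest interaction range used downstream; otherwise cells from different arrays could still end up within range of each other.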
@AmosFong1 Did you mean that some cells have exactly the same coordinates across different FOVs? Are you working with CosMx data? If so, I suggest setting contact.range = 10 instead of contact.range = 100.
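To get a feel for how much the two values differ, here is a rough base R sketch. It assumes that contact.range is expressed in micrometres and that the ratio in spatial.factors converts pixels to micrometres, which is my reading of the CellChat documentation; the ratio of 0.121 is taken from the code earlier in this thread, the cell positions are made up, and count_pairs_within is a hypothetical helper.

```r
# Conversion factor from pixels to micrometres, taken from the code in this thread.
ratio <- 0.121

# Toy cell centres in pixels (made-up values).
coords_px <- matrix(
  c(0, 0,
    60, 0,
    500, 0,
    2000, 0),
  ncol = 2, byrow = TRUE
)

# Pairwise centre-to-centre distances converted to micrometres.
d_um <- as.matrix(dist(coords_px)) * ratio

# Hypothetical helper: count unordered cell pairs within a given range (micrometres).
count_pairs_within <- function(d, range_um) {
  sum(d[upper.tri(d)] <= range_um)
}

pairs_10 <- count_pairs_within(d_um, 10)
pairs_100 <- count_pairs_within(d_um, 100)
```

In this toy layout only the two closest cells (about 7 micrometres apart) count as being in contact at a range of 10, whereas a range of 100 also admits pairs that are more than 50 micrometres apart.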