jinworks/CellChat

How do you handle merging samples across different tissue arrays, leading to redundant coordinates?

Opened this issue · 4 comments

Hello, how does CellChat handle redundant coordinates when a combined dataset spans multiple tissue arrays? For example, I am working with CosMx data. I have specified samples as a string combining the flow cell name and the FOV so that CellChat treats each flow cell and FOV combination as a separate sample. When I run CellChat on a single tissue array, where the coordinates are non-redundant, I get significant results. However, when I run CellChat on a merged dataset of multiple tissue arrays, where the same coordinates recur across arrays, I get zero results. @sqjin

sqjin commented

@AmosFong1 Have you solved this issue? I think you should assign different cell barcodes to different samples, and provide the batch labels.

Hi @sqjin, I have not resolved this issue yet. I have already assigned unique barcodes to each sample and provided batch labels in the samples column. I noticed that computeCellDistance() returns all NA values when there are redundant coordinates. Likewise, when I inspect the CellChat object after running with redundant coordinates, all of the stored cell distances are NA.
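A quick way to confirm that coordinates really are shared between slides is to look for repeated (x, y) pairs in the metadata. The following is a minimal base R sketch, not part of CellChat; the toy data frame is hypothetical and only reuses the column names from this thread.

```r
# Hypothetical toy metadata: two slides whose global pixel coordinates overlap.
meta <- data.frame(
  flow_cell_name = c("slide_A", "slide_A", "slide_B", "slide_B"),
  CenterX_global_px = c(100, 250, 100, 400),
  CenterY_global_px = c(200, 300, 200, 500)
)

# Build one key per (x, y) pair and flag every row whose key occurs more than once.
xy_key <- paste(meta$CenterX_global_px, meta$CenterY_global_px, sep = "_")
is_redundant <- duplicated(xy_key) | duplicated(xy_key, fromLast = TRUE)

# Count how many distinct slides each repeated coordinate pair appears on.
slides_per_key <- tapply(
  meta$flow_cell_name[is_redundant],
  xy_key[is_redundant],
  function(s) length(unique(s))
)
n_shared <- sum(slides_per_key > 1)

# Show the affected cells and the number of pairs shared between slides.
print(meta[is_redundant, ])
print(n_shared)
```

If n_shared is greater than zero on the real metadata, at least one coordinate pair appears on more than one slide.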

I have a hacky workaround where I shift the cells from each TMA by a different multiple of 1e6.

Below is my current working implementation:

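# load required packages
library(CellChat)
library(Seurat)
library(dplyr)

# write function (expects project_dir to be defined in the calling environment)
cellchat <- function(seurat_object, name) {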
  # get counts
  counts <- GetAssayData(seurat_object, layer = "data", assay = "RNA") 
  
  # get meta data
  meta_data <- data.frame(labels = factor(seurat_object$labels), samples = factor(seurat_object$samples))
  rownames(meta_data) <- colnames(counts)

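  # get coordinates and shift each flow cell by (idx - 1) * 1e6 so that
  # slides with overlapping coordinate ranges are moved apart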
  coordinates <- select(seurat_object@meta.data, CenterX_global_px, CenterY_global_px, flow_cell_name)
  idx_dict <- distinct(coordinates, flow_cell_name)
  idx_dict <- mutate(idx_dict, idx = row_number())
  coordinates <- left_join(coordinates, idx_dict, by = "flow_cell_name")
  coordinates <- mutate(coordinates, CenterX_global_px = CenterX_global_px + (idx - 1) * 1e6)
  coordinates <- mutate(coordinates, CenterY_global_px = CenterY_global_px + (idx - 1) * 1e6)
  coordinates <- select(coordinates, CenterX_global_px, CenterY_global_px)
  
  # get spatial factors (ratio converts pixels to micrometers)
  ratio <- 0.121
  # compute distances per flow cell from the original, unshifted coordinates
  cell_distances <- list()
  for (i in unique(seurat_object$flow_cell_name)) {
    m <- filter(seurat_object@meta.data, flow_cell_name == i)
    xy <- select(m, CenterX_global_px, CenterY_global_px)
    cell_distances[[i]] <- computeCellDistance(xy)
  }
  min_distances <- lapply(cell_distances, min)
  spot_size <- min(unlist(min_distances)) * ratio
  # one row of identical factors per sample
  sample_names <- unique(seurat_object$samples)
  spatial_factors <- data.frame(ratio = rep(ratio, length(sample_names)), tol = rep(spot_size / 2, length(sample_names)))
  rownames(spatial_factors) <- sample_names
  
  # create cellchat object
  cellchat_object <- createCellChat(object = counts, meta = meta_data, group.by = "labels", datatype = "spatial", coordinates = coordinates, spatial.factors = spatial_factors)
  
  # add database
  cellchat_object@DB <- subsetDB(CellChatDB.human, search = c("Secreted Signaling", "Cell-Cell Contact"))
  
  # subset cellchat object
  cellchat_object <- subsetData(cellchat_object)

  # identify over expressed genes
  cellchat_object <- identifyOverExpressedGenes(cellchat_object)

  # identify over expressed interactions
  cellchat_object <- identifyOverExpressedInteractions(cellchat_object)
  
  # compute communication probability
  cellchat_object <- computeCommunProb(cellchat_object, type = "truncatedMean", distance.use = FALSE, interaction.range = 250, scale.distance = NULL, contact.range = 100)
  
  # filter communication
  cellchat_object <- filterCommunication(cellchat_object, min.cells = 10)
  
  # subset communication
  communications <- subsetCommunication(cellchat_object)
  
  # add evidence
  communications <- mutate(communications, evidence = gsub(",", " ", evidence))
  
  # save communication
  write.table(communications, file = file.path(project_dir, "data", "cellchat", paste0("cosmx_", tolower(gsub("[^a-zA-Z]", "", name)), "_communications.csv")), quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
}
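
The fixed 1e6 offset above keeps slides apart only as long as no slide spans more than 1e6 pixels. One possible variation, sketched here in base R on hypothetical toy data, derives the step from the widest slide instead. The names coords, gap_px, and step_px are illustrative assumptions and do not come from CellChat or from the code above.

```r
# Hypothetical toy coordinates for two slides that share the same pixel range.
coords <- data.frame(
  flow_cell_name = c("slide_A", "slide_A", "slide_B", "slide_B"),
  x = c(0, 5000, 0, 4000),
  y = c(0, 3000, 0, 3500)
)

# Assumed padding between slides, in pixels. With the ratio of 0.121 used in
# this thread, 1e4 px is 1210 micrometers, well above interaction.range = 250.
gap_px <- 1e4

# Number the slides in order of appearance (1, 2, ...).
slide_idx <- match(coords$flow_cell_name, unique(coords$flow_cell_name))

# Step = widest slide extent along x plus the gap, so shifted slides cannot overlap.
x_extent <- tapply(coords$x, coords$flow_cell_name, function(v) diff(range(v)))
step_px <- max(x_extent) + gap_px

# Re-origin each slide at x = 0, then shift along x only.
# Distances between cells on the same slide are unchanged.
slide_min_x <- ave(coords$x, coords$flow_cell_name, FUN = min)
coords$x_shifted <- coords$x - slide_min_x + (slide_idx - 1) * step_px

print(coords)
```

Shifting along a single axis is enough to separate the slides, and because every cell on a slide moves by the same amount, the per-slide minimum distances used for the spatial factors stay the same.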

sqjin commented

@AmosFong1 Did you mean that some cells have exactly the same coordinates across different FOVs? Are you working with CosMx data? If so, I suggest setting contact.range = 10 instead of contact.range = 100.

@sqjin Thanks for the suggestion, I will use contact.range = 10. Yes, some of the cells have exactly the same coordinates across different FOVs, because my dataset incorporates ~14 different slides.