TrigosTeam/SPIAT

Help with marker_prediction_plot

Closed this issue · 5 comments

Hello Yuzhou and esteemed members of the Trigos Team,

I want to begin by congratulating you on the remarkable SPIAT package. As a master's student with limited bioinformatics experience, I apologize in advance if the issues I raise may seem straightforward to resolve or are not clearly presented.

Currently, I am engaged in a project that involves comparing the results obtained from our pathologic anatomy department. I have a series of 100 TIFFs from lung tumors, which were captured using multiplex imaging and analyzed with QuPath. However, I am encountering difficulties in quality control.

Here are the specific challenges I am facing:

  1. When using the predict_phenotypes function, is there a way to exclude the tumor marker (CK) from the predicted phenotypes, similar to how the nuclear marker is excluded by default? I have already set tumour_marker to "CK" and left CK out of baseline_markers. However, excluding it from markers_to_phenotype also removes CK from downstream analyses, such as marker_prediction_plot.

  2. While using the marker_prediction_plot to assess the quality of predicted phenotypes, I encounter the following error, which seems to be dependent on the number of cells being processed (currently 8304, which I believe is not too large):

marker_prediction_plot(predicted_image, marker = "PD1")

Error in .Primitive("[")(c(2.341, 0.822, 0.048, 0.5286, 0.0752, 0.1887, :
(subscript) logical subscript too long

I sincerely appreciate any insights and assistance you can provide in resolving these issues. Thank you for your time and support.
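
For readers who hit the same message: in base R, "(subscript) logical subscript too long" is raised when a matrix is indexed with a logical vector that is longer than the dimension it selects from. The sketch below reproduces the message on made-up data. The variable names are hypothetical, and it does not claim to show what happens inside marker_prediction_plot.

```r
# Toy intensity matrix: 2 markers (rows) x 3 cells (columns). Values are made up.
intensities <- matrix(c(2.3, 0.8, 0.05, 0.5, 0.08, 0.2), nrow = 2,
                      dimnames = list(c("PD1", "CD8"), paste0("CELL_", 1:3)))

# A logical mask with 4 entries, one more than the number of cells.
mask <- c(TRUE, FALSE, TRUE, TRUE)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    intensities[, mask]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A mask with one entry per cell selects columns as expected.
ok <- intensities[, c(TRUE, FALSE, TRUE), drop = FALSE]
print(dim(ok))
```

In practice this usually points to two per-cell objects (for example an intensity matrix and a phenotype-derived mask) that do not describe the same number of cells.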

Hi @CPUriarte ,

Thank you for your interest in using the tool.

For the first question, what is the reason for excluding the tumour marker from the markers_to_phenotype argument? If the tumour marker is not predicted, it will not be included in the marker_prediction_plot, as there is no prediction value for that marker.

For the second question, I wonder if the issue occurred because the original phenotypes were not present in the image object. Could you post the code that includes the phenotype prediction and the data generated from that (the column names should be sufficient)?

When excluding the tumor marker from the analysis, it is automatically omitted from predictions in downstream analyses, such as the marker_prediction_plot. However, if the tumor marker is included, a misclassification occurs for TILs present in the tissue, erroneously identifying them as "Immune_1, Immune_2, Tumour_marker," even though the latter does not exist. As the current package only allows for individual cell removal, this process results in a loss of valuable information regarding those TILs.

Regarding the second issue, you were correct; my initial phenotype was not present in the image object. I have since attempted using a phenotype that is indeed present, but unfortunately, I encountered the same error.

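# LOAD PACKAGES
library(SPIAT)   # format_image_to_spe(), predict_phenotypes()
library(dplyr)   # %>% and select()
library(stringr) # str_c()

# SELECT IMAGE SUBSET
selected_image <- "TMA1_1A_3901.tif"
selected_data <- subset(raw_measurements, Image %in% selected_image)
selected_data <- selected_data %>%
    select(-1)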

# FORMATTING
x_coord <- as.numeric(t(selected_data)[1, ]) # Get X coordinates
y_coord <- as.numeric(t(selected_data)[2, ]) # Get Y coordinates

filt_measurements <- t(selected_data[, -c(1:2)]) # Delete X/Y coordinates from intensity matrix
filt_measurements <- as.data.frame(filt_measurements)
filt_measurements <- sapply(filt_measurements, as.numeric)

colnames(filt_measurements) <- str_c("CELL", as.character(1:ncol(filt_measurements)), sep = "_")
rownames(filt_measurements) <- c("DAPI", "PD1", "CD8", "CD3", "TIM3", "LAG3", "CK")
phenotype <- rep("", ncol(filt_measurements))

# CREATE SPATIAL EXPERIMENT
general_format_image <- format_image_to_spe(format = "general", 
   phenotypes = phenotype, 
   intensity_matrix = filt_measurements, 
   coord_x = x_coord, 
   coord_y = y_coord)

predicted_image <- predict_phenotypes(spe_object = general_format_image, 
   thresholds = NULL, 
   tumour_marker = "CK", 
   nuclear_marker = "DAPI", 
   baseline_markers = c("PD1", "CD8", "CD3", "TIM3", "LAG3"), 
   reference_phenotypes = FALSE)
colnames(predicted_image)[1:10]
## [1] "CELL_1"  "CELL_2"  "CELL_3"  "CELL_4"  "CELL_5"  "CELL_6"  "CELL_7"  "CELL_8"  "CELL_9" 
## [10] "CELL_10"

predicted_image$Phenotype[1:10]
## [1] "CK"             "PD1,CD8,CD3,CK" "CK"             "CK"             "CK"            
## [6] "CK"             "CK"             "CK"             "CK"             "CK" 

Sincerely thankful for your guidance,

Cyril.
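
Because the formatting step above builds the intensity matrix, the coordinates, and the phenotype vector separately, a quick consistency check before calling format_image_to_spe can rule out a length mismatch between them. This is a minimal sketch on made-up data; check_spe_inputs is a hypothetical helper written for illustration, not part of SPIAT.

```r
# Hypothetical helper (not a SPIAT function): verifies that every per-cell
# input has one entry per column of the marker-by-cell intensity matrix.
check_spe_inputs <- function(intensity_matrix, coord_x, coord_y, phenotypes) {
  n_cells <- ncol(intensity_matrix)
  lengths_ok <- c(
    coord_x    = length(coord_x) == n_cells,
    coord_y    = length(coord_y) == n_cells,
    phenotypes = length(phenotypes) == n_cells
  )
  list(
    n_cells     = n_cells,
    lengths_ok  = lengths_ok,
    all_numeric = is.numeric(intensity_matrix),
    any_missing = anyNA(intensity_matrix) || anyNA(coord_x) || anyNA(coord_y)
  )
}

# Made-up example: 3 markers x 4 cells.
toy <- matrix(runif(12), nrow = 3,
              dimnames = list(c("DAPI", "CD3", "CK"), paste0("CELL_", 1:4)))
res <- check_spe_inputs(toy,
                        coord_x = c(1, 2, 3, 4),
                        coord_y = c(4, 3, 2, 1),
                        phenotypes = rep("", 4))
str(res)
```

If any entry of lengths_ok is FALSE, or any_missing is TRUE, the inputs are worth inspecting before moving on to prediction and plotting.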

Hi @CPUriarte, apologies for the late reply.

For the first issue, there shouldn't be a misclassification even if the tumour marker is excluded. For both issues, it would be easier for me to identify the problems if you could provide the data, the code, and the error messages. If you cannot share the data, I suggest you send me the code and error messages along with the column names of your data. My email is yuzhou.feng@petermac.org

I have sent you all required information, thanks for your help!

I will close the issue with a note: the function predict_phenotypes() assumes that "tumour_marker" is the majority phenotype in the image. If this is not the case for a specific dataset, any marker that appears on most of the cells can be used as "tumour_marker". This information will be updated in the documentation.
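
One way to check that assumption is to tabulate what fraction of cells carries each marker in the predicted Phenotype column. The sketch below uses base R only and the ten phenotype strings printed earlier in this thread; marker_fraction is a hypothetical helper, not a SPIAT function.

```r
# Hypothetical helper: fraction of cells whose comma-separated phenotype
# label contains each marker.
marker_fraction <- function(phenotypes) {
  per_cell <- strsplit(phenotypes, ",", fixed = TRUE)
  # Trim whitespace and count each marker at most once per cell.
  per_cell <- lapply(per_cell, function(m) unique(trimws(m)))
  markers <- unique(unlist(per_cell))
  fractions <- vapply(
    markers,
    function(mk) mean(vapply(per_cell, function(m) mk %in% m, logical(1))),
    numeric(1)
  )
  sort(fractions, decreasing = TRUE)
}

# The first ten predicted phenotypes shown in the thread.
phenotypes <- c("CK", "PD1,CD8,CD3,CK", rep("CK", 8))
fractions <- marker_fraction(phenotypes)
print(fractions)
```

On a full image, the same call could be made on predicted_image$Phenotype; the marker with the largest fraction is the candidate to pass as tumour_marker under the assumption described in the closing note.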