BodenmillerGroup/cytomapper

loadImages Subscript Error


Better error handling would be nice to have.

> loadImages("IMC_oSCC_Angela2024/", "tiff", single_channel = TRUE, on_disk = TRUE, h5FilesPath = getHDF5DumpDir())
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE,  : 
  subscript is out of bounds

Here is the data set's directory structure:

$ ls IMC_oSCC_Angela2024 # A folder for old smokers (O.S.), another for young non-smokers (Y.N.S.).
TMA01_OS  TMA02_YNS
$ ls IMC_oSCC_Angela2024/TMA01_OS
1_A3  1_A5  1_A7  1_B1  1_B3  1_B5  1_B7  1_C1  1_C3  1_C5  1_C7  1_D1  1_D3  1_D5  1_D7  1_E1  1_E3  1_E5  1_E7  1_F1  1_F3
1_A4  1_A6  1_A8  1_B2  1_B4  1_B6  1_B8  1_C2  1_C4  1_C6  1_C8  1_D2  1_D4  1_D6  1_D8  1_E2  1_E4  1_E6  1_E8  1_F2  1_F4
$ ls IMC_oSCC_Angela2024/TMA01_OS/1_A3 | head
104Pd_104Pd.ome.tiff
113In_113In_HLA-ABC.ome.tiff
115In_115In_CD11c.ome.tiff
139La_139La_panCK.ome.tiff
141Pr_141Pr_CD20.ome.tiff
142Nd_142Nd_HH3.ome.tiff
143Nd_143Nd_CD45RA.ome.tiff
144Nd_144Nd_MPO.ome.tiff
145Nd_145Nd_CD103.ome.tiff
146Nd_146Nd_CD8a.ome.tiff

It is not clear from loadImages.Rd whether the function searches recursively for file names matching the specified pattern.
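This says nothing about what loadImages does internally, but base R's `list.files()` makes it easy to see which files a pattern such as "tiff" would match at the top level of a directory versus recursively through its sub-folders. The helper name `matches_by_depth()` is hypothetical and only meant as a quick diagnostic.

```r
# Hypothetical diagnostic helper (not part of cytomapper): report which files
# under `dir` match the regular expression `pattern`, once looking only at the
# top level and once descending into all sub-folders.
matches_by_depth <- function(dir, pattern) {
  list(
    top_level = list.files(dir, pattern = pattern),
    recursive = list.files(dir, pattern = pattern, recursive = TRUE)
  )
}

# Example: matches_by_depth("IMC_oSCC_Angela2024", "tiff")
```

With the layout shown above, the top-level listing would contain no TIFF files at all (only the TMA01_OS and TMA02_YNS folders), while the recursive listing would return paths like "TMA01_OS/1_A3/104Pd_104Pd.ome.tiff".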

Hmm, did you try running the function on the TMA01_OS folder? The single-channel functionality is quite experimental and not well tested, since few datasets use this structure.

Not before, but I have now retried with x = "IMC_oSCC_Angela2024/TMA01_OS/" instead of x = "IMC_oSCC_Angela2024/" and get the same error. If I instead specify the folder of a single sample, x = "IMC_oSCC_Angela2024/TMA01_OS/1_A3/", a different error occurs:

Error in .valid.loadImage.input(x, pattern, single_channel, name) : 
  The files are of type other than 'jpeg', 'tiff', 'png' or different file types are mixed.

The function is flummoxed by a single text file in the same folder as the various TIFF files.

1_A3 $ ls
104Pd_104Pd.ome.tiff                 153Eu_153Eu_CD68.ome.tiff      170Er_170Er_CD3.ome.tiff
113In_113In_HLA-ABC.ome.tiff         154Sm_154Sm_CD45.ome.tiff      171Yb_171Yb_GranzymeB.ome.tiff
115In_115In_CD11c.ome.tiff           155Gd_155Gd_CD31.ome.tiff      172Yb_172Yb_CD206.ome.tiff
139La_139La_panCK.ome.tiff           156Gd_156Gd_CXCR3.ome.tiff     173Yb_173Yb_CD4.ome.tiff
141Pr_141Pr_CD20.ome.tiff            158Gd_158Gd_Tbet.ome.tiff      174Yb_174Yb_HLADR.ome.tiff
142Nd_142Nd_HH3.ome.tiff             159Tb_159Tb_CD197.ome.tiff     175Lu_175Lu_ICOS.ome.tiff
143Nd_143Nd_CD45RA.ome.tiff          160Gd_160Gd_CD14.ome.tiff      176Yb_176Yb_CD56.ome.tiff
144Nd_144Nd_MPO.ome.tiff             161Dy_161Dy_FX111A.ome.tiff    189Os_189Os.ome.tiff
145Nd_145Nd_CD103.ome.tiff           162Dy_162Dy_FoxP3.ome.tiff     190Os_190Os.ome.tiff
146Nd_146Nd_CD8a.ome.tiff            163Dy_163Dy_PD1.ome.tiff       191Ir_191Ir_DNA1.ome.tiff
147Sm_147Sm_podoplanin.ome.tiff      164Dy_164Dy_anti-Cy5.ome.tiff  193Ir_193Ir_DNA2.ome.tiff
148Nd_148Nd_CD16.ome.tiff            165Ho_165Ho_OX40.ome.tiff      208Pb_208Pb.ome.tiff
149Sm_149Sm_CADM1.ome.tiff           166Er_166Er_CD44.ome.tiff      209Bi_209Bi_DC-SIGN.ome.tiff
150Nd_150Nd_IDO.ome.tiff             167Er_167Er_CD66a.ome.tiff     80ArAr_80ArAr.ome.tiff
151Eu_151Eu_PDL1.ome.tiff            168Er_168Er_Ki67.ome.tiff      89Y_89Y_aSMA.ome.tiff
152Sm_152Sm_anti-Cy3-TIGIT.ome.tiff  169Tm_169Tm_Lag3.ome.tiff      ROI002_ROI_002_A3_SP20-003549_summary.txt

If I specify "tiff" as the pattern, why does it even consider ROI002_ROI_002_A3_SP20-003549_summary.txt?

1_A3 $ head ROI002_ROI_002_A3_SP20-003549_summary.txt 
Page    Channel Label   MinValue        MaxValue
0       ArAr(80)        80ArAr  0.00    8815.00
1       Y(89)   89Y_aSMA        0.00    42.00
2       Pd(104) 104Pd   0.00    8.00
3       In(113) 113In_HLA-ABC   0.00    50.00
4       In(115) 115In_CD11c     0.00    38.00
5       La(139) 139La_panCK     0.00    96.00
6       Pr(141) 141Pr_CD20      0.00    22.00
7       Nd(142) 142Nd_HH3       0.00    28.00
8       Nd(143) 143Nd_CD45RA    0.00    24.00

All imaging mass cytometry data sets generated at our university seem to be returned to the customer with this structure.
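Before calling loadImages on such a directory, a quick base-R check can list every file whose extension is not an image type. The default set of extensions below is an assumption: the error message names 'jpeg', 'tiff' and 'png', and "tif" and "jpg" are added here only as common spellings; the helper `non_image_files()` is hypothetical.

```r
# Hypothetical pre-flight check (not part of cytomapper): return every file
# below `root` whose extension is not in `ok_ext`. tools::file_ext() returns
# the part after the last dot, so "x.ome.tiff" yields "tiff".
non_image_files <- function(root,
                            ok_ext = c("tiff", "tif", "png", "jpeg", "jpg")) {
  files <- list.files(root, recursive = TRUE)
  ext <- tolower(tools::file_ext(files))
  files[!ext %in% ok_ext]
}

# Example: non_image_files("IMC_oSCC_Angela2024")
```

For the 1_A3 folder above, this would flag ROI002_ROI_002_A3_SP20-003549_summary.txt.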

I think the example data your team supplied in the past did not include a TXT file. The function assumes that each folder contains only single-channel files of the same type. The pattern argument only selects individual images, not channels. Could you try again after removing the TXT file from the individual folders and without supplying a pattern argument? To fix this properly, the function would need a second pattern argument for selecting the single-channel files, in addition to the one selecting image folders, or it would need to be stricter about which file types it reads in.
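One way to apply this workaround without deleting the summaries is to move every TXT file out of the image folders with base R. This is a sketch under the assumption that the destination directory lies outside the image tree and on the same file system (`file.rename()` can fail across file systems); the helper `move_txt_out()` and its naming scheme are hypothetical.

```r
# Hypothetical helper: move every .txt file found anywhere below `root` into
# `dest`, so that each image folder holds only the single-channel TIFFs.
move_txt_out <- function(root, dest) {
  txt <- list.files(root, pattern = "\\.txt$", recursive = TRUE,
                    full.names = TRUE, ignore.case = TRUE)
  if (length(txt) == 0L) return(invisible(character(0)))
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # Prefix each file with its sample folder name (e.g. "1_A3__...txt") so
  # summaries from different samples cannot overwrite each other in `dest`.
  new_paths <- file.path(dest, paste(basename(dirname(txt)), basename(txt),
                                     sep = "__"))
  ok <- file.rename(txt, new_paths)
  if (!all(ok)) warning("some .txt files could not be moved")
  invisible(new_paths[ok])
}

# Example: move_txt_out("IMC_oSCC_Angela2024", "IMC_oSCC_Angela2024_summaries")
```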

Ah, OK. Removing the text file from each folder avoids that error. However, I now see another one:

> loadImages("IMC_oSCC_Angela2024/TMA01_OS/1_A3/", single_channel = TRUE, on_disk = TRUE, h5FilesPath = "/tmp/")
Error in .valid.loadImage.input(x, pattern, single_channel, name) : 
  Setting 'single_channel' requires 'x' to be a single path.

According to my reading of the documentation, this should be fine: as shown above, the 1_A3 folder contains a set of TIFF files.

For this, x needs to be a single path that either contains the individual files or contains sub-paths that hold them.

Is the path accessible from your working directory? Maybe try providing the full path rather than the relative path.

Oh, how strange. It works with an absolute path but not a relative path.
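Since the absolute path works, a simple workaround is to resolve the relative path with base R's `normalizePath()` before passing it in. The helper `as_abs_dir()` is hypothetical, and `tempdir()` stands in for the "/tmp/" used above as `h5FilesPath`; the loadImages arguments are the ones from this thread. This does not explain why the relative path fails, it only sidesteps it.

```r
# Hypothetical helper: turn a (possibly relative) directory path into an
# absolute one. normalizePath(mustWork = TRUE) stops with an error if the path
# cannot be found from the current working directory.
as_abs_dir <- function(path) {
  if (length(path) != 1L) stop("'path' must be a single path")
  p <- normalizePath(path, mustWork = TRUE)
  if (!dir.exists(p)) stop("'path' is not a directory: ", p)
  p
}

# Only runs if cytomapper is installed and the folder exists.
x <- "IMC_oSCC_Angela2024/TMA01_OS/1_A3/"
if (requireNamespace("cytomapper", quietly = TRUE) && dir.exists(x)) {
  images <- cytomapper::loadImages(as_abs_dir(x), single_channel = TRUE,
                                   on_disk = TRUE, h5FilesPath = tempdir())
}
```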

CytoImageList containing 1 image(s)
names(1): 1_A3 
Each image contains 47 channel(s)