Set from visible broken in 130 channel image

Question

Set from visible broken in 130 channel image

Closed this issue 8 months ago · 8 comments

This is because there are 4 channel names "ROI-05A-DAPI", which are C1, C2, C3, C128, and unfortunately in some cases these are labelled eg "ROI-05A-DAPI" and in other cases are labelled as "ROI-05A-DAPI (C1)". I'm matching on the channel name which doesn't include "(C1)"

In the first instance I matched the code in ImageDataTransformerBuilder that uses getAvailableUniqueChannelNames, but this fails because then I'm assuming that C2 is the first non-DAPI channel.

The question is, does this mean that the pixel classifiers are broken in this way too...? (answer: yes)

Answer 1 · 2024-04-24T15:52:32.000Z

I'm struggling to decipher this, and consequently can't really understand the purpose of the fix at #13

Can you provide a minimal failing example to show that the pixel classifier is broken?

Answer 2 · 2024-04-24T16:05:29.000Z

I'll record something tomorrow demonstrating the issue in InstanSeg. Maybe the pixel classifier is behaving as intended, by ignoring any duplicate channel names, but I'd prefer a stronger response than logging a warning and going on regardless

Answer 3 · 2024-04-25T13:41:29.000Z

I don't think the pixel classifier is broken, I just thought the behaviour (drop a warning in the logs, then ignore any duplicated channel names and just take the first) was a bit worrisome. However I agree that using channel numbers is also not great. At most, I would propose making the warning from the pixel classifier a bit stronger (perhaps an error notification?).

My goal is to enable a "Set from visible" option in the channel selector for InstanSeg, which tells InstanSeg to use the currently-visible channels in the viewer for inference. I've found this feature very useful for experimentation, I figure it's likely to be useful for multiplexed images (eg drop some noisy channels), and it's likely to be useful for cases where lots of channels means lots of memory used.

My problem was transferring selections from QuPathGUI.getInstance().getViewer().getImageDisplay().selectedChannels() to a list of suitable ColorTransforms. I was originally using name-based selection (ie ColorTransforms.createChannelExtractor("A"), but obviously this was simply taking the first channel with a matching name, so in the case of an image with channels:

A (C1)
A (C2)
B (C3)
C (C4)

I would end up extracting the channels:

A (C1)
A (C1)
B (C3)
C (C4)

Which is of course orders of magnitude worse than just ignoring duplicated channel names. I have now switched to index-based channel extraction (ColorTransforms.createChannelExtractor(0)).
However, it does mean that the extension is using index-based channel extraction by default, which is probably not ideal, especially for (eg) analysing across images in a project, where channel names but not order may be preserved.

Although, much less of an issue with channel invariance.

Answer 4 · 2024-04-25T14:04:35.000Z

Hmmm... would it be sensible to use name-based extraction by default, and switch to index-based extraction (with a logged warning) if duplicates are identified?

Or do you think it'd be better to throw an exception if there are duplicates, which would allow the message the propagated up to an exception dialog?

Alternatively, I suppose (bigger change) we could store names and indexes both, and log a warning when they don't match (using a sensible default behaviour, to be defined). But then we should likely do that in QuPath more 'centrally' via ColorTransforms and we'd need to be cautious not to break backwards compatibility for existing JSON classifiers.

Bad idea: In the specific case of InstanSeg, channel ordering shouldn't matter - so if A appears twice you can infer that both variations should be included... but this only reduces the occurrence of trouble, and it's not too hard to think of cases where ambiguity remains. So I don't think it's worth complicating the code to implement this.

Answer 5 · 2024-04-25T14:19:01.000Z

would it be sensible to use name-based extraction by default, and switch to index-based extraction (with a logged warning) if duplicates are identified?

For this case I think that's a good solution. Generally this is not likely to be a common issue, anyways, and we can ensure that channel names are used when scripting anyhow.

More generally, I think maybe we should handle duplicates for interactive/GUI use, and error hard in a script? Because it's weird to see a list of channels in the b&c window that isn't reflected in other channel lists in the same software instance.

Answer 6 · 2024-04-25T15:08:26.000Z

Am counting this as resolved with the "prefer names, fallback to index" solution for now

Answer 7 · 2024-04-25T16:13:30.000Z

Makes sense. We could/should use the 'Context help' in the toolbar to flag duplicate channels, and possibly highlight the issue in the B/C dialog to try to make it more evident as possible via the GUI.

Answer 8 · 2024-04-25T16:19:26.000Z

Yeah agree, it's one of those things that might work practically but is only likely to lead to horrible unseen analysis errors