SpikeInterface/spikeinterface

Sorting analyzer behavior with multi probe recording

Opened this issue · 20 comments

Hi,
I have a problem with the following pipeline:
I have a recording with two NPX1 probes.
I sort each probe individually.
I curate each one in phy.
I load each sorting with read_phy and aggregate them with aggregate_units.
I compute extensions on this aggregated sorting.

The waveforms from this aggregated sorting analyzer are, however, off. If I plot them using plot_unit_waveforms, I see four columns of waveforms: two columns are fine, two columns are flat lines (I'll post a pic later). Notably, if I plot using a sorting_analyzer created from an individual sorting (not the aggregated one), the waveforms are fine. I guess the waveform extraction is not taking the probe group of each channel into account and is picking channels around the peak template on both probes? Not sure though.

Should I use the sorting analyzer only on non-aggregated sorting objects? If yes, shouldn't there be a method corresponding to aggregate_units for the sorting analyzers too?

Thanks!

zm711 commented

I think the picture would help. I also sort with probe groups and I haven't noticed this problem before. The sparsity should take care of this, I believe. Are you creating your analyzer with sparse=False to get dense info?

They all look like this:

[two screenshots: unit waveforms with columns of flat templates]

This is the code I use; I don't explicitly set sparsity:

sortings = []
for probe_n in tqdm(range(len(raw_rec.get_probes()))):
    sortings.append(read_phy(Path(f"{path_recording_folder}/spike_interface_output/probe{probe_n}/sorter_output")))

sorting = aggregate_units(sortings)

analyzer = create_sorting_analyzer(sorting, raw_rec, sparse=True, format="binary_folder",
                                   folder=f"{path_recording_folder}/spike_interface_output/sorting_analyzer_post_curation",
                                   overwrite=True, **job_kwargs)
zm711 commented

How are you sorting to start with? What is the script for sorting to get to the phy data step? Are you doing run_sorter_by_property or something else?

I think I know what's going on! How are you constructing the probe group? The two probes should be separated by a certain distance! The default sparsity looks for channels in the vicinity, so if the two probes are almost overlapping, then half of the channels will be noisy and the templates will be flat. Can you print the recording channel locations and the channel groups?
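As a quick sanity check, the reasoning above can be sketched with plain numpy (the coordinates below are made-up stand-ins, not the real probe layout): if the minimum distance between the two probes' channels exceeds the sparsity radius, radius-based sparsity should never pick channels across probes.

```python
import numpy as np

# Toy stand-ins for two probes' channel locations (made-up coordinates):
# probe 0 with x in (-24, -8, 8, 24), probe 1 shifted by 250 um on x.
probe0 = np.column_stack([np.tile([-24., -8., 8., 24.], 5),
                          np.repeat(np.arange(5) * 20., 4)])
probe1 = probe0 + np.array([250., 0.])

radius_um = 100.0  # default radius in estimate_sparsity
# minimum pairwise distance between the two channel sets
d = np.linalg.norm(probe0[:, None, :] - probe1[None, :, :], axis=2)
print(d.min() > radius_um)  # True: sparsity should stay within one probe
```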

I sort with this:

path_recording, rec_file_name = find_rec_file(path_recording_folder)
timestamps = get_timestamps_from_rec(path_recording_folder)

raw_rec = read_spikegadgets(path_recording)
raw_rec.set_times(timestamps) # this is useful for other parts of the code

split_preprocessed_recording = raw_rec.split_by("group")
for probe_n, sub_rec in split_preprocessed_recording.items():
    binary_file_path = f"{path_recording_folder}/spike_interface_output/probe{probe_n}/sorter_output/recording.dat"
    probe_filename = f"{path_recording_folder}/spike_interface_output/probe{probe_n}/sorter_output/probe.prb"
    if not os.path.exists(binary_file_path):
        os.makedirs(Path(binary_file_path).parent, exist_ok=True)
        write_binary_recording(
            recording=sub_rec,
            file_paths=binary_file_path, **job_kwargs)

        pg = sub_rec.get_probegroup()
        write_prb(probe_filename, pg)
        print(f"probe {probe_n} completed\n")

    probe = load_probe(probe_filename)
    settings = {'filename': binary_file_path,
                "n_chan_bin": probe["n_chan"], "fs": sub_rec.get_sampling_frequency()}  # try nskip 20

    result_dir = path_recording_folder / "spike_interface_output" / f"probe{probe_n}" / "sorter_output"
    os.makedirs(result_dir, exist_ok=True)

    run_kilosort(settings=settings, probe=probe, data_dtype=sub_rec.get_dtype(),
                 device=torch.device("cuda"), results_dir=result_dir, clear_cache=True)

I eventually want to move everything into SpikeInterface, but I got some CUDA OOM errors and it was easier to investigate by using Kilosort directly.

[screenshot: dataframe of channel locations and groups]

Actually there is some distance on the x axis (the "x" column in the df) and the x coords are correctly assigned between groups. Channels belonging to group 0 have the lower set of x coords (-24, -8, 8, 24).
Btw, channel ids are strings in this format: '735' '734' '671' '670' '607' '606' '543' '542' '479' '478'....

Regarding sparsity:
[screenshot: sparsity output]

zm711 commented

But what Alessio is saying is that the sparsity default is 50-100 um (I can't remember which off the top of my head). So since you have 4 electrode sets all within that distance, it is including all of them in the range even if a chunk of them are noisy. Alessio knows NP better than I do. How many shanks do you have in total? (He can comment better when he's back on, but just to get some more info.)

I have 2 NPX1 probes per recording. Each NPX will have 4 different x coords.
There aren't really noisy channels. If I create the sorting_analyzer from only one sorting (not the aggregated sorting), all the waveforms look fine. The problem happens only when creating the sorting analyzer from the aggregated sorting, somehow. The confusing thing is that the channel locations look fine to me, and the probes are definitely farther apart than the default radius:

def estimate_sparsity(
    sorting: BaseSorting,
    recording: BaseRecording,
    num_spikes_for_sparsity: int = 100,
    ms_before: float = 1.0,
    ms_after: float = 2.5,
    method: "radius" | "best_channels" = "radius",
    peak_sign: str = "neg",
    radius_um: float = 100.0,
    num_channels: int = 5,
    **job_kwargs,
):

The behavior would be explained if waveforms are also selected from channels belonging to the other probe, which would expectedly just be noisy.

OK, the channels selected here are messed up. I will try to understand the precise reason later or tomorrow.

def from_radius(cls, templates_or_sorting_analyzer, radius_um, peak_sign="neg"):
    """
    Construct sparsity from a radius around the best channel.
    Use the "radius_um" argument to specify the radius in um
    """
    from .template_tools import get_template_extremum_channel

    mask = np.zeros(
        (templates_or_sorting_analyzer.unit_ids.size, templates_or_sorting_analyzer.channel_ids.size), dtype="bool"
    )
    channel_locations = templates_or_sorting_analyzer.get_channel_locations()
    distances = np.linalg.norm(channel_locations[:, np.newaxis] - channel_locations[np.newaxis, :], axis=2)
    best_chan = get_template_extremum_channel(templates_or_sorting_analyzer, peak_sign=peak_sign, outputs="index")
    for unit_ind, unit_id in enumerate(templates_or_sorting_analyzer.unit_ids):
        chan_ind = best_chan[unit_id]
        (chan_inds,) = np.nonzero(distances[chan_ind, :] <= radius_um)
        mask[unit_ind, chan_inds] = True
    return cls(mask, templates_or_sorting_analyzer.unit_ids, templates_or_sorting_analyzer.channel_ids)
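The radius selection can be reproduced with a few lines of plain numpy on toy locations (illustrative values, not the real probe geometry):

```python
import numpy as np

# Minimal numpy sketch of the radius selection (toy locations).
channel_locations = np.array([
    [0., 0.], [0., 20.], [0., 40.],        # probe A
    [250., 0.], [250., 20.], [250., 40.],  # probe B, 250 um away
])
radius_um = 100.0
distances = np.linalg.norm(
    channel_locations[:, np.newaxis] - channel_locations[np.newaxis, :], axis=2
)
best_chan = 1  # the unit's extremum channel sits on probe A
(chan_inds,) = np.nonzero(distances[best_chan, :] <= radius_um)
print(chan_inds)  # [0 1 2]: only probe A channels are selected
```

With correct locations the mask never crosses probes, so flat templates are consistent with some channels being assigned locations from the wrong probe and slipping inside the radius.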

Somehow some channels that belong to group 1 have an x coord corresponding to group 0. I guess that's actually a Neo import bug.
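A quick way to surface such mislabelled channels (toy data and made-up expected x ranges, just to illustrate the check):

```python
import numpy as np
import pandas as pd

# Toy data: four channels labelled group 0, but two x coords look like
# they belong to the other probe (the expected ranges are made up).
df = pd.DataFrame({"x": [8., -24., 258., 226.], "group": [0, 0, 0, 0]})
expected_x = {0: (-24., 24.), 1: (200., 260.)}
lo, hi = zip(*df["group"].map(expected_x))
suspect = df[(df["x"] < np.array(lo)) | (df["x"] > np.array(hi))]
print(suspect)  # rows with x = 258 and 226 are flagged
```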

I think the problem is in the read_spikegadgets in probeinterface actually!

I am reopening this because I'm no longer sure whether it is a Neo or probeinterface problem.

test rec file here: https://www.dropbox.com/s/bsjd1ipx430dr00/20240124_171519.rec?dl=0

In this case, the xcoord of group 0 will contain [ 8. -24. 258. 226.]:

raw_rec = read_spikegadgets(Path(r"X:\data\12\ephys\20240124_171519.rec\20240124_171519.rec"), use_names_as_ids=False)

channel_locations = raw_rec.get_channel_locations()
locations_group_df = pd.concat(
    [
        pd.DataFrame(channel_locations, index=raw_rec.channel_ids, columns=["x", "y"]),
        pd.DataFrame(raw_rec.get_channel_groups(), index=raw_rec.channel_ids, columns=["group"]),
    ],
    axis=1,
)
print(locations_group_df.query("group == 0")["x"].unique())
[  8. -24. 258. 226.]

Here the xcoord of group 0 will contain array([ 8., -24., 24., -8.])

sub_rec = raw_rec.split_by("group")[0]

channel_locations = sub_rec.get_channel_locations()
locations_group_df = pd.concat(
    [
        pd.DataFrame(channel_locations, index=sub_rec.channel_ids, columns=["x", "y"]),
        pd.DataFrame(sub_rec.get_channel_groups(), index=sub_rec.channel_ids, columns=["group"]),
    ],
    axis=1,
)
locations_group_df.query("group == 0")["x"].unique()
array([  8., -24.,  24.,  -8.])

I can't understand what's going on here.....

The problem is solved if I set raw_rec.set_property("contact_vector", None). Doing so, locations are accessed directly from the "location" property here:

def get_channel_locations(self, channel_ids=None, axes: str = "xy"):
    if channel_ids is None:
        channel_ids = self.get_channel_ids()
    channel_indices = self.ids_to_indices(channel_ids)
    if self.get_property("contact_vector") is not None:
        if len(self.get_probes()) == 1:
            probe = self.get_probe()
            positions = probe.contact_positions[channel_indices]
        else:
            all_probes = self.get_probes()
            # check that multiple probes are non-overlapping
            check_probe_do_not_overlap(all_probes)
            all_positions = np.vstack([probe.contact_positions for probe in all_probes])
            positions = all_positions[channel_indices]
        return select_axes(positions, axes)
    else:
        locations = self.get_property("location")
        if locations is None:
            raise Exception("There are no channel locations")
        locations = np.asarray(locations)[channel_indices]
        return select_axes(locations, axes)
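The multi-probe branch above stacks each probe's contact positions and indexes the stack with the recording's channel indices. A toy numpy sketch (illustrative, not the SpikeInterface code) shows how this silently returns wrong locations whenever the stacked order does not match the recording's channel order:

```python
import numpy as np

# Two toy probes, 3 contacts each, 200 um apart on x.
probe0_positions = np.array([[0., 0.], [0., 20.], [0., 40.]])
probe1_positions = np.array([[200., 0.], [200., 20.], [200., 40.]])

# The stack lists all of probe 0 first, then probe 1...
stacked = np.vstack([probe0_positions, probe1_positions])
# ...but suppose the recording interleaves channels across probes.
true_order = [0, 3, 1, 4, 2, 5]  # true position index for each channel

naive = stacked[np.arange(6)]  # what plain channel indices give
correct = stacked[true_order]  # where the channels actually sit
print(naive[1], correct[1])    # channel 1 is really on probe 1 (x = 200)
```

If that is what happens here, it would explain why the wrong x coords only show up on the full multi-probe recording and not on the split sub-recordings.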

By solved I mean that plot_unit_waveforms shows sensible results

zm711 commented

@alejoe91 will be better able to comment at this depth. That seems like a bug to me (i.e. we shouldn't force someone to set contact_vector to None for this type of operation), but I'll wait for him to look this over.

Sure, totally agree, I am looking further into it.

Oh, it actually doesn't solve the original problem.

This test here produces sensible results (#3383 (comment)), but plot_unit_waveforms is off.

OK, I updated the function get_channel_locations in the sorting analyzer as well.