BombCell: find bombshell cells!

Bombcell is a powerful toolbox designed to evaluate the quality of recorded units and extract essential electrophysiological properties. It is specifically tailored for units recorded with Neuropixel probes (3A, 1.0, and 2.0) using SpikeGLX or OpenEphys and spike-sorted with Kilosort.

Please note that Bombcell is currently unpublished. If you find Bombcell useful in your work, we kindly request that you acknowledge its contribution by citing xxx.

Preliminary versions of Bombcell have been used in the following studies:

Peters et al., Nature, 2021. This study employed Bombcell to define real, well-isolated striatal units and classify them into MSNs, FSIs, and TANs. You can refer to the script bc_selectAndClassifyStriatum (work in progress) to classify striatal cells in a similar manner.
Peters et al., Cell Reports, 2022. Bombcell was used in this study to define real, well-isolated pre-frontal cortex units.

Table of contents:

TL;DR
Installation
Dependancies
Quality metrics guide
Quality metrics GUI guide
Ephys properties guide
Recommended pre-processing
Troubleshooting: common errors
Other ephys utilities
Contact us

TL;DR (Too long; didn't read)

Bombcell extracts relevant quality metrics to categorize units into three categories: single somatic units, multi-units, noise units and non-somatic units.

The script bc_qualityMetrics_pipeline provides an example workflow to get started.

Installation

To begin using Bombcell, clone the repository and the dependancies.

If you want to compute ephys properties, change your working directory to bombcell\ephysProperties\helpers in matlab and run mex -O CCGHeart.c to able to compute fast ACGs.

If you have z-lib compressed ephys data, compressed with mtscomp, you will additionally need the zmat toolbox.

Dependancies

https://github.com/kwikteam/npy-matlab (to load .npy data in)

Quality metrics guide

General overview

Bombcell has been written to be as user-friendly as possible. If you run into any issues or have suggestions, please raise a github issue or email us.

In order to avoid possible conflicts, all Bombcell functions are preceded by bc_.

Qmetric and param structures

Bombcell is organized around two main matlab structures: qMetrics and param.

qMetrics is a structure generated by the bc_runAllQualityMetrics function. Its fields are different quality metrics, and each field contains a quality metric value for each unit.
param is a structure generated by the function bc_qualityParamValues and it contains parameters both to compute quality metrics with bc_runAllQualityMetrics and to then classify cells with bc_getQualityUnitType.

A run-through of a typical workflow

Load ephys data, eg:

[spikeTimes_samples, spikeTemplates, templateWaveforms, templateAmplitudes, pcFeatures, pcFeatureIdx, channelPositions] = bc_loadEphysData(ephysKilosortPath)
Run all quality metrics and clasify cells with the function bc_runAllQualityMetrics, eg:

[qMetric, unitType] = bc_runAllQualityMetrics(param, spikeTimes_samples, spikeTemplates, templateWaveforms, templateAmplitudes, pcFeatures, pcFeatureIdx, channelPositions, savePath)
Look at global output plots and flip through cells with the GUI:

unitQualityGuiHandle = bc_unitQualityGUI(memMapData, ephysData, qMetric, forGUI, rawWaveforms, param, probeLocation, unitType, loadRawTraces)
Refine classification : choose quality metric thresholds (see the section Setting appropriate quality metric paramaters and thresholds below).

Plotting quality metrics

We have added automatic plotting function at each step, in order to aid in trouble-shooting and evaluating Bombcells' output:

Set param.plotGlobal = 1 to plot summary of the noise units' waveforms compared to single multi-units, distribution histograms of the units' quality metrics and numbers of units removed by each quality metric. See the detailed overview of global output plots section for examples of these plots.
Set param.plotThis = 1 to plot figures for each quality metric and for each cell (plot examples displayed below). This generates a considerable amount of plots. We suggest you only use when de-bugging or to generate example cell plots. See the detailed overview of the quality metrics section for examples of these plots.

In addition, Bombcells' GUI allows you to flip through cells and their quality metrics. This is useful in evaluating where to set your quality metric classification thresholds and checking how Bombcell behaves. See the quality metrics GUI guide section for more information.

Loading and saving data

All Bombcell outputs are saved in the ONE format, in .npy or .parquet files. .npy files can be read in matlab using the npy-matlab toolbox, and .parquet files can be read in matlab using the built-in readparquet function.

The following files are outputed by Bombcell:

_bc_parameters._bc_qMetrics.parquet
templates._bc_fractionRefractoryPeriodViolationsPerTauR.parquet
templates._bc_qMetrics.parquet
templates._bc_rawWaveforms.npy
templates._bc_rawWaveformPeakChannels.npy
templates._bc_baselineNoiseAmplitude.npy
templates._bc_baselineNoiseAmplitudeIndex.npy
(optional, if param.saveMultipleRaw is set to true) 'Unit$CLUSTER_ID$_RawSpikes.npy'

These saved files can be retrieved with the function [param, qMetric] = bc_loadSavedMetrics(savePath)

Additionally, optionally if param.saveMatFileForGUI is set to true, Bombcell saves a templates.qualityMetricDetailsforGUI.mat file containing various data needed for Bombcells GUI. This data can also be loaded using the function bc_loadMetricsForGUI.

Setting appropriate quality metric paramaters and thresholds

We suggest first computing quality metrics with the default paramater values (defined by the param structure in bc_qualityParamValues). You can fine-tune the thresholds based on your specific neuronal region and requirements by looking at the following:

individual units, in the interactive GUI. Do the current quality metric threshold classify units into categories in a reasonable manner? (ie do "good"-classified units look good ?)
the distribution histograms of the units' quality metrics (see General plots). Ther can be natural places to set thresholds id there are for instance bimodal distributions.
the numbers of units removed by each quality metric.
It may also be helpful to plot the quantity you which to measure as a function of each quality metric (see Fig. 2 Harris et al., Nat. Neuro, 2016).

Running speed and bottlenecks

Depending on your settings, Bombcell takes between 2' and 35' for a typical 1 hour long neuropixels recording. The running time varies according to three main parameters:

if this is the first run, raw waveforms are extracted from your raw .bin or .dat ephys file. These are then saved in two .npy files. Subsequent runs load in the already extracted waveforms and thus take ~10-15' less. Initial running time increases or decreases as the number of spikes extracted per unit, defined by param.nRawSpikesToExtract, increases or decreases.
if param.computeDrift is true, drift is computed. This can add ~10' per run, and is disabled by default.
if param.computeDistanceMetrics is true, several distance metrics are computed. This can add ~10' per run. Distance metrics are susceptible to probe drift, and correlate well with a units raw waveform amplitude. As a result, we suggest using a units amplitude instead of distance metrics and this parameter is diabled by default.

Detailed overview of the quality metrics

Quality metrics to classify noise/non-somatic units

Somatic waveforms are defined as waveforms where the largest trough precedes the largest peak (Deligkaris, Bullmann & Frey, 2016).
Noise units are classified as noise if any of the following are true
- the number of peaks or troughs detected in the max channel template waveform for the unit exceeds the defined thresholds : param.maxNPeaks and param.maxNTroughs.
- the baseline of the max channel template waveform for the unit is not flat (exceeds the defined threshold param.maxWvBaselineFraction).
- the slope template waveform spatial decay is above a defined threshold param.minSpatialDecaySlope.

Quality metrics to classify multi-units

After classifying noise units, the remaining units are classifyed as good single units if the criteria below are met. They are classifying as multi-units otherwise.

the estimated percentage of spikes missing is below the param.maxPercSpikesMissing

The percent of spikes missing (false negatives) is estimated by fitting a gaussian the distribution of amplitudes, with a cutoff parameter. This assumes the spike amplitudes follow a gaussian distribution, which is not strictly true for bursty cells, like MSNs. Optionally, if param.computeTimeChunks is true, the recording is split in time chunks of size param.deltaTimeChunk, and the percentage of spikes missing is estimated seperately on these time chunks. Then, only the time chunks with a percent of spikes missing below the threshold are kept, and the rest of the quality metrics are computed on these epochs. This can be of use in cases where there is a lot of drift during the recording.

Below: example of unit with many spikes below the detection threshold in the first two time chunks of the recording.
number of spikes is above param.minSpikes

Below a certain amount of spikes, ephys properties like ACGs will not be reliable. A good minimum to use is 300 empirically, because Kilosort2 does not attempt to split any clusters that have less than 300 spikes in the post-processing phase.
the estimated percent of refractory period violations is below param.maxRPVviolations

The fraction of refractory period violations (false positives) is estimated using r = 2*(tauR - tauC) * N^2 * (1-Fp) * Fp / T , solving for Fp, with tauR the refractory period, tauC the censored period, T the total experiment duration, r the number of refractory period violations, Fp the fraction of contamination. method from Hill et al., J. Neuro, 2011.

Below: examples of a unit with a small fraction of refractory period violations (left) and one with a large fraction (right).
the raw mean waveform maximum amplitude is above param.minAmplitude

Units with low amplitude are noisy, further away units, that are more likely to be MUA.

Below: examples of a unit with high amplitude (blue) and one with low amplitude (red).
distance metrics are above param.minIsoD for the isolation distance, param.lratioMin for l-ratio, param.ssMin for the silhouette score

Disabled by default. This is if the most time-consuming part of the script. See Harris et al., Neuron, 2001 for more information on isolation distance, (see Schmitzer-Torbert and Redish, 2004) for more information on l-ratio and (see Rousseeuw, 1987 for more information on silhouette-score.

Below: examples of a unit with high isolation distance (left) and one with low isolation distance (right).

Detailed overview of global output plots

If param.plotGlobal is set to true, after computing quality metrics, the script will output 3 summary plots :

a Euler diagram of the number of units classifyed as noise/multi-unit with each quality metric. Numbers in the circles indicate the number of units classifying as noise/multi-unit by that quality metric/intersection of quality metrics.
for each a quality metric, a histogram of the distribution of values for all units. The red lines indicate classification thresholds.
template waveforms for each unit classiyed as good, multi-unit and noise, overlaid on top of each other. This is a quick way of checking there is no odd, noisy unit that has mistakenly been classifyed as either good or multi-unit.

Quality metrics GUI guide

Plot a GUI to flip through the quality metrics for each cell with the function bc_unitQualityGUI Eg:

bc_unitQualityGUI(memMapData, ephysData, qMetrics, param, probeLocation, unitType, plotRaw)

Toggle units by using the → and ← keys. Pressing g, m, n brings you to the next good, multi-unit or noise unit, respectively. Press u to select a particular unit. Navigate in time through the raw data to see this units' individual spikes with the ↑ and ↓ keys.

Unit location view

This view plots the depth of each unit on the probe in y, and it's log-normalized firing rate in x. Single units are plotted in green, multi-units in indigo and noise in red. The current unit is plotted larger and circled in black.
Template waveform view

This view plots the template waveforms for the current unit. The maximum waveform is in blue, and detected peaks are overlaid.
Raw waveform view

This view plots the mean raw waveforms for the current unit. The maximum waveform is in blue, and detected peaks are overlaid.
ACG view

This view plots the auto-correlogram (ACG) for the current unit. The horizontal red line indicates the ACG asymptote, which corresponds to the unit's firing rate. The vertical red line plot the refractory period location.
ISI view

This view plots the inter-spike-intervals (ISI) for the current unit. The vertical red line plot the refractory period location.
Isolation distance view
Raw waveform view

Plots the raw data in black, with detected spikes for this unit in blue.
Amplitude view

This view plots the scaling factor applied to each spike by kilosort in black. Spikes currently displayed in the raw data view are shown in blue, and spikes that have an ISI < refractory period threshold are shown in purple.

Ephys properties guide

work in progress

Classifying cell types

striatal cells
GPe cells
cortical cells

Recommended recording and pre-processing

To reduce the noise units you obtain, we recommend recording with either both a ground wire and reference wire going from the probe to the mouse, or with a ground wire and the internal reference (if using probes other than the 3A - the internal reference on these doesn't work well). To remove artefacts from your recorded data, either temporally align your channels with each other and common-average reference your data with Bill Karsh's function CatGT, before feeding this data into kilosort or use pyKilosort, where this is implemented.

To maximize your number of single good units, we recommend looking at your raw data and spikes detected by kilosort, to assess whether most spikes are being detected. If not, consider lowering kilosort's detection thresholds.

Troubleshooting: common errors

Help, my extracted raw waveforms look very flat and odd / I get errors on the raw waveform extraction step.

Check that the param.nChannels value is correctly set. This number must correspond to the total number of channels in your raw data, including any sync channels. For neuropixels probes, this value should typically be either 384 or 385 channels.

Why do I have less units in Bombcell's outputs than in kilosort?

You have the same number of units (try to do length(qMetric) and length(unique(spike_templates)) to check), but they are stored in the Bombcell structure slightly differently. Because kilosort drops units at the last stages of algorithm but does not re-label them, you often end up with a larger maximum template label than there are templates. To store things more efficiently, Bombcell re-indexes all the units, giving them a label from 1 to the number of unique templates. But fear not, you can still get Kilosort's original label with qMetric.clusterID - 1.

Why does qMetric.clusterID not correspond to kilosort's labels?

Kilosort's labels are 0-indexed (ie starts at 0). This isn't very practical in matlab, so we add a + 1 to 1-index them. To find the kilosort label, simply do qMetric.clusterID - 1.

Something else

Please raise a github issue or alternatively email us.

Various other utilities:

generate IMRO files/channel maps : see recordingUtilities
manual curation guide
quality metric plugins for phy : see phyPlugins
decompress data z-lib compressed with mtscomp : see decompressData. Requires the zmat toolbox

Contact us

If you run into any issues or if you have any suggestions, please raise a github issue or alternatively email us.

m-beau/bombcell