flatironinstitute/mountainsort5

Memory errors constantly kill my job

Opened this issue · 12 comments

I'm trying to sort a Neuropixels array recording with MountainSort 5, and the job keeps getting killed. The recording is ~3 hours long, and the machine in question has 128 GB of memory. I am using the following parameters. I have already brought the radii down and lowered classification_chunk_sec. Is there anything else I can do to reduce memory use? Perhaps reduce the block duration even further?

sorting_params = {}
sorting_params["max_num_snippets_per_training_batch"] = 1000
sorting_params["snippet_mask_radius"] = 60
sorting_params["phase1_npca_per_channel"] = 3
sorting_params["phase1_npca_per_subdivision"] = 10
sorting_params["classifier_npca"] = 10
sorting_params["detect_channel_radius"] = 60
sorting_params["phase1_detect_channel_radius"] = 60
sorting_params["training_recording_sampling_mode"] = "uniform"
sorting_params["training_duration_sec"] = 60
sorting_params["phase1_detect_threshold"] = 5.5
sorting_params["detect_threshold"] = 5.25
sorting_params["snippet_T1"] = 15
sorting_params["snippet_T2"] = 40
sorting_params["detect_sign"] = 0
sorting_params["phase1_detect_time_radius_msec"] = 0.5
sorting_params["detect_time_radius_msec"] = 0.5
sorting_params["classification_chunk_sec"] = 100

sorting = ms5.sorting_scheme3(
    recording=ret,
    sorting_parameters=ms5.Scheme3SortingParameters(
        block_sorting_parameters=ms5.Scheme2SortingParameters(**sorting_params),
        block_duration_sec=60 * 5,
    ),
)

The error message is below.

Detected 2594233 spikes
*** MS5 Elapsed time for detect_spikes: 7659.681 seconds ***
Removing duplicate times
*** MS5 Elapsed time for remove_duplicate_times: 0.032 seconds ***
Extracting 1262530 snippets
*** MS5 Elapsed time for extract_snippets: 69.843 seconds ***
Computing PCA features with npca=1152
Killed

Try raising the detect thresholds to 7, and dropping the number of snippets to 500.
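In terms of the parameter dict above, that suggestion would map onto something like this (a sketch, assuming "thresholds" means both the phase 1 and classification detect thresholds):

    # Sketch of the suggested overrides: raise detect thresholds to 7 and
    # reduce the number of training snippets per batch to 500.
    sorting_params["phase1_detect_threshold"] = 7.0
    sorting_params["detect_threshold"] = 7.0
    sorting_params["max_num_snippets_per_training_batch"] = 500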

3 hours is quite long, but at this stage, the memory issues are coming from the fact that you are detecting a huge number of spikes at the classifier training stage.
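For a rough sense of scale, here is a back-of-envelope estimate of the snippet array at the PCA stage (a sketch assuming a 384-channel Neuropixels probe, float32 samples, and that every snippet keeps all channels; the actual MS5 internals, e.g. snippet masking, may reduce this):

    # Back-of-envelope memory estimate for the extracted snippets.
    # Assumptions (not from the MS5 source): float32 samples, all 384 channels
    # retained per snippet, snippet length = snippet_T1 + snippet_T2 samples.
    n_snippets = 1_262_530        # "Extracting 1262530 snippets" from the log
    n_channels = 384              # Neuropixels 1.0
    snippet_len = 15 + 40         # snippet_T1 + snippet_T2
    bytes_per_sample = 4          # float32

    total_bytes = n_snippets * n_channels * snippet_len * bytes_per_sample
    print(f"~{total_bytes / 1e9:.0f} GB just for the snippet array")  # ~107 GB

    # The npca=1152 in the log is consistent with
    # phase1_npca_per_channel (3) * 384 channels.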

I have been playing with using scheme 3 for super long recordings as a way to get around memory limits, but as of now, my best advice is to tweak parameters until everything fits in memory.

Are you using Neuropixels 1.0 or 2.0?

@rtraghavan

A few other thoughts:

  • If there are channels you know you don't need (for example, if they are outside of the brain), you can remove them from the recording object. This is a quick and easy way to fit things into memory; see the channel-removal sketch after the parameter block below.
  • On 1.5 hr recordings, I usually do a quick first-pass sort on a 128 GB machine with the following settings, just to evaluate some first-order metrics. These are usually sufficient to pull out clear units. Then I'll do a more compute-intensive sort on a server with more RAM. That said, as I understand it, scheme_3 should theoretically provide a means of tackling arbitrary-length recordings with less compute than scheme_2 - curious whether @magland agrees...
        sorting_params = {}                                                         
                                                                                    
        sorting_params["max_num_snippets_per_training_batch"] = 500                 
        sorting_params["snippet_mask_radius"] = 30                                  
        sorting_params["phase1_npca_per_channel"] = 3                               
        sorting_params["phase1_npca_per_subdivision"] = 3                           
        sorting_params["classifier_npca"] = 3                                       
        sorting_params["detect_channel_radius"] = 30                                
        sorting_params["phase1_detect_channel_radius"] = 30                         
        sorting_params["training_recording_sampling_mode"] = "uniform"              
        sorting_params["training_duration_sec"] = 150                               
        sorting_params["phase1_detect_threshold"] = 6.5                             
        sorting_params["detect_threshold"] = 6.0                                                                                                                                              
        sorting_params["snippet_T1"] = 15                                           
        sorting_params["snippet_T2"] = 35                                           
        sorting_params["detect_sign"] = -1                                          
        sorting_params["phase1_detect_time_radius_msec"] = 0.5                      
        sorting_params["detect_time_radius_msec"] = 0.5                             
        sorting_params["classification_chunk_sec"] = 100                            

Reducing the training batch size and raising the detect thresholds did not work. I'm repeating the process with the parameters you suggested and even smaller block sizes (1 minute). Beyond that, I think it may make more sense to sort 30-minute chunks with scheme 2, using 15 minutes of overlap between chunks, and then match units across the overlapping segments, which is what I did before this latest release.
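For reference, that chunked scheme-2 workflow might look roughly like this (a sketch assuming a single-segment SpikeInterface recording and the `sorting_params` dict from earlier; the cross-chunk unit matching is not shown):

    import numpy as np
    import mountainsort5 as ms5

    # Sketch: sort overlapping 30-minute chunks with scheme 2; units would then
    # be matched across the 15-minute overlaps afterwards (not shown).
    fs = recording.get_sampling_frequency()
    chunk_sec, step_sec = 30 * 60, 15 * 60      # 30-min chunks, 15-min hop
    n_frames = recording.get_num_frames()

    chunk_sortings = []
    for start in np.arange(0, n_frames, int(step_sec * fs)):
        end = min(start + int(chunk_sec * fs), n_frames)
        chunk = recording.frame_slice(start_frame=int(start), end_frame=int(end))
        chunk_sortings.append(
            ms5.sorting_scheme2(
                recording=chunk,
                sorting_parameters=ms5.Scheme2SortingParameters(**sorting_params),
            )
        )
        if end == n_frames:
            break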

These are Neuropixels 1.0 probes. I'm curious to know what @magland thinks.

Let me know if you want to jump on a Zoom call and talk through some points we could improve for Neuropixels-specific use cases.

Email is melander at stanford dot edu

Luckily, given the changes you suggested, things have improved substantially. I'm waiting for the sort to complete fully to inspect the actual identified units. I'll post the results later.

Good to hear! One question - when you mention batch size, do you mean classification_chunk_sec? Or are you referring to something in scheme 3?

Sorry for being late to this thread. Thanks @jbmelander for your guidance. It all looks good to me.

Regarding scheme 3, I have only tested it with one dataset thus far. My hope is that it could be improved over time so that it could be useful for 24+ hour recordings.

@jbmelander I agree with you that it can be helpful to restrict to a subset of good channels when that makes sense.

@magland @rtraghavan I think there are a few fixes that would make scheme 3 pretty powerful. As it is now, it seems like a great idea and well implemented, but I do get results that somewhat overestimate the number of clusters. If anyone is interested in making a plan for developing scheme 3, I'm in.


Sure, do you want to start a new gh issue thread?


@jbmelander @rtraghavan

Looking back at this thread, I'm curious whether some of the details about which parameters were changed may have been discussed in a Zoom call. If so, it would be helpful if you could include some of that info here. Thanks.

We haven't met yet - but I will make a new issue once we have.