MouseLand/suite2p

BUG: unable to save data.bin file after registration through Colab, but the same pipeline works through the GUI

julia-napoli opened this issue · 7 comments

Describe the issue:

When running suite2p registration, ROI detection, classification, etc. on my 2-photon imaging data through Google Colab, suite2p runs through registration and tries to save the corresponding files into a folder on my Google Drive, but it cannot proceed past registration: it reports no such file or directory for the 'data.bin' file it is supposed to be creating. This is not an issue of permission to save or write files in the Drive, as I have given it access and it is able to create the 'suite2p' and 'plane0' folders, just not the 'data.bin' file. This happens with all of my 2-photon files, not just one of them; furthermore, when I run these same data files through the GUI locally on my device, there is no issue saving the output files. Please help! Thank you

Reproduce the code example:

# install/import outside packages
!pip uninstall -y opencv-python-headless
!pip install opencv-python-headless==4.8.1.78
!pip install suite2p --upgrade
!pip install pynrrd
!pip install paramiko

!pip install fastapi
!pip install kaleido
!pip install python-multipart
!pip install uvicorn
!pip uninstall -y numpy
!pip install numpy==1.23.1
!pip uninstall -y seaborn
!pip install seaborn==0.9.0

import os, requests
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import xml.etree.ElementTree as ET
from tifffile import imread
import scipy
import seaborn as sns
from sklearn.decomposition import PCA, FastICA
from sklearn.cluster import KMeans, AgglomerativeClustering
import suite2p
import nrrd
from suite2p.io import BinaryFile


# figure style settings for notebook
import matplotlib as mpl
mpl.rcParams.update({
    'axes.spines.left': True,
    'axes.spines.bottom': True,
    'axes.spines.top': False,
    'axes.spines.right': False,
    'legend.frameon': False,
    'figure.subplot.wspace': .01,
    'figure.subplot.hspace': .01,
    'figure.figsize': (18, 13),
    'ytick.major.left': True,
})
jet = mpl.cm.get_cmap('jet').copy()  # copy so the registered colormap is not modified in place
jet.set_bad(color='k')

%matplotlib inline
# additional imports for analysis
import csv
import h5py
import pandas as pd
import math
import time
import json
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import animation
from scipy import io, signal
from scipy.fftpack import rfft, irfft, fftfreq
from operator import xor
import statsmodels
import random
from random import randrange
sns.set_style("white")

# mount google drive
from google.colab import drive
drive.mount('/content/drive')

from google.colab import auth
auth.authenticate_user()

ops = suite2p.default_ops()

ops['nchannels'] = 1
ops['do_bidiphase'] = True
ops['keepdims'] = False
ops['sparse_mode'] = False
ops['spatial_scale'] = 1
ops['snr_thresh'] = 1.2
ops['maxregshiftNR'] = 5
ops['high_pass'] = 50
ops['max_overlap'] = 0.7
ops['neuropil_extract'] = False
ops['inner_neuropil_radius'] = 1
ops['soma_crop'] = False
ops['batch_size'] = 100 # decrease batch_size in case of low RAM on the computer
ops['threshold_scaling'] = 0.6 # increase the ROI-detection threshold to limit the number of non-cell ROIs found (sometimes useful in GCaMP injections)
# lisanne used 0.9 for threshold_scaling

"""
#cellpose features
ops['anatomical_only'] = 1 # run cellpose to get masks on 1: max_proj / mean_img; 2: mean_img; 3: mean_img enhanced, 4: max_proj
ops['diameter'] = 5  # use diameter for cellpose, if 0 estimate diameter
ops['cellprob_threshold'] =  0.0 # cellprob_threshold for cellpose
ops['flow_threshold'] = 1.1 # flow_threshold for cellpose
ops['spatial_hp_cp'] = 25 # high-pass image spatially by a multiple of the diameter
ops['pretrained_model'] = 'nuclei' # path to pretrained model or model type string in Cellpose (can be user model)
"""
print(ops)

for exp in experiments[6:]:
  if os.path.isdir(file_path + exp + '/'):
    print(str(exp))

    # load metadata
    anat_file = metadata.loc[exp,'anat_file']
    live_file1 = metadata.loc[exp,'live_file1']
    live_file2 = metadata.loc[exp,'live_file2']
    live_file3 = metadata.loc[exp,'live_file3']
    behav_file1 = metadata.loc[exp,'behav_file1']
    behav_file2 = metadata.loc[exp,'behav_file2']

    # loop over each live imaging file for this experiment
    for item in [live_file1, live_file2, live_file3]:
      if os.path.isdir(file_path + exp + '/' + item):
        print(str(item))

        path = file_path + exp + '/' + item + '/'

        print(path)

        tree = ET.parse(path+"Experiment.xml")
        root = tree.getroot()

        pix_size = float(root[12].attrib['pixelSizeUM'])
        frame_rate = (root[12].attrib['frameRate'])
        frame_avg = (root[12].attrib['averageNum'])
        z_steps = 1
        z_flyback = 0
        #z_steps = (root[8].attrib['steps'])
        #z_flyback = (root[8].attrib['zStreamFrames']) # use this if recording from multiple zplanes
        timepoints = (root[9].attrib['timepoints'])

        print("pixels are", pix_size, "um, and volumes of ", int(z_steps)-int(z_flyback), " steps (",int(z_flyback)," flyback) aquired at ", float(frame_rate)/(float(z_steps)*float(frame_avg)), " Hz, with ", timepoints, " timepoints" )

        #use xml data to set additional params
        ops['nplanes'] = int(z_steps) # if volumetric stack do -int(z_flyback)
        ops['ignore_flyback'] = []  # e.g. [int(z_flyback)-1] if recording from multiple z-planes
        ops['fs'] = float(frame_rate)/(float(z_steps)*float(frame_avg)) # in lisanne's is just /float(z_steps)
        ops['tau'] = 3.0 #H2B-GC6s
        ops['diameter'] = int(7.0/pix_size)
        ops['save_path0'] = path
        ops['keep_movie_raw'] = True
        ops['delete_bin'] = False
        ops['move_bin'] = False

        print('cell diameter in pixels is ', ops['diameter'])

        #RUN ALL as pipeline
        db = {'data_path': [path]}
        print(db)

        output_ops = suite2p.run_s2p(ops=ops, db=db)

Error message:

240126f2
jn240126f2-live
/content/drive/MyDrive/Personal_notebooks/Julia_Notebook/Imaging/hunger_state/DATA/240126f2/jn240126f2-live/
pixels are 0.814 um, and volumes of  1  steps ( 0  flyback) aquired at  5.045333333333333  Hz, with  6000  timepoints
cell diameter in pixels is  8
{'data_path': ['/content/drive/MyDrive/Personal_notebooks/Julia_Notebook/Imaging/hunger_state/DATA/240126f2/jn240126f2-live/']}
{'data_path': ['/content/drive/MyDrive/Personal_notebooks/Julia_Notebook/Imaging/hunger_state/DATA/240126f2/jn240126f2-live/']}
tif
** Found 1 tifs - converting to binary **
400 frames of binary, time 28.99 sec.
800 frames of binary, time 44.09 sec.
1200 frames of binary, time 60.55 sec.
1600 frames of binary, time 80.95 sec.
2000 frames of binary, time 104.82 sec.
2400 frames of binary, time 130.44 sec.
2800 frames of binary, time 152.23 sec.
3200 frames of binary, time 170.34 sec.
3600 frames of binary, time 188.67 sec.
4000 frames of binary, time 207.47 sec.
4400 frames of binary, time 228.32 sec.
4800 frames of binary, time 244.64 sec.
5200 frames of binary, time 266.70 sec.
5600 frames of binary, time 285.24 sec.
6000 frames of binary, time 295.87 sec.
time 296.11 sec. Wrote 6000 frames per binary for 1 planes
>>>>>>>>>>>>>>>>>>>>> PLANE 0 <<<<<<<<<<<<<<<<<<<<<<
NOTE: not registered / registration forced with ops['do_registration']>1
      (no previous offsets to delete)
NOTE: Applying builtin classifier at /usr/local/lib/python3.10/dist-packages/suite2p/classifiers/classifier.npy
----------- REGISTRATION
NOTE: estimated bidiphase offset from data: 1 pixels
Reference frame, 51.63 sec.
Registered 100/6000 in 129.49s
Registered 200/6000 in 142.75s
Registered 300/6000 in 154.07s
Registered 400/6000 in 164.84s
Registered 500/6000 in 176.56s
Registered 600/6000 in 192.61s
Registered 700/6000 in 204.85s
Registered 800/6000 in 216.34s
Registered 900/6000 in 227.46s
Registered 1000/6000 in 238.15s
Registered 1100/6000 in 249.89s
Registered 1200/6000 in 265.60s
Registered 1300/6000 in 281.30s
Registered 1400/6000 in 293.95s
Registered 1500/6000 in 307.23s
Registered 1600/6000 in 320.17s
Registered 1700/6000 in 335.43s
Registered 1800/6000 in 348.48s
Registered 1900/6000 in 360.19s
Registered 2000/6000 in 373.55s
Registered 2100/6000 in 386.90s
Registered 2200/6000 in 398.93s
Registered 2300/6000 in 410.61s
Registered 2400/6000 in 423.74s
Registered 2500/6000 in 436.70s
Registered 2600/6000 in 453.56s
Registered 2700/6000 in 469.17s
Registered 2800/6000 in 482.38s
Registered 2900/6000 in 498.94s
Registered 3000/6000 in 515.60s
Registered 3100/6000 in 532.99s
Registered 3200/6000 in 550.25s
Registered 3300/6000 in 567.94s
Registered 3400/6000 in 581.53s
Registered 3500/6000 in 598.45s
Registered 3600/6000 in 612.32s
Registered 3700/6000 in 628.89s
Registered 3800/6000 in 642.74s
Registered 3900/6000 in 655.15s
Registered 4000/6000 in 667.05s
Registered 4100/6000 in 679.09s
Registered 4200/6000 in 695.71s
Registered 4300/6000 in 709.10s
Registered 4400/6000 in 726.36s
Registered 4500/6000 in 743.21s
Registered 4600/6000 in 759.49s
Registered 4700/6000 in 777.93s
Registered 4800/6000 in 791.09s
Registered 4900/6000 in 803.06s
Registered 5000/6000 in 819.91s
Registered 5100/6000 in 833.29s
Registered 5200/6000 in 845.37s
Registered 5300/6000 in 859.64s
Registered 5400/6000 in 873.36s
Registered 5500/6000 in 886.12s
Registered 5600/6000 in 903.25s
Registered 5700/6000 in 920.98s
Registered 5800/6000 in 939.11s
Registered 5900/6000 in 952.93s
Registered 6000/6000 in 965.91s
----------- Total 1056.33 sec
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-a9a861a237b2> in <cell line: 1>()
     51         print(db)
     52 
---> 53         output_ops = suite2p.run_s2p(ops=ops, db=db)
     54 
     55         print(set(output_ops.keys()).difference(ops.keys()))

6 frames
/usr/lib/python3.10/genericpath.py in getsize(filename)
     48 def getsize(filename):
     49     """Return the size of a file, reported by os.stat()."""
---> 50     return os.stat(filename).st_size
     51 
     52 

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Personal_notebooks/Julia_Notebook/Imaging/hunger_state/DATA/240126f2/jn240126f2-live/suite2p/plane0/data.bin'

Version information:

suite2p v0.14.3

Context for the issue:

This issue prevents me from running my 2-photon imaging data through the registration pipeline in Google Colab, leaving the local GUI as the only option. That rules out high-throughput processing of the data and precise tailoring of the suite2p parameters on a per-file basis. Please help!! Thanks so much

@julia-napoli, have you tried running this code locally in a Jupyter notebook on your machine? Have you also verified that the parent directory of the data.bin, '/content/drive/MyDrive/Personal_notebooks/Julia_Notebook/Imaging/hunger_state/DATA/240126f2/jn240126f2-live/suite2p/plane0/', exists in your Google Drive? Google Colab can be finicky with pathnames.
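A quick way to check this from the notebook itself, using the path copied from the traceback above (a minimal sketch):

import os

plane_dir = '/content/drive/MyDrive/Personal_notebooks/Julia_Notebook/Imaging/hunger_state/DATA/240126f2/jn240126f2-live/suite2p/plane0'
print(os.path.isdir(plane_dir))   # does the plane0 folder exist?
if os.path.isdir(plane_dir):
    print(os.listdir(plane_dir))  # what did suite2p manage to write there?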

Also, what happens when you remove the line ops['save_path0'] = path? In this case you don't need to set it explicitly, since suite2p will write its outputs to the data_path you provided.
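A minimal sketch of that suggestion, assuming path is the same directory used in the loop above:

ops = suite2p.default_ops()
db = {'data_path': [path]}   # without save_path0, outputs go to data_path/suite2p/plane0
output_ops = suite2p.run_s2p(ops=ops, db=db)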

Hi @chriski777 - thanks for following up!
To answer your questions:
(1) yes, the parent directory of the data.bin is in my Google Drive, and I have given the appropriate authorization for this cell to write into the corresponding Google Drive folder
(2) after removing the ops['save_path0'] = path line, I am unfortunately still getting the same error

Hi @julia-napoli, sorry for the late response. This may be more of a Google Colab issue than a suite2p issue. Have you tried running a similar notebook locally on your computer?

Hey @chriski777, no worries. I have not tried running it in a local Jupyter notebook; however, that would not be ideal for us, since our large datasets, data-organization spreadsheets, and analysis code are all saved in our Google Drive, and we want to batch-analyze our data while accessing and saving through Colab. Is it possible to help troubleshoot what might be going wrong with the Colab-related error we are seeing? If it would help for troubleshooting on your end, I can try running it in a local notebook, though for us that is as much of a workaround as running it through the GUI.

Hmm, I see. Could you try running with keep_movie_raw set to False? Also, have you been able to get the Colab notebook working with suite2p's default settings?
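For context, keep_movie_raw=True makes suite2p keep an unregistered copy of the movie (data_raw.bin) next to data.bin, roughly doubling the disk space the run needs. A minimal sketch of the suggested change, assuming path as above:

ops = suite2p.default_ops()          # default settings
ops['keep_movie_raw'] = False        # skip writing the unregistered data_raw.bin
output_ops = suite2p.run_s2p(ops=ops, db={'data_path': [path]})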

Hey @chriski777, thanks for giving me some more things to try! I changed keep_movie_raw to False and unfortunately got the same data.bin error again. But when re-running the code with all of suite2p's default settings through Colab, I got a new error saying there is "no space left on device" to save some of the outputs to our Google Drive. The strange thing is that we have plenty of space on the Google Drive folder it is saving to, and we have had no issues saving files from other notebooks into the Drive, which leads me to think this might be a suite2p-specific error.
[screenshot: "no space left on device" error]
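Since "no space left on device" can come from either filesystem, one quick check is to compare free space on the Colab VM's local disk with the mounted Drive (a minimal sketch):

import shutil

for mount in ['/content', '/content/drive']:
    total, used, free = shutil.disk_usage(mount)
    print(f"{mount}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")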

Thanks for the update! This error now leads me to believe that this is a Google Colab-specific issue.

This StackOverflow post indicates that the disk space in Google Drive is not the same as the disk space in Google Colab. I'm not sure there is a way to increase the disk space available in Google Colab, even if the directory you're saving to is in your Google Drive. In general, we don't recommend running your suite2p workflow on Google Colab, as you can run into many issues of this nature (e.g., Colab may run out of space because it caches files even if you don't explicitly tell it to). Instead, I'd recommend setting up a Jupyter notebook on a local machine and running your analysis scripts there.
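If Colab must be used, one possible workaround (a sketch only, with an assumed scratch directory) is to have suite2p write its large binaries to the VM's local disk via save_path0 and copy just the final outputs back to Drive:

import os, shutil

local_scratch = '/content/s2p_scratch'   # hypothetical scratch dir on the Colab VM's local disk
os.makedirs(local_scratch, exist_ok=True)

ops['save_path0'] = local_scratch        # write data.bin and outputs locally, not to Drive
ops['delete_bin'] = True                 # remove the large binaries once the pipeline finishes

output_ops = suite2p.run_s2p(ops=ops, db={'data_path': [path]})

# copy the much smaller results folder back to the Drive directory
shutil.copytree(os.path.join(local_scratch, 'suite2p'),
                os.path.join(path, 'suite2p'), dirs_exist_ok=True)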

Another thing you could try is running suite2p with default settings on a much smaller version of your dataset. My guess is that both the unsaved data.bin and the space error come from the disk space constraints Google Colab imposes.
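One way to carve out such a small test dataset with tifffile, assuming path as in the original loop (the filename here is hypothetical):

import os
from tifffile import imread, imwrite

small_dir = os.path.join(path, 'small_test')
os.makedirs(small_dir, exist_ok=True)

stack = imread(os.path.join(path, 'recording.tif'))                   # frames x Y x X
imwrite(os.path.join(small_dir, 'recording_small.tif'), stack[:500])  # first 500 frames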