raphaelvallat/yasa

Spindles detection failing for some users

raphaelvallat opened this issue · 25 comments

As reported in #102 and #104, the yasa.spindles_detect function seems to be suddenly failing for some users. I am not able to reproduce the error.

Minimal (reproducible) example

Data here: data_N2_spindles_15sec_200Hz.txt

import numpy as np
import pandas as pd
import mne
import numba
import scipy
import yasa
import antropy
from platform import python_version

print("Python", python_version())
print(f"YASA {yasa.__version__}")
print(f"MNE {mne.__version__}")
print(f"NumPy {np.__version__}")
print(f"Pandas {pd.__version__}")
print(f"Numba {numba.__version__}")
print(f"SciPy {scipy.__version__}")
print(f"Antropy {antropy.__version__}")

sf = 200
data = np.loadtxt('data_N2_spindles_15sec_200Hz.txt')
sp = yasa.spindles_detect(data, sf)
sp.summary()

My output ✅

[screenshot: script output showing package versions and the detected spindles summary]

@remrama @chensnbetter @meniua @apavlo89 can you please run this minimal example and copy-paste a screenshot of the output here?

If you're getting the error, can you please try to run this notebook and let us know of any useful warnings/errors?

Let's solve this! Thanks all

I ran this example and the errors are as follows:
[screenshot: error traceback]

And I'm using YASA in a Python 3.10.6 environment.
[screenshot: Python 3.10.6 environment details]

🤔 I really don't get it. I have tried updating Pandas, NumPy, SciPy, and Numba, and re-installing YASA, and it's still working fine on my computer. The only difference I see is that you are running Python 3.9+ and I'm running Python 3.8. I really don't see how this could affect YASA, to be honest. Has anyone else been able to run the example successfully with Python 3.9+?

Code is still working fine after updating to Python 3.10.6 🤯 And I have the exact same Python package versions as your 3.10.6 environment @chensnbetter.

[screenshot: matching package versions, spindles detected]

Are you both using Windows btw? Is this bug specific to Windows?

Works for me on Windows 🤷

Very mysterious...

I wonder if someone who is getting the error could mess around with some of the detection parameters to see if that helps identify where the detection is failing. Sorry I can't try it myself, I'm just not able to reproduce the initial problem.

Thanks. I'm using Windows.

OK, I think I found the problem. I was able to reproduce it on my dusty desktop. It has to do with the moving correlation thresholding, and I know this sounds insane, but I think it depends on a tiny difference in Windows version.

Before I go into detail, can some people help verify something for me? Please show your output from this slightly modified version of the original sample code.

import sys
import platform
import numpy as np
import pandas as pd
import mne
import numba
import scipy
import yasa
import antropy

print("System", platform.platform())
print("Python", sys.version)
print(f"YASA {yasa.__version__}")
print(f"MNE {mne.__version__}")
print(f"NumPy {np.__version__}")
print(f"Pandas {pd.__version__}")
print(f"Numba {numba.__version__}")
print(f"SciPy {scipy.__version__}")
print(f"Antropy {antropy.__version__}")

sf = 200
data = np.loadtxt('data_N2_spindles_15sec_200Hz.txt')

# I expect this to work for everyone.
print("=" * 10, "Run with no moving correlation threshold.", "=" * 10)
sp = yasa.spindles_detect(data, sf, thresh={"corr": None}, verbose=True)

# I expect this to still break (no spindles found) for a subset of users.
print("=" * 10, "Run with moving correlation threshold set to default 0.65.", "=" * 10)
sp = yasa.spindles_detect(data, sf, verbose=True)

[screenshot: output of the modified script]

Alright! Thank you so much @chensnbetter! Sorry about all the requests, but I appreciate you running and sharing all these outputs, it's really helpful. We still need to work out the details of this bug, but for now, setting thresh={"corr": None} inside yasa.spindles_detect should detect spindles for you and get you through the tutorials. Just change this one line:

# Apply the detection using yasa.spindles_detect
# sp = yasa.spindles_detect(data, sf)  # Old
sp = yasa.spindles_detect(data, sf, thresh={"corr": None})  # New

Thanks again for your output message. I'm now sure it was not an issue of Windows versioning, at least from what I can tell (so I guess I was crazy). Spindles are detected on my machine, which has the same Windows version as yours (10.0.19044), so it's not that. So what is it?

Moving correlation thresholding is the problem

cc @raphaelvallat

The problem comes from the corr thresholding, probably from the underlying yasa.numba._corr function? Or maybe some of the mne filtering. Wherever it is, I suspect this has to do with some seriously small differences in floating-point operations or rounding, but I'm not sure (could be something weird af like this obscure thread).

Turning on YASA's verbosity gives us insight. On a working system, keeping the default corr threshold at 0.65, you get the following message, generated from line 885:

01-Dec-22 12:39:53 | INFO | N supra-theshold moving corr = 345

But on a system that detects no spindles, you will get this:

01-Dec-22 12:39:53 | INFO | N supra-theshold moving corr = 0

This message is generated by finding all values of the mcorr array that are greater than or equal to the corr threshold (the count should be 345, but it's 0 on systems that find no spindles). mcorr is returned by yasa.moving_transform, which is likely where the problem is. Note that it's specific to setting method="corr"; the other threshold methods don't give the error, so it might occur within the numba _corr() call. Maybe some floating-point discrepancies across systems occurring in there? I have no numba experience, so I'm not totally sure how to approach this further, but I'm happy to take hints.
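To make the failing check concrete, here is a rough sketch using the public yasa.moving_transform API on the sample data (the variable names are mine, and I'm assuming the (time, values) return pair described in the YASA docs, so the internal detection code may differ):

import numpy as np
from scipy.signal import butter, filtfilt
import yasa

sf = 200
data = np.loadtxt('data_N2_spindles_15sec_200Hz.txt')

# Sigma-filtered copy of the signal (12-15 Hz bandpass). YASA does its own
# filtering internally; this is just an approximation for the sketch.
b, a = butter(4, [12, 15], btype='bandpass', fs=sf)
data_sigma = filtfilt(b, a, data)

# Moving correlation between the sigma-filtered and broadband signals.
_, mcorr = yasa.moving_transform(x=data_sigma, y=data, sf=sf, window=0.3,
                                 step=0.1, method='corr', interp=True)

# The count in the log message: how many samples exceed the threshold.
print("N supra-threshold moving corr =", (mcorr >= 0.65).sum())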

Numba is the problem. If you replace the use of numba-based _corr() with scipy.stats.pearsonr to calculate the correlation, spindles are found as expected.
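For anyone who wants to reproduce that experiment, the swap is essentially this (a sketch of the idea, not the exact patch I tested):

from scipy.stats import pearsonr

def _corr(x, y):
    """Pearson correlation via SciPy instead of the numba kernel."""
    r, _ = pearsonr(x, y)
    return r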

Based on printing stuff to debug, the breakdown occurs at r_den. Its value is nan, so the final returned value, a division by r_den, is also nan. And so the returned numba list of r values is all nan.

So why is r_den = nan on some systems but not others? I HAVE NO IDEA! The biggest mystery -- and I wouldn't believe me if I told me this -- is that the problem goes away if I add a print line right before or after the r_den calculation, and only there. I'm NOT KIDDING. You can find similar experiences on Stack Overflow (e.g.) where printing inside a numba function causes unexpected behavior, and the prevailing thought is that it has to do with numba's nopython compilation. Maybe using print with nopython=True causes weirdness. But no one seems to have a solution, probably because, as in our situation, the problem doesn't reproduce across systems.

jfc

The only immediate solution I can think of is hideous: do some kind of sanity check on the value returned by _corr, and if you don't like it, compute the correlation with pearsonr from SciPy instead. Not sure what to do exactly.
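Something like this, I mean (a sketch; safe_corr is a hypothetical helper name, and I'm assuming the yasa.numba import path mentioned above):

import numpy as np
from scipy.stats import pearsonr
from yasa.numba import _corr  # assumed import path

def safe_corr(x, y):
    """Use the fast numba kernel, but fall back to SciPy if it returns nan."""
    r = _corr(x, y)
    if np.isnan(r):  # nan is the failure mode on affected systems
        r = pearsonr(x, y)[0]
    return r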

Thanks so much @remrama for your in-depth investigation! So I think the error was introduced in #86. Interestingly, someone opened a very similar PR on antropy recently, and their solution was not an if..else statement, but instead adding a very small value to the denominator to avoid a division-by-zero error.

Can you please try to run the code with:

import sys

import numpy as np
from numba import jit

eps = sys.float_info.epsilon

@jit("float64(float64[:], float64[:])", nopython=True)
def _corr(x, y):
    """Fast Pearson correlation."""
    n = x.size
    mx, my = x.mean(), y.mean()
    xm2s, ym2s, r_num = 0, 0, 0
    for i in range(n):
        xm = x[i] - mx
        ym = y[i] - my
        r_num += xm * ym
        xm2s += xm**2
        ym2s += ym**2
    r_d1 = np.sqrt(xm2s)
    r_d2 = np.sqrt(ym2s)
    r_den = r_d1 * r_d2
    return r_num / (r_den + eps)

If this is not working you can just do:

return r_num / (r_den + 10e-9)

@raphaelvallat this doesn't solve the problem :/ Still no spindles detected.

I think this issue is different from the one in antropy, because when the function works properly (on most systems, it seems), the r_den values are nowhere near zero! If I understand it correctly, which I'm not sure I do.

The current behavior is still truly bizarre to me. See here, showing no spindles found even with the zero-division adjustment.

[screenshot: no spindles found]

But with the SAME code, now just printing r_den, everything works exactly as expected... Literally adding a print brings r_den to life 🧟 🤔

[screenshot: spindles found when printing r_den]

It is truly bizarre... Do you get the same bug when setting nopython=False?

I think we'll need to somehow re-implement this function then. The issue is that the spindle detection is already quite slow, and I fear that it will be even worse if we switch to np.corrcoef or scipy.stats.pearsonr... I'm out sick this week, but I can run some benchmarks next week.

Setting nopython=False does not help, but I think that makes sense: the compilation isn't failing, so the flag doesn't change behavior in our case. It only changes the fallback behavior if the initial compilation fails, which would have raised earlier errors had it happened. What we want is to force the use of the fallback object mode. To do that, we can add forceobj=True, and that makes it work.

But the problem is, there are 100 other things that also fix the problem, and they all fix it independently. I've been racking my brain trying to figure out what common underlying change happens in Numba when any of these fixes is used:

Things that fix _corr()

  • Set forceobj=True - I didn't find this too useful once I realized there were other fixes. It basically says the function works when treated as normal Python code, but with the other patches it doesn't have to be.
  • Set fastmath=True - This felt like it was going to be informative because it changes the underlying computations, so it would have made total sense. But alas, you can revert back to the default fastmath=False and all of the following changes will still fix it.
  • Set the signature to "float64(float64[::1], float64[::1])", or just remove it entirely - We're using 1D arrays that should satisfy both C- and F-contiguous expectations (check with ndarray.flags), so I think the current A-contiguous signature specification is supposed to work. And it does on most systems. But not here, though you can fix the OG problem by telling numba to expect C-contiguous arrays, or by not telling it anything at all, in which case it will infer C-contiguity.
  • Set error_model="numpy" - This also seemed promising: the error_model parameter changes how numba handles division by zero. The default is "python", which should raise an error on any attempted division by zero, yet it isn't doing that even in a simplified example I made. Despite showing no errors in test cases, switching to error_model="numpy" fixes the OG problem.
  • Set boundscheck=True - Works with just this change.
  • Use np.arange instead of range - Yes. This change -- all by itself -- fixes the problem (see the sketch after this list).
  • Pass xm2s, ym2s, r_num as keyword arguments - Do this instead of setting them within the function, and it works. Between this and the previous one, I thought maybe this was a sneaky object-mode thing, but numba tutorials constantly set variables and use range inside their functions.
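For concreteness, the np.arange patch would look like this (a sketch; the name _corr_arange is mine, everything else matches the original kernel):

import numpy as np
from numba import jit

@jit("float64(float64[:], float64[:])", nopython=True)
def _corr_arange(x, y):
    """Fast Pearson correlation, looping over np.arange instead of range."""
    n = x.size
    mx, my = x.mean(), y.mean()
    xm2s, ym2s, r_num = 0.0, 0.0, 0.0
    for i in np.arange(n):  # the only change: was `for i in range(n)`
        xm = x[i] - mx
        ym = y[i] - my
        r_num += xm * ym
        xm2s += xm**2
        ym2s += ym**2
    return r_num / (np.sqrt(xm2s) * np.sqrt(ym2s))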

Again, each of these is a complete patch by itself. I have a Jupyter notebook with printouts of each separate fix; I didn't link it here because it's a bit much, but I could if it would help. While this seems great (easy fix!), it's unsettling because I have no idea what the root cause is. I'm hesitant to implement any of these, because it all looks like completely unexpected behavior with the potential for more downstream disruptions. I feel like these are all just a ton of red herrings. I also feel like I'm losing my mind.

Wow, you went full-on into the rabbit hole 🤯

From all these solutions, I think I like the np.arange best because it will not affect performance and is straightforward — although I don't understand it. boundscheck=True is also good but it might cost some precious nanoseconds.

To be clear, this is not related to one specific Numba version?

Yes the np.arange change seems most reasonable now that you mention it.

I haven't checked with other numba versions. I'll explore that, and if it's all good I'll submit a PR with that simple change.

Not a version issue; it's some kind of conda environment issue for me. Though looking at the previous screenshots, some users were using system Python, so I suppose this only adds to the confusion.

I'm running this test.py file:

import numpy
import scipy
import numba
import llvmlite
print(f"numpy-{numpy.__version__}")
print(f"scipy-{scipy.__version__}")
print(f"numba-{numba.__version__}")
print(f"llvmlite-{llvmlite.__version__}")

import numpy as np
from numba import jit

@jit("float64(float64[:], float64[:])")
def _corr(x, y):
    """Fast Pearson correlation."""
    n = x.size
    mx, my = x.mean(), y.mean()
    xm2s, ym2s, r_num = 0, 0, 0
    for i in range(n):
        xm = x[i] - mx
        ym = y[i] - my
        r_num += xm * ym
        xm2s += xm**2
        ym2s += ym**2
    r_d1 = np.sqrt(xm2s)
    r_d2 = np.sqrt(ym2s)
    r_den = r_d1 * r_d2
    return r_num / r_den

xx = np.arange(1, 100, dtype=np.float64)
yy = np.arange(1, 100, dtype=np.float64)
print("Correlation =", _corr(xx, yy))

[screenshot: test.py output]

@raphaelvallat -- want me to just change to np.arange and call it a day?

@remrama really confusing indeed. Btw, I realized that we can simplify even further by using zip:

for xi, yi in zip(x, y):
    xm = xi - mx
    ym = yi - my
    r_num += xm * ym
    xm2s += xm**2
    ym2s += ym**2

It even seems to be a bit faster than the previous implementation. Can you check whether you still have the bug when using zip?
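For reference, the complete zip-based function would look roughly like this (a sketch; the exact code merged in the PR may differ slightly):

import numpy as np
from numba import jit

@jit("float64(float64[:], float64[:])", nopython=True)
def _corr(x, y):
    """Fast Pearson correlation, iterating over both arrays with zip."""
    mx, my = x.mean(), y.mean()
    xm2s, ym2s, r_num = 0.0, 0.0, 0.0
    for xi, yi in zip(x, y):
        xm = xi - mx
        ym = yi - my
        r_num += xm * ym
        xm2s += xm**2
        ym2s += ym**2
    return r_num / (np.sqrt(xm2s) * np.sqrt(ym2s))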

Works. Great idea. Submitting PR now 🚀

Bugfix implemented in #115

Will need to release a new version of YASA soon

I have just released a new version of YASA that should fix this issue. Please update YASA with pip install -U yasa.

Thanks,
Raphael

Wow, this was really some awesome detective work @remrama