Multivariate anomaly detection.

Question

Multivariate anomaly detection.

Opened this issue 4 years ago · 13 comments

Suppose we have 5 data points in a data frame (d1, d2, d3, d4, d5) and one of them is an anomaly. Is it possible to tell which data point caused the anomaly, and to assign a score to each one, e.g. d5 was 50% responsible, d3 30%, etc.?

Thanks!

Answer 1 · 2020-03-26T18:53:54.000Z

@abhimanyu3 If we assume a single series is the reason for the anomaly, then I would apply a univariate detector to each series independently.

A multivariate detector is for the case where the anomaly is due to a change in the relationship between series. In that case, it's hard to say which series "causes" the anomaly because the anomaly is caused by those series jointly.
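
To illustrate the first suggestion, here is a minimal sketch of running one univariate detector on each series independently and recording which series were flagged at each time index. It uses only the standard library. `jump_detector`, `detect_per_series`, and the sample data are hypothetical names and values made up for this example; they are not part of ADTK, and the detector is a deliberately simple stand-in for whatever univariate detector you actually use.

```python
def jump_detector(series, threshold):
    """Hypothetical univariate detector: flag index t when the value
    changes by more than `threshold` relative to the previous point."""
    return {
        t for t in range(1, len(series))
        if abs(series[t] - series[t - 1]) > threshold
    }


def detect_per_series(frame, detector, **kwargs):
    """Run one univariate detector on every column independently.

    `frame` maps a column name to its list of values. The result maps
    each flagged time index to the columns that were flagged there.
    """
    flagged_by_time = {}
    for name, series in frame.items():
        for t in detector(series, **kwargs):
            flagged_by_time.setdefault(t, []).append(name)
    return flagged_by_time


# Made-up data: d2 has a level shift at index 3, d3 has a spike at index 3.
frame = {
    "d1": [5, 5, 6, 5, 5, 6],
    "d2": [1, 1, 1, 9, 9, 9],
    "d3": [3, 4, 3, 12, 3, 4],
}
report = detect_per_series(frame, jump_detector, threshold=5)
```

With this layout, `report` tells you directly which series misbehaved at a given time. It does not give percentage "responsibility" scores, and it cannot catch anomalies that only show up in the relationship between series.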

Answer 2 · 2020-03-26T21:17:36.000Z

@tailaiw Thanks a lot for your response. I have to find sudden peaks and drops in my multivariate time series data, so I am using the PersistAD method on the df. Should I apply it to each column separately, or is applying it to the whole df the same thing?

Also, where can I find details such as what c is in PersistAD, so that I can make an informed decision when tuning?

Do you recommend any other method for finding sudden peaks and drops, or is PersistAD good?

Answer 3 · 2020-03-27T13:57:18.000Z

Hi there, do either of you know where to find the formulae used in PersistAD? I haven't been able to find them in the code.
Thanks a million guys

Answer 4 · 2020-03-30T20:59:31.000Z

@abhimanyu3 and @ivanokeeffe PersistAD is implemented as a pipeline of a DoubleRollingAggregate transformer and an InterQuartileRangeAD detector. You may refer to the pipe_ attribute of a PersistAD object for more details.

The parameter c is the same one used by the internal InterQuartileRangeAD, where it controls the width of the "normal range". InterQuartileRangeAD is a classic, simple outlier detection method. The value of c is usually 1.5 or 3, although the user may choose another value depending on the problem to solve.
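
For anyone who wants the maths spelled out, below is a rough standard-library sketch of the two steps described above. It is a simplified reading and not ADTK's actual code: the function names are hypothetical, the quantile method is an assumption, and the real pipe may contain additional steps (inspect pipe_ to see them). Step 1 compares each value with the median of the `window` values right before it. Step 2 applies the inter-quartile range rule to the magnitudes of those changes: with quartiles Q1 and Q3 and IQR = Q3 - Q1, a change is flagged when its magnitude exceeds Q3 + c * IQR.

```python
from statistics import median, quantiles


def iqr_upper_bound(values, c=3.0):
    """Upper end of the 'normal range': Q3 + c * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + c * (q3 - q1)


def persist_like_detect(series, window=1, c=3.0):
    """Flag points that jump away from the median of the preceding window.

    Returns a list of (index, signed_change) for the flagged points.
    """
    # Step 1: change of each value relative to the median of the
    # `window` values immediately before it.
    changes = [
        (t, series[t] - median(series[t - window:t]))
        for t in range(window, len(series))
    ]
    # Step 2: a change is anomalous when its magnitude exceeds
    # Q3 + c * IQR of all observed magnitudes.
    bound = iqr_upper_bound([abs(d) for _, d in changes], c=c)
    return [(t, d) for t, d in changes if abs(d) > bound]


# Made-up series with a single spike at index 6.
series = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12, 11, 10]
flagged = persist_like_detect(series, window=1, c=3.0)
```

A larger c widens the normal range and flags fewer points; a smaller c does the opposite. Note that with window=1 a one-point spike is flagged twice in this sketch: once for the jump up and once for the fall back down.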

Answer 5 · 2020-03-30T23:00:48.000Z

Thanks a million for replying to my question. Appreciate it greatly.

Answer 6 · 2020-03-31T02:21:26.000Z

@ivanokeeffe Hey! What kind of outlier are you trying to detect? Is it sudden peaks and drops?

Answer 7 · 2020-03-31T09:15:16.000Z

Yep exactly, just trying to detect sudden drops actually. PersistAD works perfectly, but I'm trying to dig into the maths behind it...

Answer 8 · 2020-03-31T15:56:12.000Z

Does anyone know how to get the intermediate output of PersistAD as well?

Answer 9 · 2020-04-01T14:49:15.000Z

Running a pipe object (adtk.pipe.Pipeline or adtk.pipe.Pipenet) with the option return_intermediate=True will return the results of all steps of the pipe, instead of only the last one.

As mentioned above, like many other models in ADTK, PersistAD is internally implemented as a pipe of transformers and detectors. Attribute pipe_ points to the internal pipe object. So if we want the intermediate results, the easiest way is probably calling it as follows:

from adtk.detector import PersistAD
my_model = PersistAD()
# Instead of my_model.fit_detect(s), which is equivalent to my_model.pipe_.fit_detect(s, return_intermediate=False):
results = my_model.pipe_.fit_detect(s, return_intermediate=True)

Answer 10 · 2020-04-03T13:36:16.000Z

@ivanokeeffe Hey! Are you also applying a seasonality check here, i.e. by editing the pipeline?

Answer 11 · 2020-04-06T09:22:16.000Z

Thanks a million for replying to my questions. This has been super helpful!

Answer 12 · 2020-04-06T09:22:55.000Z

Hey, not at the moment but I guess that is something I could be doing too. Do you have any resources for explaining what seasonality is in time series? Thanks

Answer 13 · 2020-04-07T15:56:33.000Z

Give this a read: https://www.quora.com/How-do-you-identify-seasonality-in-a-time-series-data

Let me know if you are able to do it. @ivanokeeffe

Did you figure out the maths behind PersistAD?