arundo/adtk

Multivariate anomaly detection.

Suppose we have 5 data points in a data frame (d1, d2, d3, d4, d5) and an anomaly is detected. Is there any way to tell which data point caused the anomaly, and to assign a score to each, e.g. d5 was 50% responsible, d3 30%, etc.?

Thanks!

@abhimanyu3 If we assume a single series is the reason for the anomaly, then I would apply a univariate detector to each series independently.

A multivariate detector is for the case where the anomaly is due to a change in the relationship between series. In that case, it's hard to say which series "causes" the anomaly because the anomaly is caused by those series jointly.
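To make the per-series approach concrete, here is a minimal pure-Python sketch. It does not use ADTK: `flag_jumps` is a hypothetical stand-in for whatever univariate detector you pick, and the threshold and sample data are made up for illustration. The idea is simply to run the detector on every series on its own and then read off, per time step, which series were flagged.

```python
def flag_jumps(values, threshold):
    """Hypothetical stand-in for a univariate detector: flag a point when it
    moves more than `threshold` away from the previous point."""
    return [False] + [abs(b - a) > threshold for a, b in zip(values, values[1:])]


def flagged_series_by_time(frame, threshold):
    """Run the detector on each series independently and return, for every
    time step, the names of the series that were flagged there."""
    per_series = {name: flag_jumps(vals, threshold) for name, vals in frame.items()}
    n_steps = len(next(iter(per_series.values())))
    return [
        [name for name, flags in per_series.items() if flags[t]]
        for t in range(n_steps)
    ]


# Made-up data: three series of equal length, keyed by column name.
frame = {
    "d1": [1.0, 1.1, 1.0, 1.2, 1.1],
    "d2": [5.0, 5.1, 9.0, 5.0, 5.1],
    "d3": [2.0, 2.0, 2.1, 2.0, 6.0],
}
culprits = flagged_series_by_time(frame, threshold=2.0)
```

Here `culprits[t]` names the series flagged at step `t`. This only tells you which series look anomalous on their own; it does not produce percentage contributions for an anomaly that arises from the series jointly.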

@tailaiw Thanks a lot for your response. I have to find sudden peaks and drops in my multivariate time series data, so I am using the PersistAD method on the df. Should I apply it to each column separately, or is applying it to the whole df the same thing?

Also, where can I find details such as what c is in PersistAD, so that I can make an informed decision when tuning?

Do you recommend any other method for finding sudden peaks and drops, or is PersistAD good?

Hi there, do either of you know where to find the formulae used in PersistAD? I haven't been able to find them in the code.
Thanks a million, guys.

@abhimanyu3 and @ivanokeeffe PersistAD is implemented as a pipeline of DoubleRollingAggregate transformer and InterQuartileRangeAD detector. You may refer to the pipe_ attribute of a PersistAD object for more details.

The parameter c is the same one used by the internal InterQuartileRangeAD, where it controls the "normal range". InterQuartileRangeAD is a classic, simple outlier detection method. The value of c is usually 1.5 or 3, although the user may set it according to the problem to solve.
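For readers asking about the maths, the following is a rough pure-Python sketch of the two steps described above, not ADTK's actual code. It assumes the rolling step compares each value with the median of the `window` values before it, and that the interquartile-range step treats an absolute change as anomalous when it exceeds Q3 + c * (Q3 - Q1). The function names, the sample series, and the choice of `window=1` and `c=3.0` are illustrative assumptions; check `pipe_` on a real PersistAD object for the exact steps.

```python
from statistics import median, quantiles


def double_rolling_median_diff(values, window=1):
    """Each value minus the median of the `window` values before it.
    Positions without a full preceding window get None."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            out.append(values[i] - median(values[i - window:i]))
    return out


def iqr_upper_bound(samples, c=3.0):
    """Upper edge of the interquartile 'normal range': Q3 + c * (Q3 - Q1)."""
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    return q3 + c * (q3 - q1)


def persist_like_detect(values, window=1, c=3.0):
    """Flag points whose absolute change versus the preceding window is
    larger than the IQR-based upper bound of all observed changes."""
    diffs = double_rolling_median_diff(values, window)
    magnitudes = [abs(d) for d in diffs if d is not None]
    bound = iqr_upper_bound(magnitudes, c)
    return [d is not None and abs(d) > bound for d in diffs]


# Made-up series with one spike at index 6.
series = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12, 11, 10]
flags = persist_like_detect(series, window=1, c=3.0)
```

In this sketch both the jump up to the spike (index 6) and the drop back down (index 7) are flagged, because each is a large change relative to the value before it. A larger c widens the normal range and flags fewer points; a smaller c does the opposite.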

@ivanokeeffe Hey! What kind of outlier are you trying to detect? Is it sudden peaks and drops?

Does anyone know how to get the intermediate output for PersistAD as well?

Running a pipe object (adtk.pipe.Pipeline or adtk.pipe.Pipenet) with the option return_intermediate=True will return the results of all steps of the pipe, instead of only the last one.

As mentioned above, like many other models in ADTK, PersistAD is internally implemented as a pipe of transformers and detectors. Attribute pipe_ points to the internal pipe object. So if we want the intermediate results, the easiest way is probably calling it as follows:

from adtk.detector import PersistAD
my_model = PersistAD()
# my_model.fit_detect(s) is equivalent to my_model.pipe_.fit_detect(s, return_intermediate=False)
my_model.pipe_.fit_detect(s, return_intermediate=True)  # s is the time series; returns the results of all steps

@ivanokeeffe Hey! Are you also applying a seasonality check here, i.e. by editing the pipeline?

Give this a read: https://www.quora.com/How-do-you-identify-seasonality-in-a-time-series-data

Let me know if you are able to do it. @ivanokeeffe

Did you get the maths behind PersistAD?