Confused about the R matrix interpretation
RanaElnaggar opened this issue · 4 comments
Hi,
I am confused about the interpretation of the R matrix returned by the online detection algorithm. In the notebook example, the third plot is R[Nw,Nw:-1], which is described as "the probability at each time step for a sequence length of 0, i.e. the probability of the current time step to be a changepoint."
So why do we choose the indices R[Nw,Nw:-1]? Why not R[Nw,:]?
Also, it was mentioned as an example that R[7,3] means the probability at time step 7 of a sequence of length 3, so does R[Nw,Nw:-1] mean that we are taking all the probabilities at time step Nw?
Any suggestions to help me understand the output R?
Thanks
Same question. Confused about R[Nw,Nw:-1], and it does not work in my code...
To be honest: I'm confused as well. I think I mixed up the order of the dimensions in the description. So R[7, 3] means the probability at timestep 3 that the sequence is 7 timesteps long (which is not a very sensible question to ask, but it keeps the example the same). Now R[Nw, Nw:-1] means: give me, for each time step, the probability that the sequence is already Nw steps long.
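To make that indexing concrete, here is a small sketch with a fabricated run-length matrix (the values are dummy data, not output of the actual detector; only the shape and indexing convention are meant to match the description above):

```python
import numpy as np

# A toy run-length matrix shaped like the notebook's R, purely to
# illustrate the indexing (values here are fabricated, not from BOCD):
# R[r, t] = probability that the run is r steps long at timestep t.
T = 20
Nw = 5
rng = np.random.default_rng(0)
R = np.zeros((T + 1, T + 1))
for t in range(T + 1):
    col = rng.random(t + 1)          # the run can be at most t steps long
    R[: t + 1, t] = col / col.sum()  # normalize each column to a distribution

# Row Nw, columns Nw..T-1: for each timestep, the probability that the
# run is already Nw steps long. Columns before Nw are skipped because
# the run cannot be Nw steps long that early (those entries are zero).
p_run_is_Nw = R[Nw, Nw:-1]
print(p_run_is_Nw.shape)
```

This also suggests why the slice starts at column Nw rather than 0: before timestep Nw the entries in row Nw are identically zero.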
Hi,
Thanks for the clarification. Can you also elaborate on your choice of Nw=10? I am also confused about how the following statement relates to the original algorithm: "Because it's very hard to correctly evaluate a change after a single sample of a new distribution, we instead can "wait" for Nw samples and evaluate the probability of a change happening Nw samples prior."
-
Also, according to your description, in order to get the changepoints for a data stream we need to check for data points with R > 0. Is this true?
-
I was also trying to map the algorithm in the original paper to the implementation, and I was wondering what mu, alpha, beta, and kappa in the code correspond to in the paper?
-
I was also wondering why you chose the value of lambda in the hazard function to be equal to 250?
-
Also, I am confused about the difference between the theory in the paper and the implementation. The theory evaluates the predictive marginal distribution for x_t+1, while the code evaluates the probability that the point x_t was a changepoint. Can you clarify how these two quantities are related?
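For what it's worth, my reading of Adams & MacKay's formulation (not the repo author's answer) is that the two quantities are linked by the run-length recursion: the predictive marginal appears as one factor inside the posterior update, and the changepoint probability is the r_t = 0 slice of the resulting posterior:

```latex
% Run-length recursion (message-passing step); the predictive marginal
% P(x_t | r_{t-1}, x_t^{(r)}) is one factor of the update:
P(r_t, x_{1:t}) = \sum_{r_{t-1}}
    P(x_t \mid r_{t-1}, x_t^{(r)}) \,
    P(r_t \mid r_{t-1}) \,
    P(r_{t-1}, x_{1:t-1})

% The changepoint probability at time t is the r_t = 0 entry of the
% normalized run-length posterior:
P(r_t = 0 \mid x_{1:t}) = \frac{P(r_t = 0, x_{1:t})}{\sum_{r_t} P(r_t, x_{1:t})}
```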
Thanks
I think a better way to check for changepoints is by adding a condition like the one below:
thresh = 100
if abs(maxes[i] - maxes[i+1]) > thresh:
    print("possible changepoint!")
I think this would ensure that the changepoint jump is always greater than a desired threshold whenever it occurs. For a more intuitive understanding of BOCD, I have written a blog post here that could help.
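A runnable sketch of that check, under the assumption that `maxes` holds the most likely run length at each timestep (in practice something like `np.argmax(R, axis=0)`; the values below are fabricated so that exactly one reset occurs):

```python
import numpy as np

# Fabricated run-length trace: grows steadily, then resets to 0 once.
maxes = np.concatenate([np.arange(150), np.arange(30)])

# Flag an index as a possible changepoint whenever the most likely
# run length jumps by more than the threshold between adjacent steps.
thresh = 100
changepoints = [
    i + 1
    for i in range(len(maxes) - 1)
    if abs(int(maxes[i]) - int(maxes[i + 1])) > thresh
]
print(changepoints)  # → [150]: the reset from 149 to 0 exceeds the threshold
```

A detail worth noting: the small consecutive increments of a growing run never trip the threshold, so only the abrupt reset is reported.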