X-DataInitiative/tick

HawkesKernelExp cannot get convolution on an older time

Closed · 5 comments

Hello!

First of all, thanks for this amazing library!

I am new to Hawkes processes, and my issue is the following. I am trying to estimate a trade arrival process using HawkesKernelExp. Once the model is fitted, I try to estimate the intensity and receive an error: RuntimeError: HawkesKernelExp cannot get convolution on an older time unless it has been rewound.

Specifically, here is a dataset I use:
buys.xlsx

My sequence of actions (I'll try to be as precise as possible):

  1. Load the data and convert the datetimes to epoch times. Then, since some events occur at the same time, I add a millisecond to distinguish them (a sanity check on the result is sketched after step 3 below).
import datetime
import numpy as np
import pandas as pd

# Load
buys = pd.read_excel('buys.xlsx')
buys = buys.set_index('Timestamp')
# Timestamp to epoch
buy_epochs = [buys.index[x].replace(tzinfo=datetime.timezone.utc).timestamp() for x in range(len(buys))]
# Add a millisecond to the events that occur simultaneously
buyTimes = [buy_epochs[0]]  # seed with the first event so the loop never compares index 0 with index -1
for i in range(1, len(buy_epochs)):
    if buy_epochs[i] == buy_epochs[i - 1]:
        buyTimes.append(buyTimes[i - 1] + 0.001)
    else:
        buyTimes.append(buy_epochs[i])
buyTimes = np.array(buyTimes)
  2. The next step is to subtract the minimum time, so that my time origin is zero (following #179).
# Required format of timestamps
buyTimes = buyTimes - buyTimes[0]

events = [buyTimes]

from tick.hawkes import HawkesExpKern

decay_candidates = np.arange(1, 25, 0.5)

best_score = -1e100
best_hawkes_learner = None
for decay in decay_candidates:
    hawkes_learner = HawkesExpKern(decay, penalty='l1', C=10, gofit='likelihood',
                                   verbose=True, tol=1e-11, solver='svrg',
                                   step=1e-5, max_iter=10000)
    hawkes_learner.fit(events)
    
    hawkes_score = hawkes_learner.score()
    if hawkes_score > best_score:
        print(f'Obtained {hawkes_score} with {decay}')
        best_hawkes_learner = hawkes_learner
        best_score = hawkes_score
        best_decay = decay

print()
print('Best beta is:', best_decay)
print('Best mu is:', best_hawkes_learner.baseline[0])
print('Best alpha is:', best_hawkes_learner.adjacency[0][0])
  3. Finally, I estimate the intensity and receive the error mentioned above.
best_hawkes_learner.estimated_intensity(events, 1)
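
To make sure the preprocessing worked, the times can be checked for strict monotonicity before fitting. A minimal sanity-check sketch on the buyTimes array from step 1 (tick expects strictly increasing event times per node):

# Sanity check: event times must be strictly increasing.
diffs = np.diff(buyTimes)
if not np.all(diffs > 0):
    bad = np.where(diffs <= 0)[0]
    print(f'{len(bad)} non-increasing gaps, first at index {bad[0]}')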

To sum up, my questions are:

  1. Why do I receive this error? What am I doing wrong?
  2. Do I understand correctly that if I want to track the intensity every second, I should set intensity_track_step equal to one?
  3. Does it make sense to use solvers other than SVRG?

Hello!
Thanks for using the tick library!

Shortly:

  1. Are you sure that your events are sorted by time? Maybe this assumption is broken when you "add a millisecond to the events that occur simultaneously".
  2. Yes, if your timestamp unit is 1 second.
  3. Yes, any solver might work; SVRG is just an efficient one.
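
To illustrate point 2, here is a minimal sketch of tracking the intensity once per second; it assumes estimated_intensity returns the tracked intensity per node together with the matching time grid (check the tick docs for your version):

import matplotlib.pyplot as plt

# Track the fitted intensity on a 1-second grid (timestamps are in seconds).
tracked_intensity, intensity_times = best_hawkes_learner.estimated_intensity(
    events, intensity_track_step=1)

plt.plot(intensity_times, tracked_intensity[0])  # single-node process
plt.xlabel('time (s)')
plt.ylabel('estimated intensity')
plt.show()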

Thanks for the prompt response! Indeed, I did not sort the events. Once sorted, the solver works well. Thanks!

Another quick question: I was trying to use other solvers ('agd', 'bfgs') and got this error: RuntimeError: The sum of the influence on someone cannot be negative. Maybe did you forget to add a positive constraint to your proximal operator.

So my question is how can I incorporate a positive constraint into the model?

The positivity constraint is included by default. In your case, the error is probably due to all coefficients becoming zero at some point during learning. My suggestion would be that if it was working with SVRG, you should keep using SVRG :).
Otherwise, you can try penalizing a bit less (higher C), using l2 penalization instead of l1, deactivating the linesearch, using smaller steps, or using a better initial value for coeffs0.
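
For example, a sketch of those suggestions with illustrative, untuned values; whether fit() accepts an initial coefficient vector via start depends on the tick version, so treat that part as an assumption:

# Illustrative, untuned values: l2 instead of l1, higher C, smaller step.
hawkes_learner = HawkesExpKern(best_decay, penalty='l2', C=100,
                               gofit='likelihood', solver='agd',
                               step=1e-6, tol=1e-11, max_iter=10000)
# Warm-start from the coefficients SVRG found (assumes fit() supports `start`).
hawkes_learner.fit(events, start=best_hawkes_learner.coeffs)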

Thank you!

Even after sorting the data, I was still getting the error RuntimeError: HawkesKernelExp cannot get convolution on an older time unless it has been rewound.
I suspected the "add a millisecond to the events that occur simultaneously" step was causing the problem, so I dropped the duplicate records by adding the following line, and it worked.

# Load

buys = pd.read_excel('buys.xlsx')
buys = buys.drop_duplicates(subset=['Timestamp'])
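
Putting the whole fix together, the preprocessing could look like the sketch below; it combines the de-duplication with the sorting fix from earlier in the thread, and assumes that, with exact duplicates dropped, the millisecond offset is no longer needed:

# Load, drop exact duplicate timestamps, and guarantee increasing order.
buys = pd.read_excel('buys.xlsx')
buys = buys.drop_duplicates(subset=['Timestamp'])
buys = buys.sort_values('Timestamp').set_index('Timestamp')
# Convert to epoch seconds and shift the origin to zero.
buy_epochs = np.array([ts.replace(tzinfo=datetime.timezone.utc).timestamp()
                       for ts in buys.index])
buyTimes = buy_epochs - buy_epochs[0]
events = [buyTimes]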