patrick-kidger/torchcde

Online prediction tasks need examples

Closed this issue · 4 comments

Hi Patrick. First of all thank you all for creating such a beautiful library for the neural CDE model!

I have read both of your papers, the neural CDE one as well as its extension to online prediction. As far as I understand, rectilinear interpolation allows for "causal" interpolation without requiring future values. However, since constant interpolation is used for every channel in X, the change in the control signal over time is always going to be 0, except at the knot points, where the gradient is possibly undefined due to a discrete jump. This means that the change in the hidden state over time is also going to be 0, since dz/dt = f_θ(z) dX/dt. Correct me if I'm wrong, please. Also, looking at the published repositories, it is still not clear to me how I could use the online prediction extension of the neural CDE algorithm.

To my understanding, the entire channel dimension could be thought of as a concatenation of measurements from multiple different sensors, with each channel (except for time and observational density) being the measurement of a single sensor. So what I would like to ask is: it would be much clearer, and much appreciated, if you could provide a minimal code example of how to do online predictions as data arrives for a given channel, because I could not put the pieces together myself.

Cheers,
Deniz

Hi Deniz -- thanks for your interest!

Regarding your first point, about the derivative dX/dt: there actually isn't a discrete jump. (Which I know is a bit counter-intuitive.)

Suppose we observe the very short time series ((t_1, x_1), (t_2, x_2)).
Then the rectilinear interpolation of this is a piecewise linear X with knots s_1 < s_2 < s_3, such that

X(s_1) = (t_1, x_1)
X(s_2) = (t_2, x_1)
X(s_3) = (t_2, x_2)

Side note: in Neural CDEs for Online Prediction Tasks we take specifically s_1=0, s_2=1, s_3=2, but that's just for simplicity. Theoretically these values are arbitrary because of reparameterisation invariance.

In particular you can see that there isn't any discrete jump. Time is a channel of X -- not its parameterisation.
(It doesn't help that we used time for both things in the original Neural CDE paper, which obscures this subtle point.)

Regarding your second point, about an example:

  • Conceptually, the only thing that's needed to use a neural CDE for an online prediction task is to use the appropriate interpolation scheme. So for the paper we basically just used torchcde.linear_interpolation_coeffs(..., rectilinear=0) instead of torchcde.natural_cubic_coeffs and that's it!
  • The code for the paper is available here, which might prove to be a useful reference. (It's not really a self-contained example, though.)
  • It's possible that you're running into difficulties because you're trying to actually deploy this in practice -- i.e. not just simulating an online task in an academic setting? The code isn't really set up to make this easy, unfortunately. (To get data out, you'd need to call torchcde.cdeint every time you want an update.) This is something I'm thinking about fixing over the next couple of months. In the meantime, if you want to do it yourself, it's actually very easy to write your own integrator. (e.g. this implementation of RK4 is only 9 lines long.)
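If you do roll your own, a fixed-step RK4 integrator really is only a few lines of plain Python. A minimal sketch (the names `rk4_step`/`integrate` are illustrative, not torchcde API -- for a neural CDE, `func(s, z)` would return f_θ(z) dX/ds):

```python
def rk4_step(func, s, z, ds):
    """One fixed-size RK4 step for dz/ds = func(s, z)."""
    k1 = func(s, z)
    k2 = func(s + ds / 2, z + ds * k1 / 2)
    k3 = func(s + ds / 2, z + ds * k2 / 2)
    k4 = func(s + ds, z + ds * k3)
    return z + ds * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(func, z0, s0, s1, steps=100):
    """Integrate from s0 to s1 with `steps` equal RK4 steps."""
    ds = (s1 - s0) / steps
    z, s = z0, s0
    for _ in range(steps):
        z = rk4_step(func, s, z, ds)
        s += ds
    return z

# Sanity check on dz/ds = z with z(0) = 1: the answer is e.
print(integrate(lambda s, z: z, 1.0, 0.0, 1.0))  # ~2.718281828
```

Because the integration is just an ordinary loop, it's easy to pause it at any s and resume when new data arrives, which is exactly what an online deployment needs.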

I'm thankful for your prompt and detailed reply. It is a sign of genuine excitement and interest in your research topic 😊

As for my first question, let us instead consider 1 common time channel and 2 data channels for 2 sensors, A and B, each measured at a different frequency, such that the measurement frequencies have some variance. So this is, theoretically, the case of multiple asynchronous measurements with possibly missing data. We could in fact consider both sensor measurements to be 2 different time series with a common time domain (i.e., the time source is the same for both). More concretely, let ts_a=[(0.1, 1), (0.3, 2), (0.6, 3), (0.8, 4)] be the measurement time series of A, containing the list of (timestamp, value) pairs over time. And similarly, ts_b=[(0.3, 10), (0.4, 20)].

So, given the 2 time series above, I would merge them a bit differently from what I have seen so far in your code examples, by not duplicating any timestamps: ts=[(0.1, 1, nan), (0.3, 2, 10), (0.4, nan, 20), (0.6, 3, nan), (0.8, 4, nan)]. This is essentially an outer join along the time indexes of 2 pandas.DataFrames using pandas.merge. If my understanding of rectilinear interpolation is correct, and assuming that we take care of the leading nan value with ts.iloc[0, 2] = ts.iloc[:, 2].mean(skipna=True), then the control path of the above time series will have 9 knots s_1 < ... < s_9 such that:

X(s_1) = (0.1, 1, 15)
X(s_2) = (0.3, 1, 15)
X(s_3) = (0.3, 2, 10)
X(s_4) = (0.4, 2, 10)
X(s_5) = (0.4, 2, 20)
X(s_6) = (0.6, 2, 20)
X(s_7) = (0.6, 3, 20)
X(s_8) = (0.8, 3, 20)
X(s_9) = (0.8, 4, 20)

At any point between 2 consecutive knots s_i and s_{i+1}, the control value will be linearly interpolated between X(s_i) and X(s_{i+1}). If I am not mistaken so far, I cannot imagine how I would be able to use this kind of (rectilinear) interpolation in a real-time setting where I get measurement values asynchronously for sensors A and B. This still remains a mystery to me, honestly 😕
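For concreteness, here is how I would sketch the merge and knot construction just described, using pandas (this is my own reconstruction of the batching step, not anything from torchcde):

```python
import pandas as pd

# Two asynchronous sensors sharing a common time source, as above.
ts_a = pd.DataFrame({"t": [0.1, 0.3, 0.6, 0.8], "a": [1, 2, 3, 4]})
ts_b = pd.DataFrame({"t": [0.3, 0.4], "b": [10, 20]})

# Outer join on the timestamps, without duplicating any of them.
ts = (pd.merge(ts_a, ts_b, on="t", how="outer")
        .sort_values("t")
        .reset_index(drop=True))

# Impute the leading nan with the channel mean (-> 15.0), then forward-fill:
# under rectilinear interpolation, "missing" just means "hold the last value".
ts.loc[0, "b"] = ts["b"].mean()
ts = ts.ffill()

# Each observation contributes two knots: first time advances with the data
# held fixed, then the data updates with time held fixed.
knots = []
prev = None
for row in ts.itertuples(index=False):
    if prev is not None:
        knots.append((row.t, prev.a, prev.b))
    knots.append((row.t, row.a, row.b))
    prev = row

print(len(knots))  # 9, matching the knots s_1, ..., s_9 listed above
```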

As for the practicality aspect, my expectation was exactly as you thought: deploying it in a real setting, not just simulating. Also, thank you once again for providing a codebase to read further. However, I found it quite hard to follow how the code works, or how I would go about adapting it to my use case: readings from multiple real/physical sensors running at different rates. It would be a great service to all fellow researchers interested in your work to see a minimal example of doing (simulated) online predictions (e.g., this example perfectly demonstrates the neural CDE itself).

Your understanding of rectilinear interpolation is correct.

Now, notice that in your example above, going from s_2j-1 to s_2j (i.e. odd-to-even) is a transition in which the data is held fixed and time is updated. Meanwhile, going from s_2j to s_2j+1 (i.e. even-to-odd) is a transition in which time is held fixed and the data channels are updated.

Thus, in a real-time scenario: continuously perform an odd-to-even integration whilst waiting for data to arrive. Once data has arrived, place down some s_2j, and switch to doing even-to-odd integration until some s_2j+1 (which you're free to pick; e.g. it could be s_2j+1 := 1 + s_2j). Then switch back to doing odd-to-even integration whilst waiting for the next piece of data to arrive.
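To make the alternation concrete, here is a minimal sketch of that loop in plain Python. Everything here is illustrative, not torchcde API: `segment_step` is a stand-in for whatever solver you use, and `f` is a fixed linear map standing in for a trained neural CDE vector field f_θ. Since X is piecewise linear, dX/ds is constant on each segment, which is what makes each segment easy to integrate:

```python
def segment_step(f, z, x_start, x_end, n_steps=50):
    """Euler-integrate dz/ds = f(z) @ dX/ds across one linear segment of X.
    The segment is given unit length in s; by reparameterisation invariance
    the exact s values don't matter."""
    dX = [e - s0 for s0, e in zip(x_start, x_end)]  # constant dX/ds here
    ds = 1.0 / n_steps
    for _ in range(n_steps):
        Fz = f(z)  # (state_dim x channels) matrix
        z = [zi + ds * sum(Fz[i][c] * dX[c] for c in range(len(dX)))
             for i, zi in enumerate(z)]
    return z

# Toy setup: 1-d hidden state, 2 channels (time, sensor value), fixed linear f.
f = lambda z: [[0.1, 0.5]]

z = [0.0]
x = [0.0, 1.0]  # latest knot: (time, value)
for t_obs, v_obs in [(0.3, 2.0), (0.7, 1.5)]:
    # Odd-to-even: time advances to t_obs, data held fixed...
    z = segment_step(f, z, x, [t_obs, x[1]])
    # ...even-to-odd: then time held fixed, data jumps to the new observation.
    z = segment_step(f, z, [t_obs, x[1]], [t_obs, v_obs])
    x = [t_obs, v_obs]
```

In a genuine deployment the odd-to-even segment would be integrated incrementally as the wall clock advances, rather than all at once when the observation arrives; the even-to-odd segment runs the moment the observation lands.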

A real-time example would be nice, I agree. As I say, the codebase isn't really set up for this at the moment -- so far this codebase has primarily been used to support research projects.

Your answer was exactly to the point, and I think it should serve as a conceptual starting point for confused folks like me who want to use neural CDEs for online predictions. Feel free to close the issue, or you may also use it to track progress in the future.

Cheers!