SteffenMoritz/imputeTS

Support imputing around a circle (e.g. wind direction)

AndrewCunliffe opened this issue · 5 comments

Feature request

Thanks for making imputeTS, it is really useful and easy to use.

Please consider adding a function to impute around a circle. I'm thinking specifically of wind direction data in degrees (ranging from 0 to 360), with a simple linear imputation that automatically takes the shortest path between two points on the circle. It seems pretty simple, but I haven't been able to track down a simple solution in R. There is a relevant discussion here (https://stackoverflow.com/questions/9505862/shortest-distance-between-two-degree-marks-on-a-circle).
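To make the "shortest path" idea concrete, here is a minimal base R sketch of the signed shortest rotation between two compass angles. The helper name `shortest_diff` is made up for illustration and is not part of imputeTS.

```r
# Signed shortest rotation (in degrees) from angle `from` to angle `to`.
# The result lies in [-180, 180); a positive value means increasing degrees.
# `shortest_diff` is a hypothetical helper, not an imputeTS function.
shortest_diff <- function(from, to) {
  ((to - from + 180) %% 360) - 180
}

shortest_diff(350, 10)  # 20: crossing north is shorter than going back 340
shortest_diff(10, 350)  # -20
shortest_diff(90, 270)  # -180: opposite directions, so the path is ambiguous
```

Opposite directions (exactly 180 degrees apart) have no unique shortest path; this sketch simply returns -180 for them.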

@AndrewCunliffe - I don't develop imputeTS, but I do work with wind data. One trick I've found is to convert wind direction into unit circle coordinates. A good description is here: https://blog.tomgibara.com/post/11016504425/clustering-angles This would create a multivariate dataset of two coordinates. In that light, perhaps consider a multivariate imputation package such as mice: https://www.rdocumentation.org/packages/mice/versions/3.13.0

Interesting problem.
The solution of @glitt13 with the transformation into unit circle coordinates sounds pretty good.
I guess you want to do e.g. linear interpolation, which you could then do on these coordinates.
You will have one time series for the x values and one for the y values (you would run na_interpolation on both).
Afterwards you transform back to a single time series in degrees.
(Some special cases also need handling.)

I'll try to post a code example later if I find the time.

I wouldn't switch to mice here, though, since it would ignore the time aspect.
For example, if you have 10 degrees, NA, NA, NA, NA, 100 degrees, you probably want a result like 10, 28, 46, 64, 82, 100.
With mice, that would only be possible if you additionally modelled the time aspect through extra variables.
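One way to get exactly that evenly spaced result is to interpolate along the shortest arc rather than on coordinates. The following is a rough base R sketch under my own assumptions: `interp_arc` is a hypothetical helper, it uses `stats::approx` rather than any imputeTS function, and it needs at least two observed values.

```r
# Linear interpolation along the shortest arc between observed angles.
# `interp_arc` is a hypothetical helper, not part of imputeTS.
interp_arc <- function(deg) {
  obs <- which(!is.na(deg))
  # Shortest signed rotation between consecutive observed angles
  steps <- ((diff(deg[obs]) + 180) %% 360) - 180
  # "Unwrap" the observed angles so they no longer jump at 0/360
  unwrapped <- deg[obs][1] + c(0, cumsum(steps))
  # Ordinary linear interpolation on the unwrapped series
  filled <- approx(x = obs, y = unwrapped, xout = seq_along(deg))$y
  # Wrap back into [0, 360); leading or trailing NAs stay NA
  filled %% 360
}

interp_arc(c(10, NA, NA, NA, NA, 100))
# 10 28 46 64 82 100
```

Interpolating the x and y coordinates instead moves along the chord between the two points, so the angles are not evenly spaced: for this gap it gives roughly 10, 24, 44, 66, 86, 100.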

Thanks both for the constructive suggestions.

I guess an alternative option is to use na_ma to fill the gaps with a moving average. A colleague pointed out that this might even be more robust than linear interpolation in settings with highly variable wind direction, if the window is correctly specified. It would be wonderful to have this supported for degrees in imputeTS ('na_win_dir_ma'?).
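A plain arithmetic moving average breaks at the 0/360 boundary (the mean of 350 and 10 is 180, not 0), so a moving average for degrees would need a circular mean. Below is a minimal base R sketch of that idea. `circ_mean` and `na_circ_ma` are hypothetical names, the window is a simple unweighted one of `k` values on each side, and it does not replicate the weighting options of na_ma.

```r
# Circular mean of angles in degrees, ignoring NAs.
# Hypothetical helper, not part of imputeTS.
circ_mean <- function(deg) {
  rad <- deg[!is.na(deg)] * pi / 180
  (atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi) %% 360
}

# Replace each NA by the circular mean of the observed values
# within k positions on either side (unweighted window).
na_circ_ma <- function(deg, k = 2) {
  out <- deg
  for (i in which(is.na(deg))) {
    win <- deg[max(1, i - k):min(length(deg), i + k)]
    # Leave the NA in place if the whole window is missing
    if (any(!is.na(win))) out[i] <- circ_mean(win)
  }
  out
}

# The gap is filled with a direction at (or within rounding of) north,
# which may print as 0 or as a value just below 360
na_circ_ma(c(350, NA, 10))
```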

I agree that linear interpolation of circular coordinates seems sensible for short gaps in many cases. As you say, it would ideally handle evenly split special cases (e.g. 90° to 270°). It would be wonderful to have this supported in imputeTS, but I might see what I can come up with here.

A more sophisticated approach might seek to account for temporal patterns associated with diel and seasonal differences in wind direction, trained against the available information. This would probably be more robust for longer gaps, although it would probably be more complex to implement.

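Here is a simple code example:

library("useful")    # pol2cart(), cart2pol()
library("imputeTS")  # na_interpolation()

# Test data: wind direction in degrees, with gaps
data <- c(0, NA, 30, 100, NA, NA, 200, 300, NA, 100, 359, NA, NA, 5, 90, NA, 270)

# Degrees -> x and y coordinates on the unit circle
cartesian <- pol2cart(r = 1, theta = data, degrees = TRUE)
# Linearly interpolate the missing coordinates
imp <- na_interpolation(cartesian, option = "linear")
# Interpolated coordinates -> degrees
result <- cart2pol(imp$x, imp$y, degrees = TRUE)$theta
result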

Linear interpolation should produce reasonable results. You could also swap na_interpolation in the code for another algorithm, although not every algorithm may make sense in this specific setting.

The transformation above is not complicated, and luckily package 'useful' provides pol2cart and cart2pol functions. If you do it yourself instead of using this package, be a little careful: the R functions sin() and cos() expect radians, not degrees, as input. In my very shallow tests the evenly split special cases (e.g. 90° to 270°) still produced output. That might be because the x and y values never cancelled out exactly. I guess the function from package 'useful' first converts degrees to radians, and the degrees * pi/180 step introduces just enough floating-point variation to avoid 0 / 0 points after interpolation.
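For anyone who wants to do the transformation by hand, here is a rough base R sketch of the same steps, with the degree-to-radian conversion made explicit. It uses `stats::approx` as a stand-in for na_interpolation so that it runs without extra packages; that substitution is my assumption, not what the example above does.

```r
# Same test data as above: wind direction in degrees, with gaps
deg <- c(0, NA, 30, 100, NA, NA, 200, 300, NA, 100, 359, NA, NA, 5, 90, NA, 270)

idx <- seq_along(deg)
obs <- !is.na(deg)

# sin() and cos() expect radians, so convert first
rad <- deg[obs] * pi / 180

# Interpolate the x (cos) and y (sin) coordinates separately
x <- approx(idx[obs], cos(rad), xout = idx)$y
y <- approx(idx[obs], sin(rad), xout = idx)$y

# atan2() returns radians in (-pi, pi]; convert to degrees in [0, 360)
result <- (atan2(y, x) * 180 / pi) %% 360
round(result, 1)
```

In the 90 to 270 gap the interpolated point sits essentially at the origin, so the direction returned there is decided by floating-point rounding and should not be trusted.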

I don't know how often and how quickly the wind direction changes, but you are probably right that interpolation is not a good option for very long gaps. Every imputation function in imputeTS has a maxgap option. With na_interpolation(x, option = "linear", maxgap = 3) you would only interpolate NA gaps that are no longer than 3; longer gaps are left as NA. After interpolating the short gaps, you could use another imputeTS function for the long gaps: maybe na_mean(x, option = "median"), or na_replace, where you just impute the most common wind direction, or na_seasplit(x, algorithm = "mean") to get the mean per season. A lot is possible there, and it probably depends a lot on the data. Your colleague might also be right: if the wind direction changes suddenly back and forth rather than gradually, a moving average might give better results than pure interpolation.
I think you will have to do some testing to see what works best
(maybe by simulating missing data for complete parts of your dataset, as described in issue #52).
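A rough sketch of what such a test could look like in base R is below. Everything in it is an assumption for illustration: the series is simulated rather than real wind data, `impute_xy` is a hypothetical stand-in for whichever imputation method you want to evaluate, and the error measure is the circular absolute difference in degrees. It is not taken from issue #52.

```r
set.seed(42)  # reproducible simulation

# Hypothetical complete series: a slowly drifting direction in degrees
n <- 200
truth <- (180 + cumsum(rnorm(n, sd = 10))) %% 360

# Remove 20% of the interior points (the end points stay observed)
holes <- sample(2:(n - 1), size = 40)
with_na <- truth
with_na[holes] <- NA

# Candidate method: linear interpolation of unit-circle coordinates
impute_xy <- function(deg) {
  idx <- seq_along(deg)
  obs <- !is.na(deg)
  rad <- deg[obs] * pi / 180
  x <- approx(idx[obs], cos(rad), xout = idx)$y
  y <- approx(idx[obs], sin(rad), xout = idx)$y
  (atan2(y, x) * 180 / pi) %% 360
}

imputed <- impute_xy(with_na)

# Circular absolute error in degrees (0 to 180), on the removed points only
circ_err <- abs(((imputed[holes] - truth[holes] + 180) %% 360) - 180)
mean(circ_err)
```

Repeating this with different gap patterns (for example longer runs of consecutive NAs) and different imputation functions gives a basis for comparing methods on your own data.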

I have also thought for a while about adding the solution outlined above to the package, but I came to the conclusion that the use case is probably a little too specific. Every additional feature has the downside that the package gets more complicated for the average user (mental overload from too many parameters and functions to choose from). Instead, I think I'll try to add additional documentation or a vignette to the package in one of the next versions. A vignette about handling special cases and problems, with code examples, could be quite nice and helpful.