Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, : L-BFGS-B needs finite values of 'fn'

Question


englianhu opened this issue 6 years ago · 23 comments

> data_tm1
# A tibble: 1,151,978 x 15
   index               BidOpen BidHigh BidLow BidClose AskOpen AskHigh AskLow AskClose  year  week
   <dttm>                <dbl>   <dbl>  <dbl>    <dbl>   <dbl>   <dbl>  <dbl>    <dbl> <dbl> <dbl>
 1 2014-12-29 00:01:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 2 2014-12-29 00:02:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 3 2014-12-29 00:03:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 4 2014-12-29 00:04:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 5 2014-12-29 00:05:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 6 2014-12-29 00:06:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 7 2014-12-29 00:07:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 8 2014-12-29 00:08:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 9 2014-12-29 00:09:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
10 2014-12-29 00:10:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
# ... with 1,151,968 more rows, and 4 more variables: bias.open <dbl>, bias.high <dbl>,
#   bias.low <dbl>, bias.close <dbl>
> data_tm1_NA <- data_tm1 %>% 
+   dplyr::select(BidOpen, BidHigh, BidLow, BidClose, 
+                 AskOpen, AskHigh, AskLow,  AskClose) %>% 
+   prodNA(noNA = 0.01) %>% 
+   cbind(data_tm1[1], .) %>% tbl_df
> 
> data_tm1_1_tidyr <- data_tm1_NA %>% 
+   fill(BidOpen, BidHigh, BidLow, BidClose, 
+        AskOpen, AskHigh, AskLow, AskClose) %>% #default direction down
+   fill(BidOpen, BidHigh, BidLow, BidClose, 
+        AskOpen, AskHigh, AskLow, AskClose, .direction = 'up')
> 
> data_tm1_1_tidyr %>% anyNA
[1] FALSE
> 
> data_tm1_1_tidyr %<>% mutate(
+   bias.open = if_else(AskOpen>AskHigh|AskOpen<AskLow, 1, 0), 
+   bias.high = if_else(AskHigh<AskOpen|AskHigh<AskLow|AskHigh<AskClose, 1, 0), 
+   bias.low = if_else(AskLow>AskOpen|AskLow>AskHigh|AskLow>AskClose, 1, 0), 
+   bias.close = if_else(AskClose>AskHigh|AskClose<AskLow, 1, 0))
> 
> data_tm1_1_tidyr %>% 
+   dplyr::filter(bias.open==1|bias.high==1|bias.low==1|bias.close==1)
> 
> data_tm1_1_tidyr %<>% 
+   summarise(
+     AskOpen = mean((AskOpen - data_m1$AskOpen)^2), 
+     AskHigh = mean((AskHigh - data_m1$AskHigh)^2), 
+     AskLow = mean((AskLow - data_m1$AskLow)^2), 
+     AskClose = mean((AskClose - data_m1$AskClose)^2), 
+     Mean.HLC = (AskHigh + AskLow + AskClose)/3, 
+     Mean.OHLC = (AskOpen + AskHigh + AskLow + AskClose)/4, 
+     bias.open = sum(bias.open)/length(bias.open), 
+     bias.high = sum(bias.high)/length(bias.high), 
+     bias.low = sum(bias.low)/length(bias.low), 
+     bias.close = sum(bias.close)/length(bias.close)) %>% tbl_df
> 
> data_tm1_1_tidyr %>% 
+   kable(caption = 'MSE') %>% 
+   kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
+   scroll_box(width = '100%')#, height = '400px')
> data_m1_NA <- data_m1 %>% prodNA(noNA = 0.1)
> data_m1_10_impTS <- llply(algo, function(x) {
+   data_m1_NA %>% 
+     dplyr::select(starts_with('Ask'), starts_with('Bid')) %>% 
+     map(na.seadec, algorithm = x) %>% as.tibble
+   })
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,  : 
  L-BFGS-B needs finite values of 'fn'

I noticed that na.seadec(x, algorithm = x) sometimes throws this error.

Answer 1 · 2018-10-24T16:12:04.000Z

Hello englianhu, thanks a lot for opening an issue! 👍
I'll take a deeper look at it. I don't know if I can fix the problem completely, since the main problem seems to lie with the StructTS function, which is used internally by the function. But maybe I can alter the input given to StructTS so that it does not run into this error. Or at least I can give the end user a more meaningful error message with instructions on what to do differently.

Answer 2 · 2018-10-24T16:54:55.000Z

@englianhu do you have a small reproducible dataset that you know for sure runs into this error?
Maybe I already have a good fix for the problem. I just need some data to test whether it really fixes it.

Answer 3 · 2018-10-25T07:31:55.000Z

data_m1_NA <- data_m1 %>% 
  dplyr::select(BidOpen, BidHigh, BidLow, BidClose, 
                AskOpen, AskHigh, AskLow,  AskClose) %>% 
  prodNA(noNA = 0.01) %>% 
  cbind(data_m1[1], .) %>% tbl_df

data_m1_1_impTS <- llply(algo, function(x) {
  data_m1_NA %>% 
    dplyr::select(starts_with('Ask'), starts_with('Bid')) %>% 
    map(na.seadec, algorithm = x) %>% as.tibble
  })
names(data_m1_1_impTS) <- algo
data_m1_1_impTS %<>% ldply %>% tbl_df

data_m1_1_impTS %<>% mutate(
  bias.open = if_else(AskOpen>AskHigh|AskOpen<AskLow, 1, 0), 
  bias.high = if_else(AskHigh<AskOpen|AskHigh<AskLow|AskHigh<AskClose, 1, 0), 
  bias.low = if_else(AskLow>AskOpen|AskLow>AskHigh|AskLow>AskClose, 1, 0), 
  bias.close = if_else(AskClose>AskHigh|AskClose<AskLow, 1, 0))

data_m1_1_impTS %>% 
  dplyr::filter(bias.open==1|bias.high==1|bias.low==1|bias.close==1)

data_m1_1_impTS %<>% 
  ddply(.(.id), summarise, 
        AskOpen = mean((AskOpen - data_m1$AskOpen)^2), 
        AskHigh = mean((AskHigh - data_m1$AskHigh)^2), 
        AskLow = mean((AskLow - data_m1$AskLow)^2), 
        AskClose = mean((AskClose - data_m1$AskClose)^2), 
        Mean.HLC = (AskHigh + AskLow + AskClose)/3, 
        Mean.OHLC = (AskOpen + AskHigh + AskLow + AskClose)/4, 
        bias.open = sum(bias.open)/length(bias.open), 
        bias.high = sum(bias.high)/length(bias.high), 
        bias.low = sum(bias.low)/length(bias.low), 
        bias.close = sum(bias.close)/length(bias.close)) %>% tbl_df

data_m1_1_impTS %>% 
  kable(caption = 'MSE') %>% 
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
  scroll_box(width = '100%')#, height = '400px')
- Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,  : 
-  L-BFGS-B needs finite values of 'fn'
- Calls: <Anonymous> ... apply.base.algorithm -> na.kalman -> StructTS -> optim
- In addition: There were 39 warnings (use warnings() to see them)

data1 : data_m1.zip
data2 : data_tm1.zip

Answer 4 · 2018-10-25T14:09:17.000Z

I believed this was because the initial row of the dataset contains a value, but the output below shows that is not the cause.

> data_tm1_1_impTS <- llply(algo, function(x) {
+   data_tm1_NA %>% 
+     dplyr::select(starts_with('Ask'), starts_with('Bid')) %>% 
+     map(na.seadec, algorithm = x) %>% as.tibble
+   })
- Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,  : 
-   L-BFGS-B needs finite values of 'fn'
> data_tm1_NA
# A tibble: 28,737 x 9
   index               BidOpen BidHigh BidLow BidClose AskOpen AskHigh AskLow AskClose
   <dttm>                <dbl>   <dbl>  <dbl>    <dbl>   <dbl>   <dbl>  <dbl>    <dbl>
 1 2015-01-12 00:01:00    118.    118.   118.     118.    118.    118.   118.     118.
 2 2015-01-12 00:02:00    118.    118.   118.     118.    118.    118.   118.     118.
 3 2015-01-12 00:03:00    118.    118.   118.     118.    118.    118.   118.     118.
 4 2015-01-12 00:04:00    118.    118.   118.     118.    118.    118.   118.     118.
 5 2015-01-12 00:05:00     NA     118.   118.     118.    118.    118.   118.     118.
 6 2015-01-12 00:06:00    118.    118.   118.     118.    118.    118.   118.     118.
 7 2015-01-12 00:07:00    118.    118.   118.     118.    118.    118.   118.     118.
 8 2015-01-12 00:08:00    118.    118.   118.     118.    118.    118.   118.     118.
 9 2015-01-12 00:09:00    118.    118.   118.     118.    118.    118.   118.     118.
10 2015-01-12 00:10:00    118.    118.   118.     118.    118.    118.    NA      118.
# ... with 28,727 more rows
> data_tm1_NA %>% md.pattern
      index BidClose BidHigh AskOpen AskLow AskHigh BidLow BidOpen AskClose     
26520     1        1       1       1      1       1      1       1        1    0
292       1        1       1       1      1       1      1       1        0    1
287       1        1       1       1      1       1      1       0        1    1
4         1        1       1       1      1       1      1       0        0    2
276       1        1       1       1      1       1      0       1        1    1
5         1        1       1       1      1       1      0       1        0    2
1         1        1       1       1      1       1      0       0        1    2
272       1        1       1       1      1       0      1       1        1    1
2         1        1       1       1      1       0      1       1        0    2
4         1        1       1       1      1       0      1       0        1    2
1         1        1       1       1      1       0      0       1        1    2
267       1        1       1       1      0       1      1       1        1    1
1         1        1       1       1      0       1      1       1        0    2
3         1        1       1       1      0       1      0       1        1    2
3         1        1       1       1      0       0      1       1        1    2
253       1        1       1       0      1       1      1       1        1    1
5         1        1       1       0      1       1      1       1        0    2
3         1        1       1       0      1       1      1       0        1    2
3         1        1       1       0      1       1      0       1        1    2
2         1        1       1       0      1       0      1       1        1    2
6         1        1       1       0      0       1      1       1        1    2
248       1        1       0       1      1       1      1       1        1    1
3         1        1       0       1      1       1      1       1        0    2
2         1        1       0       1      1       1      1       0        1    2
3         1        1       0       1      1       1      0       1        1    2
1         1        1       0       1      1       0      1       1        1    2
3         1        1       0       1      0       1      1       1        1    2
5         1        1       0       0      1       1      1       1        1    2
241       1        0       1       1      1       1      1       1        1    1
1         1        0       1       1      1       1      1       1        0    2
2         1        0       1       1      1       1      1       0        1    2
5         1        0       1       1      1       1      0       1        1    2
6         1        0       1       1      1       0      1       1        1    2
2         1        0       1       1      0       1      1       1        1    2
4         1        0       1       0      1       1      1       1        1    2
1         1        0       0       1      1       1      1       1        1    2
          0      262     266     281    285     291    297     303      313 2298

Answer 5 · 2018-10-27T21:04:25.000Z

I was now able to replicate the problem. Thanks for the data and code :)
But have no fix yet - hopefully I will have time for this in the next days.

Answer 6 · 2018-10-28T00:49:45.000Z

Ok...I took a deeper look into this now:
The problem occurs for na.seadec(x, algorithm ="kalman") and na.kalman().

The root cause lies in an internal call of stats::StructTS(), which itself calls stats::optim, where the actual error occurs. optim has a parameter 'fn', the function being minimized, which must return a finite value.
Somehow this specific dataset leads to an Inf value in the call from StructTS.

I added a dataset here, which is just the time series needed to provoke the error.
With na.kalman(errorData) the error can be provoked.

I really do not get why the error comes up exactly for this specific dataset.
(Since it comes from underlying packages I depend upon, it is also hard to fix.)

But a quick workaround is adding an additional parameter, type = "level", which is passed on to StructTS:
na.kalman(errorData, type="level")
na.seadec(errorData, algorithm ="kalman", type="level")

With this type="level" parameter the error does not occur any more.
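The workaround can also be scripted so that a failing call falls back to the next option automatically. The following is only a minimal sketch: impute_with_fallback and the two stand-in imputers are hypothetical helpers, not part of imputeTS, and they are used here so the example runs without the package installed.

```r
# Try a named list of imputation functions in order and return the first
# result that neither throws an error nor leaves NAs behind.
# impute_with_fallback is a hypothetical helper, not an imputeTS function.
impute_with_fallback <- function(x, imputers) {
  for (name in names(imputers)) {
    result <- tryCatch(imputers[[name]](x), error = function(e) NULL)
    if (!is.null(result) && !anyNA(result)) {
      message("imputed with: ", name)
      return(result)
    }
  }
  stop("all imputation attempts failed")
}

# Stand-ins for illustration only: one imputer that always fails with the
# optim message, and one that fills NAs with the mean of the observed values.
failing   <- function(x) stop("L-BFGS-B needs finite values of 'fn'")
mean_fill <- function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x }

x   <- c(1, 2, NA, 4, 5)
out <- impute_with_fallback(x, list(first = failing, second = mean_fill))
```

With imputeTS loaded, the list could instead hold function(x) na.kalman(x) followed by function(x) na.kalman(x, type = "level"), so the "level" variant is only tried when the default call fails.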

errorTS.RDA.zip

Answer 7 · 2018-10-28T00:52:19.000Z

To sum up, if somebody has the same issue:

A quick workaround is adding an additional parameter, type = "level", which is passed on to StructTS:
na.kalman(errorData, type="level")
na.seadec(errorData, algorithm ="kalman", type="level")

Please also drop me a mail or open an issue (so that I can see how often people run into this).

Answer 8 · 2019-01-09T08:21:47.000Z

I have this problem when trying to impute missing values in large time series with the na.kalman function. It seems that somewhere a very large value is produced and treated as infinite. The solution proposed by SteffenMoritz (#26 (comment)) can be a quick fix. However, sometimes the problem persists. When that happens, you can try scaling the time series to avoid such large values. See the following example with a large time series (86400 values).

sum(is.na(ts))
[1] 154

ts_kalman <- na.kalman(ts)
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, :
L-BFGS-B needs finite values of 'fn'

ts_kalman <- na.kalman(ts, type = "level")
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, :
L-BFGS-B needs finite values of 'fn'

ts_scaled <- scales::rescale(ts, c(0, 1))
ts_kalman <- na.kalman(ts_scaled)
Warning message:
In StructTS(data, ...) :
possible convergence problem: 'optim' gave code = 52 and message ‘ERROR: ABNORMAL_TERMINATION_IN_LNSRCH’

ts_kalman <- na.kalman(ts_scaled, type = "level")

sum(is.na(ts_kalman))
[1] 0

ts_kalman <- scales::rescale(ts_kalman, c(min(ts, na.rm = TRUE), max(ts, na.rm = TRUE)))

all.equal(ts[!is.na(ts)], ts_kalman[!is.na(ts)])
[1] TRUE
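The same scale, impute, scale-back round trip can be written with base R only. This is a sketch under assumptions: impute_rescaled and linear_fill are hypothetical names, and linear interpolation is just a stand-in imputer so the example runs without imputeTS.

```r
# Min-max scale to [0, 1], impute on the scaled series, then map the result
# back to the original range using the stored minimum and maximum.
# impute_fn is any function that fills the NAs of a numeric vector.
impute_rescaled <- function(x, impute_fn) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  if (hi == lo) stop("series is constant; min-max scaling is undefined")
  scaled  <- (x - lo) / (hi - lo)
  imputed <- impute_fn(scaled)
  imputed * (hi - lo) + lo
}

# Stand-in imputer: linear interpolation with base R's approx().
linear_fill <- function(v) {
  idx <- seq_along(v)
  approx(idx[!is.na(v)], v[!is.na(v)], xout = idx, rule = 2)$y
}

x      <- c(1e6, 2e6, NA, 4e6, 5e6)
filled <- impute_rescaled(x, linear_fill)
```

With imputeTS, impute_fn could be function(v) na.kalman(v, type = "level"). Inverting with the stored minimum and maximum keeps the observed values unchanged even if an imputed value lands slightly outside [0, 1].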

Answer 9 · 2019-01-11T04:50:20.000Z

👍 Many thanks for your solution @kevinv21

Answer 10 · 2019-10-19T03:52:57.000Z

Hi guys, I'm having the same issue. Even with Kevin's solution, I'm not able to get what I need. After I apply the rescale function, R complains that it can't rescale a time series object. @kevinv21, can you show the steps before sum(is.na(ts))? Is your ts a time series object in this sum() call? I can't seem to work around it...

Answer 11 · 2019-10-20T02:06:45.000Z

Thanks for informing about the problem @tbs17. What kind of input object do you have?
(imputeTS accepts all kinds of inputs vector, ts, data.frame, zoo, tsibble)

I think the workaround of @kevinv21 only works with vector input
(scales::rescale needs a vector).

Just transform your input to a vector and then try to run the workaround code again.

new_input <- as.vector(your_ts)

This will not affect the imputation; the timestamps are not important (since your time series is hopefully equidistant).
Afterwards you can put the imputed values back into the time series object. For a ts object this would work like this (coredata() comes from the zoo package):

coredata(your_ts) <- imputed_vector
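For a plain ts object, the round trip can also be sketched with base R alone. The mean fill below is only a placeholder for the real imputation step, and the example series is made up for illustration.

```r
# A small monthly ts with two gaps (illustrative data).
x <- ts(c(10, NA, 12, 13, NA, 15), start = c(2019, 1), frequency = 12)

# Drop the time attributes and work on the plain numeric vector.
v <- as.vector(x)

# Placeholder for the real imputation call on the vector.
v[is.na(v)] <- mean(v, na.rm = TRUE)

# Rebuild a ts with the same start and frequency as the original.
x_imputed <- ts(v, start = start(x), frequency = frequency(x))
```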

Answer 12 · 2019-10-20T10:09:40.000Z

As @SteffenMoritz says, ts is a large time series of 86400 points represented as a vector of real values. I scaled it with the rescale function from the scales package. However, you can rescale your data with other functions/packages, such as the scale function from the timeSeries package, or rescale it manually using any approach for data standardization/normalization (https://stackoverflow.com/questions/20256028/understanding-scale-in-r , https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range). Or, as @SteffenMoritz proposes, you can transform your data into a vector.

Answer 13 · 2019-10-20T14:21:40.000Z

I tried it with the vector. It didn't seem to work either. I don't have a large time series. What I have is a time series with 5 NAs and 25 identical values. I don't know if the non-changing value also caused the error...


Answer 14 · 2019-10-20T20:31:14.000Z

Yeah, it looks like the non-changing value also produces the error, I have tried with the following time series and I get the error with the first one (ts1) but not with the second one (ts2):

ts1 <- c(5,5,5,5,5,5,5,5,NA,NA,5,5,5,5,5,5,5,5,5,5,NA,NA,5,5,5,NA)
na_kalman(ts1, type = "level")
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,  : 
  L-BFGS-B needs finite values of 'fn'

ts2 <- c(5,5,5,5,5,5,5,5,NA,NA,5,5,5,4,5,5,5,5,5,5,NA,NA,5,5,5,NA)
na_kalman(ts2, type = "level")
5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 4.952381 4.952381 5.000000 5.000000 5.000000 5.000000 4.000000 5.000000 5.000000 5.000000 5.000000 5.000000 4.952381 4.952381 5.000000 5.000000 5.000000 4.952381

Anyway, if your time series contains repeated values, you can also try other kinds of techniques, such as linear interpolation or last observation carried forward.

Answer 15 · 2019-10-21T20:35:20.000Z

Thank you for the due diligence! I have tried the linear option. However, my missing values need to follow an increasing trend; they can't decrease over time. Do you know how to fine-tune one of your imputation algorithms to always get an increasing trend?


Answer 16 · 2019-10-21T21:23:49.000Z

@tbs, can you provide an example time series you want to impute?

"But what I have is a time series data that have 5 NAs
and 25 same values."

This sounds like e.g.
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, NA, 4, 4, 4, NA, 4, 4, 4, NA, NA, 4, 4

Why would you expect an algorithm to impute increasing values there?
I mean, there has to be at least an increasing trend visible somewhere for it to make sense to impute increasing values.

If all values are 4, except the NAs, imputing 4 probably makes the most sense
(unless you have prior knowledge that indicates something else).

In general, if you strictly want your imputations to follow a trend, the na.kalman method would be a good choice (but of course it only works if the ARIMA model thinks there is a trend in the data).
Is the '25 same values' series just one of many time series you want to impute? Then use another algorithm for this one (e.g. na.interpolation) and use na.kalman for all the other series.

You could also try na.ma() (moving average) with a high k parameter, e.g. k = 7 or something like that. If there really is a strong trend in the data, small noise shouldn't lead to decreasing values with this setup.

If you really have only a series with '25 same values' and expect the NAs to be increasing, despite no indication of that in the data, you have to model this on your own. All algorithms can only extract information from the data; they won't magically impute a trend that is not shown in the data. (Yet there are some transformations you can make to force imputed values into a certain range.)
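One such transformation can be sketched as a post-processing step in base R. It is not an imputeTS feature: enforce_nondecreasing is a hypothetical helper, the imputed vector below is made up rather than produced by a real model, and the sketch assumes the observed values themselves never decrease.

```r
# Clamp imputed values so the completed series never decreases.
# x:       original series with NAs (observed values assumed non-decreasing)
# imputed: the same series after any imputation method has filled the NAs
enforce_nondecreasing <- function(x, imputed) {
  obs   <- !is.na(x)
  lower <- cummax(ifelse(obs, x, -Inf))           # largest observed value so far
  upper <- rev(cummin(rev(ifelse(obs, x, Inf))))  # smallest observed value still to come
  out   <- x
  out[!obs] <- pmin(pmax(imputed[!obs], lower[!obs]), upper[!obs])
  # Within a single gap, consecutive imputed values may still dip;
  # a running maximum removes those dips without touching observed values.
  cummax(out)
}

x       <- c(1, NA, NA, 4, NA, 6)
imputed <- c(1, 3.5, 2.0, 4, 7, 6)   # pretend output of some imputation method
fixed   <- enforce_nondecreasing(x, imputed)
```

Here the dip from 3.5 to 2.0 is lifted to 3.5, and the value 7 is capped at the next observed value 6. Whether clamping like this is appropriate depends on the data; it only forces the shape and adds no information.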

Answer 17 · 2019-10-22T03:44:33.000Z

Hi Steffen, thanks for the response! I'm imputing the number of lessons students have taken, which can only increase or stay static over time but can't drop. The linear and spline methods I tried only impute the same value as the first non-NA. The Kalman method does impute more variation. However, I sometimes see values decrease over time. That's why I'm asking whether there's a way to anchor a value and have it increase up to the first non-NA.


Answer 18 · 2019-10-22T14:35:47.000Z

Hi Steffen, I see that your package also allows users to supply their own model. Can I just make an easy linear model based on the time series data? Or is there a way to set the first value to 1 or 2 and let it gradually increase?


Answer 19 · 2021-01-18T20:32:42.000Z

Oh, probably didn't see this one.
About the own model and this question "Can I
just make an easy linear model based on the time series data"

You'd just define the model like this.

usermodel <- arima(tsAirgap, order = c(1, 0, 1))$model
na_kalman(tsAirgap, model = usermodel)

So first you specify your own ARIMA model and then you pass it as the model parameter to na_kalman.
ARIMA stands for AR (autoregressive), I (integrated), MA (moving average).

So if you just want a simple linear model, you might want to specify ARIMA(1,0,0).
Then your model would look like this:
usermodel <- arima(tsAirgap, order = c(1, 0, 0))$model

Answer 20 · 2021-01-18T20:40:44.000Z

Some more information about the "Error in optim ... " issue. Just had a new mail from a user that had this issue.

Turns out the problem here was also caused by a series that had only NAs and one non-changing value.

This is what Sigve Sørensen (thx for reporting!) wrote me:
"The error comes when there are series that contain only nans AND zero values (nan, nan, 0, 0, 0, nan … nan)"

So this seems similar to what @kevinv21 wrote above. But the type = "level" workaround did not seem to work here either.

So in general, if anybody else has this error, look out for time series with only one repeated value. Sorry that I don't have a fix yet, since the error comes from an underlying package. But you can probably impute these series quite easily anyway (you would just replace all NAs with the one repeated value). After all, if all values of a time series are e.g. 2, you'd probably also expect the NAs to be 2.

Answer 21 · 2021-04-27T21:44:14.000Z

Just got another mail with not exactly the same but a related issue.
(still have to check further details there)

The problem there is also:

possible convergence problem: 'optim' gave code = 52 and message ‘ERROR: ABNORMAL_TERMINATION_IN_LNSRCH’

Answer 22 · 2021-05-03T14:16:26.000Z

Dear @SteffenMoritz, thanks a lot for your contribution. I am using imputeTS to fill in missing values in panel data. I find that some time series (such as 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, NA, 4, 4, 4, NA, 4, 4, 4, NA, NA, 4, 4) lead to Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, : L-BFGS-B needs finite values of 'fn'. So I have to divide my panel data into two subsets and apply different algorithms to them. Thanks a lot for your answers above.

Answer 23 · 2021-06-09T13:25:37.000Z

Oh, sorry for answering so late.
Thanks to @hezhichao1991 for reporting.

It is great having you all contribute to make the package better :)

Found the time to do some further checks:

Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, : L-BFGS-B needs finite values of 'fn'.

It always appears when the series has no variation in its values, e.g. like you say 4,4,4,4,NA,4,NA.
As soon as there is even one different value in the series, everything works as expected.

Even c(4, 4, 4, 4, NA, 4.000001, 4, 4) works. The error appears only if all values are exactly the same.

The reason lies in functions I am calling, which I can't change.
But the solution might be obvious: I think I'll just insert a check for whether the series consists only of constant values.

Looking at 4,4,4,4,NA,4,NA, it is quite certain that the correct imputed value should also be 4.

Fix will come with the next update!
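Until that update is installed, a similar check can be done on the user side. This is only a sketch of the idea and not the package's actual fix, which is not shown in this thread: is_constant_series and fill_if_constant are hypothetical helpers, and impute_fn stands for whatever imputation call you normally use.

```r
# TRUE if all observed (non-NA) values are exactly the same.
is_constant_series <- function(x) {
  obs <- x[!is.na(x)]
  length(obs) > 0 && all(obs == obs[1])
}

# Fill a constant series with its single value; otherwise delegate to the
# imputation function supplied by the caller.
fill_if_constant <- function(x, impute_fn) {
  if (is_constant_series(x)) {
    x[is.na(x)] <- x[!is.na(x)][1]
    return(x)
  }
  impute_fn(x)
}

# Stand-in imputer that reproduces the failure, to show it is never reached
# for the constant series.
failing <- function(x) stop("L-BFGS-B needs finite values of 'fn'")

constant_filled <- fill_if_constant(c(4, 4, 4, 4, NA, 4, NA), failing)
nearly_constant <- is_constant_series(c(4, 4, 4, 4, NA, 4.000001, 4, 4))
```

With imputeTS, impute_fn could be na_kalman, so only the non-constant series are passed on to the Kalman imputation.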