ankane/prophet-ruby

Wrong forecast results

Hello,

Prophet returns some strange values:

3.1.0 :200 > series
 =>
{#<Date: 2022-01-03 ((2459583j,0s,0n),+0s,2299161j)>=>1.639,
 #<Date: 2022-01-05 ((2459585j,0s,0n),+0s,2299161j)>=>1.649,
 #<Date: 2022-01-06 ((2459586j,0s,0n),+0s,2299161j)>=>1.659,
 #<Date: 2022-01-07 ((2459587j,0s,0n),+0s,2299161j)>=>1.669,
 #<Date: 2022-01-08 ((2459588j,0s,0n),+0s,2299161j)>=>1.659,
 #<Date: 2022-01-10 ((2459590j,0s,0n),+0s,2299161j)>=>1.669,
 #<Date: 2022-01-11 ((2459591j,0s,0n),+0s,2299161j)>=>1.689,
 #<Date: 2022-01-12 ((2459592j,0s,0n),+0s,2299161j)>=>1.679,
 #<Date: 2022-01-13 ((2459593j,0s,0n),+0s,2299161j)>=>1.689,
 #<Date: 2022-01-14 ((2459594j,0s,0n),+0s,2299161j)>=>1.699,
 #<Date: 2022-01-15 ((2459595j,0s,0n),+0s,2299161j)>=>1.699,
 #<Date: 2022-01-18 ((2459598j,0s,0n),+0s,2299161j)>=>1.709,
 #<Date: 2022-01-20 ((2459600j,0s,0n),+0s,2299161j)>=>1.719,
 #<Date: 2022-01-21 ((2459601j,0s,0n),+0s,2299161j)>=>1.729,
 #<Date: 2022-01-22 ((2459602j,0s,0n),+0s,2299161j)>=>1.719,
 #<Date: 2022-01-25 ((2459605j,0s,0n),+0s,2299161j)>=>1.729,
 #<Date: 2022-01-27 ((2459607j,0s,0n),+0s,2299161j)>=>1.739,
 #<Date: 2022-01-29 ((2459609j,0s,0n),+0s,2299161j)>=>1.729}
3.1.0 :201 > Prophet.forecast(series)
 =>
{#<Date: 2022-01-30 ((2459610j,0s,0n),+0s,2299161j)>=>5.547226954196092,
 #<Date: 2022-01-31 ((2459611j,0s,0n),+0s,2299161j)>=>1.7157371604726062,
 #<Date: 2022-02-01 ((2459612j,0s,0n),+0s,2299161j)>=>1.7390083969256263,
 #<Date: 2022-02-02 ((2459613j,0s,0n),+0s,2299161j)>=>1.7340095632954138,
 #<Date: 2022-02-03 ((2459614j,0s,0n),+0s,2299161j)>=>1.7490078239414186,
 #<Date: 2022-02-04 ((2459615j,0s,0n),+0s,2299161j)>=>1.749007500166091,
 #<Date: 2022-02-05 ((2459616j,0s,0n),+0s,2299161j)>=>1.7390047264672899,
 #<Date: 2022-02-06 ((2459617j,0s,0n),+0s,2299161j)>=>5.557226905370591,
 #<Date: 2022-02-07 ((2459618j,0s,0n),+0s,2299161j)>=>1.725737111645919,
 #<Date: 2022-02-08 ((2459619j,0s,0n),+0s,2299161j)>=>1.7490083480986351}

Clearly the first value and the value for 2022-02-06 are wrong. I suspect an arm64 bug, but I don't have an amd64 machine to test the code on.

Hey @blackrez, I'm seeing similar results on x86-64, so I don't think it's related to ARM.

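It looks like the problem is that series doesn't include any Sundays, but the forecast is trying to predict them. With no Sunday observations, the weekly seasonality component has nothing to constrain its Sunday value. If you need predictions for Sundays, make sure to include them in the input. Otherwise, you can filter them from the output.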
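As a rough sketch of the filtering option, the snippet below keeps only forecast dates whose weekday appears somewhere in the input. It uses plain hashes shaped like the ones in this thread (a few entries copied from above) rather than calling Prophet, and `drop_unseen_weekdays` is a hypothetical helper name, not part of the gem.

```ruby
require "date"
require "set"

# Hypothetical helper (not part of prophet-ruby): drop forecast entries
# whose weekday never occurs in the input series.
def drop_unseen_weekdays(series, forecast)
  seen = series.keys.map(&:wday).to_set
  forecast.select { |date, _value| seen.include?(date.wday) }
end

# A few input points from the thread, covering Monday through Saturday.
series = {
  Date.new(2022, 1, 3) => 1.639,  # Monday
  Date.new(2022, 1, 5) => 1.649,  # Wednesday
  Date.new(2022, 1, 6) => 1.659,  # Thursday
  Date.new(2022, 1, 7) => 1.669,  # Friday
  Date.new(2022, 1, 8) => 1.659,  # Saturday
  Date.new(2022, 1, 11) => 1.689  # Tuesday
}

# The first three forecast values from the thread, standing in for the
# result of Prophet.forecast(series).
forecast = {
  Date.new(2022, 1, 30) => 5.547226954196092,  # Sunday
  Date.new(2022, 1, 31) => 1.7157371604726062,
  Date.new(2022, 2, 1) => 1.7390083969256263
}

filtered = drop_unseen_weekdays(series, forecast)
filtered.each { |date, value| puts "#{date} #{value}" }
```

If only Sundays are affected, `forecast.reject { |date, _| date.sunday? }` would do the same job with less code.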

It looks like the Python library has similar behavior.

import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
  'ds': ["2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-18", "2022-01-20", "2022-01-21", "2022-01-22", "2022-01-25", "2022-01-27", "2022-01-29"],
  'y': [1.639, 1.649, 1.659, 1.669, 1.659, 1.669, 1.689, 1.679, 1.689, 1.699, 1.699, 1.709, 1.719, 1.729, 1.719, 1.729, 1.739, 1.729]
})

m = Prophet()
m.fit(df)

future = m.make_future_dataframe(periods=10, include_history=False)
forecast = m.predict(future)
print(forecast[['ds', 'yhat']])

Output

          ds      yhat
0 2022-01-30 -3.960552
1 2022-01-31  1.720297
2 2022-02-01  1.739000
3 2022-02-02  1.731007
4 2022-02-03  1.749000
5 2022-02-04  1.749000
6 2022-02-05  1.739000
7 2022-02-06 -3.950552
8 2022-02-07  1.730297
9 2022-02-08  1.749000

Thanks for your response and your help. My dataset has a lot of issues, and it has a lot of missing days.
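For a dataset with gaps like this one, a quick way to see which weekdays have no observations at all is to tally the input dates by weekday. This is only a diagnostic sketch using the Ruby standard library and the dates from this thread. It does not touch Prophet.

```ruby
require "date"

# Input dates copied from the series in this thread.
dates = %w[
  2022-01-03 2022-01-05 2022-01-06 2022-01-07 2022-01-08 2022-01-10
  2022-01-11 2022-01-12 2022-01-13 2022-01-14 2022-01-15 2022-01-18
  2022-01-20 2022-01-21 2022-01-22 2022-01-25 2022-01-27 2022-01-29
].map { |s| Date.parse(s) }

# Count observations per weekday (0 = Sunday .. 6 = Saturday).
counts = dates.map(&:wday).tally

# Weekdays that never appear in the input.
missing = (0..6).reject { |wday| counts.key?(wday) }
                .map { |wday| Date::DAYNAMES[wday] }

p missing  # => ["Sunday"]
```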

Another option is to disable weekly seasonality with the advanced API:

require "prophet"

df = Rover::DataFrame.new({
  "ds" => ["2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-01-08", "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13", "2022-01-14", "2022-01-15", "2022-01-18", "2022-01-20", "2022-01-21", "2022-01-22", "2022-01-25", "2022-01-27", "2022-01-29"],
  "y" => [1.639, 1.649, 1.659, 1.669, 1.659, 1.669, 1.689, 1.679, 1.689, 1.699, 1.699, 1.709, 1.719, 1.729, 1.719, 1.729, 1.739, 1.729]
})

m = Prophet.new(weekly_seasonality: false)
m.fit(df)

future = m.make_future_dataframe(periods: 10, include_history: false)
forecast = m.predict(future)
p forecast[["ds", "yhat"]]

Output

                     ds                yhat
2022-01-30 00:00:00 UTC  1.7360273138060855
2022-01-31 00:00:00 UTC  1.7374329821020085
2022-02-01 00:00:00 UTC  1.7388386503979312
2022-02-02 00:00:00 UTC   1.740244318693854
2022-02-03 00:00:00 UTC  1.7416499869897768
2022-02-04 00:00:00 UTC  1.7430556552856997
2022-02-05 00:00:00 UTC  1.7444613235816226
2022-02-06 00:00:00 UTC  1.7458669918775456
2022-02-07 00:00:00 UTC   1.747272660173468
2022-02-08 00:00:00 UTC  1.7486783284693912

Yeah, that could be the best solution for my very inconsistent dataset. Many thanks for your help.