thuml/Large-Time-Series-Model

Zero-shot performance is not reproducible

Opened this issue · 5 comments

Dear Authors,

Thanks for your work! Following your zero-shot setting (lookback length 672, forecast length 96) and running your official code, my results do not match those reported in Table 18 of your paper.

Below are the MSEs:

| Dataset | Timer (reproduced) | Timer-1B | Timer-16B | Timer-28B |
| --- | --- | --- | --- | --- |
| ETTh1 | 0.454 | 0.438 | 0.364 | 0.393 |
| traffic | 0.479 | 0.458 | 0.399 | 0.414 |
| weather | 0.190 | 0.181 | 0.203 | 0.243 |
| electricity | 0.210 | 0.192 | 0.139 | 0.147 |
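For reference, the numbers above are rolling-window MSEs under the zero-shot setting described earlier. A minimal sketch of such an evaluation loop is below; `forecast_fn` is a hypothetical stand-in for the actual model call (not the repository's API), and the window stride is an assumption.

```python
import numpy as np

LOOKBACK, HORIZON = 672, 96  # zero-shot setting discussed in this thread

def mse(pred: np.ndarray, true: np.ndarray) -> float:
    """Mean squared error, the metric reported in the table above."""
    return float(np.mean((pred - true) ** 2))

def rolling_zero_shot_eval(series: np.ndarray, forecast_fn) -> float:
    """Slide a (lookback, horizon) window over a univariate test series
    and average the per-window MSE. `forecast_fn` maps a length-672
    context to a length-96 forecast; stride equals the horizon here,
    which is one common (assumed) choice."""
    errors = []
    for start in range(0, len(series) - LOOKBACK - HORIZON + 1, HORIZON):
        context = series[start : start + LOOKBACK]
        target = series[start + LOOKBACK : start + LOOKBACK + HORIZON]
        errors.append(mse(forecast_fn(context), target))
    return float(np.mean(errors))

# Naive baseline forecaster for illustration: repeat the last observation.
naive = lambda ctx: np.full(HORIZON, ctx[-1])
```

Swapping `naive` for the released checkpoint's inference call would reproduce one row of the table per dataset and per channel.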

Could you provide more information about your released Timer_forecast_1.0.ckpt?

Same question: Could you please clarify the dataset size that the provided checkpoint was pretrained on?

@MogicianXD

Hi, we have released the model on HuggingFace, where you can evaluate it following the provided pipeline.

@iapcal

Please refer to the appendix of our paper, where the details of datasets and configurations are provided.

Are there any hyperparameter differences for the new checkpoint? Also, could you update the reported metrics?

The main differences are that (1) the training context length is extended, and (2) the pre-training scale is enlarged with the LOTSA dataset.

Zero-shot results are provided here. You can evaluate the model on the test set following the notebook.