RussellLuo/timingwheel

定时器提前执行了

Opened this issue · 8 comments

imxyb commented

代码如下:

func (m *Manager) addTimer(t *task.Task) (timer *timingwheel.Timer, err error) {
	taskConfig, ok := m.taskConfigs[t.Name]
	if !ok {
		return nil, errors.Errorf("taskConfig %s not found", t.Name)
	}
	fn, err := taskConfig.GeneratorFunc(t)
	if err != nil {
		return nil, err
	}

	diff, err := util.CalcDiffWithNow(t.ExecDateTime)
	if err != nil {
		return nil, err
	}
	if diff < 0 {
		log.Warnf("timer diff is less the zero")
		return nil, nil
	}
	// 计算当前时间差(单位是秒)
	timer = m.tw.AfterFunc(time.Duration(diff)*time.Second, fn)
	log.Infof("add timer successfully, diff seconds:%d, task id:%s, object id:%s ", diff, t.ID)
	return timer, nil
}

我打印的日志:

add timer successfully, diff seconds:703726, task id:ObjectID("5e8    1fba9b8d1fca6422ec4c2")

也就是说会在703726秒后执行,大概是8天左右,但是这个任务第二天就执行了。。而且这是偶发的问题,是因为我这个时间间隔过大导致的吗

@imxyb 你好,谢谢关注!

  1. 请问 m.tw 是如何初始化的?tick 和 wheelSize 的值是多少?
  2. 另外 m.tw 里只有这一个 703726s 的任务吗?如果还有别的任务,数量和时间间隔分布大概是怎样的?
  3. 异常情况下,这个 703726s 的任务最终是间隔多久执行的?
imxyb commented

@RussellLuo

  1. 代码如下:
tw := timingwheel.NewTimingWheel(time.Millisecond, 3600)
tw.Start()
  1. 不是的,大概有200个任务左右,基本都是一些比较久远的延迟任务,比如8天后,几个月后,甚至几年后都有。
  2. 异常情况下,这个任务是3-30号凌晨添加的,但是3-31凌晨的时候就已经执行了

好的。这两天比较忙,我周末分析一下。

复现失败

我用如下代码尝试复现:

package main

import (
        "fmt"
        "time"

        "github.com/RussellLuo/timingwheel"
)

func main() {
        tw := timingwheel.NewTimingWheel(time.Millisecond, 3600)
        tw.Start()
        defer tw.Stop()

        exitC := make(chan time.Time, 1)
        tw.AfterFunc(703726 * time.Second, func() {
                exitC <- time.Now().UTC()
        })

        fmt.Printf("Task created at: %v\n", time.Now().UTC())
        fmt.Printf("Task executed at: %v\n", <-exitC)
}

目前为止(2020-04-04 03:10:00 +0000 UTC),任务还没有执行:

$ go run main.go                                      
Task created at: 2020-04-02 13:09:30.229318917 +0000 UTC

时间间隔为几个月或几年的任务,由于时间成本太高,短时间很难尝试。

原因猜想

  1. 因为 AfterFunc 会计算 expiration 绝对时间(其中 UnixNano 可以表示 1678 年 ~ 2262 年)。所以只要绝对时间不超过 2262 年,那应该跟时间间隔无关
  2. 按照 “时间单位 tick 为 1s,大小 wheelSize 为 3600” 的规格,第一层时间轮可以表示 1s x 3600 = 1h,第二层时间轮可以表示 1h x 3600 = 3600h,第三层时间轮可以表示 3600h x 3600 = 12960000h。假设最长的时间间隔按 10 年计算,对应 24h x 365 x 10 = 87600h,最多只会用到 3 层时间轮。所以时间轮的层数也并不多
  3. 每次 Poll 延迟队列 的时候,会 根据当前时间查询到期的任务。如果上述 expiration 的计算没有问题,则有可能是当前时间出了问题。所以建议加日志打印下每次 Poll 中的 时间 now,确认下时间是否有问题?

如果有更好的思路和排查手段,欢迎一起探讨 @imxyb

imxyb commented

@RussellLuo 日志有时间的,时间确实是3-11日,我这边可以在poll的时候加个日志观察下。感谢你的帮忙,我这边有进展也会继续在这同步。

FJSDS commented

遇到类似的问题,两个连接同时连上之后分别调用 ScheduleFunc

func (this_ *Session) Next(prevTime time.Time) time.Time {
	if this_.isClose() {
		return time.Time{}
	}
	return prevTime.Add(time.Second * 2)
}

然后出现调用延迟的状况

2020-09-23T14:59:52.772+0800	DEBUG	tcp/acceptor.go:93	session connected	{"local": "127.0.0.1:9999", "remote": "127.0.0.1:55229"}
2020-09-23T14:59:52.772+0800	DEBUG	tcp/acceptor.go:93	session connected	{"local": "127.0.0.1:9999", "remote": "127.0.0.1:55230"}
2020-09-23T14:59:54.771+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T14:59:56.773+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T14:59:58.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:00.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:02.401+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:02.401+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:02.401+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:02.401+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:02.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:04.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:06.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:08.771+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55230", "session name": "", "session id": ""}
2020-09-23T15:00:10.400+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:10.400+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:10.400+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:10.400+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:10.771+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:12.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:14.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}
2020-09-23T15:00:16.772+0800	DEBUG	tcp/session.go:196	send ping msg success	{"remote": "127.0.0.1:55229", "session name": "", "session id": ""}

Debugger finished with exit code 0

正常情况应该是 "remote": "127.0.0.1:55229"和"remote": "127.0.0.1:55230" 交叉出现,但是现在是成堆出现,间隔时间应该是2s.时间出现得也很奇怪

FJSDS commented

@RussellLuo 空了看一下,我感觉是同一毫毛内不能有多个任务,从目前观察来看,同一MS只会执行一个任务,剩下得任务会隔N个间隔时间之后批量出来