temporalio/sdk-go

Batched heartbeat got canceled due to caller context cancellation

Opened this issue · 1 comments

Expected Behavior

Say I have the following activity code

func (a *Activities) MyActivity(ctx) (out, error)(
  g, gctx := errgroup.WithContext(ctx)
  g.SetLimit(4)
  for _, batch := range batches {
    batch := batch
    g.Go(func() error {
      activity.RecordHeartbeat(gctx)
      // Goroutine logic
    })
  }
  if err := g.Wait(); err != nil {
    return nil, err
  }

  for _, item := range array {
    activity.RecordHeartbeat(ctx)
    // Additional logic
  }
}

I expect heartbeat to get sent out to the server and my activity to NOT time out

Actual Behavior

Activity timed out with activity Heartbeat timeout

This is because g.Wait() cancels gctx but gctx is still referenced in SDK Heartbeat function's goroutine https://github.com/temporalio/sdk-go/blob/master/internal/internal_task_handlers.go#L2031. Due to the batch heartbeat logic, gctx was used after g.Wait() had finished in my code and caused a context canceled error when making the call to the server

I could change activity.RecordHeartbeat(gctx) to activity.RecordHeartbeat(ctx) but wonder if SDK can avoid this pitfall by using context.Background()

Steps to Reproduce the Problem

Specifications

  • Version: go.temporal.io/sdk v1.27.0
  • Platform: Linux

Talked offline discussed we should:

  • Clarify the documentation around how RecordHeartbeat will behave if the calling context is cancelled
  • If RecordHeartbeat is called with multiple contexts we should use the last non cancelled context.