wangkuiyi/gotorch

torch.GC may hang with a data loader cache

The current design requires all the tensors created between two consecutive torch.GC() calls to be unreachable by the time of the second call. As a result, if we want to cache data loader tensors in a chan, torch.GC() may hang forever.
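
In other words, the lifetime bookkeeping presumably works like the stdlib-only model below: creating a tensor registers it with a global `sync.WaitGroup` (as the comments in the example further down hint), a finalizer releases the registration once the Go GC proves the tensor unreachable, and torch.GC() triggers a collection and then waits for the group to drain. The names `tensor`, `newTensor`, `liveTensors`, and `gc` are illustrative stand-ins, not gotorch's actual internals.

	package gcmodel

	import (
		"runtime"
		"sync"
	)

	// tensor stands in for a gotorch Tensor; only its lifetime matters here.
	type tensor struct{ data [][]float32 }

	// liveTensors counts the tensors created since the last gc() call that
	// the Go garbage collector has not finalized yet.
	var liveTensors sync.WaitGroup

	// newTensor registers the new value and arranges for the registration
	// to be released from a finalizer.
	func newTensor(data [][]float32) *tensor {
		t := &tensor{data: data}
		liveTensors.Add(1)
		runtime.SetFinalizer(t, func(*tensor) { liveTensors.Done() })
		return t
	}

	// gc triggers a collection and then blocks until every tensor created
	// since the previous gc() call has been finalized, which requires all
	// of them to be unreachable by now.
	func gc() {
		runtime.GC()
		liveTensors.Wait()
	}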

For a simplified example:

	torch.GC() // The first call; every tensor created after this point must be unreachable at the next call.
	runtime.LockOSThread()
	c := make(chan torch.Tensor, 0)
	{
		torch.NewTensor([][]float32{{1, 2}, {3, 4}}) // Register the anonymous `Tensor` to the global `WaitGroup`
		go func() {
			a := torch.NewTensor([][]float32{{1, 2}, {3, 4}}) // Register `a` to the global `WaitGroup`
			c <- a
			time.Sleep(24 * time.Hour) // time.Day does not exist; sleep for a day
			runtime.KeepAlive(a)       // keep a reachable until after the sleep
		}()
	}
	<-c        // Receive a; the goroutine still holds its own reference.
	torch.GC() // Lasts for one day: a stays reachable inside the goroutine.

The second call to torch.GC() in the snippet above lasts for a day because `a` stays reachable inside the goroutine until the goroutine returns.
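
The blocking behaviour can be reproduced without gotorch at all. The self-contained program below uses the same WaitGroup-plus-finalizer model as the sketch above; its gc() stays blocked for roughly as long as the goroutine keeps `a` reachable, with a five-second sleep standing in for the day-long one. The retry loop inside gc() exists only so the demo terminates once `a` becomes collectible; it is not meant to mirror gotorch's implementation.

	package main

	import (
		"fmt"
		"runtime"
		"sync"
		"time"
	)

	type tensor struct{ data [][]float32 }

	var liveTensors sync.WaitGroup

	func newTensor(data [][]float32) *tensor {
		t := &tensor{data: data}
		liveTensors.Add(1)
		runtime.SetFinalizer(t, func(*tensor) { liveTensors.Done() })
		return t
	}

	// gc retriggers collections until every registered tensor has been
	// finalized; the retry loop only exists so this demo can finish.
	func gc() {
		drained := make(chan struct{})
		go func() { liveTensors.Wait(); close(drained) }()
		for {
			runtime.GC()
			select {
			case <-drained:
				return
			case <-time.After(100 * time.Millisecond):
			}
		}
	}

	func main() {
		c := make(chan *tensor)
		go func() {
			a := newTensor([][]float32{{1, 2}, {3, 4}})
			c <- a
			time.Sleep(5 * time.Second) // stands in for the day-long sleep
			runtime.KeepAlive(a)
		}()
		<-c

		start := time.Now()
		gc() // blocks until the goroutine drops its reference to a
		fmt.Printf("gc() returned after %v\n", time.Since(start)) // roughly 5s
	}

On a typical run this prints something like `gc() returned after 5.1s`. With a chan-based data loader cache that keeps tensors reachable indefinitely, the corresponding wait inside torch.GC() never ends.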