yrlu/Teaism

Shared activation diff

jyhjinghwang opened this issue · 0 comments

Implement a shared data holder for the gradients of activations, so that layers reuse the same GPU memory during the backward pass instead of each allocating its own gradient buffer.
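
One possible shape for such a holder, as a rough sketch only: in a purely sequential network, a layer's backward step reads the gradient of its output and writes the gradient of its input, so two buffers sized to the largest activation can be alternated across all layers. The class name `SharedActivationDiff`, its methods, and the helper `PerLayerElements` are hypothetical and are not part of Teaism's existing API. `std::vector` stands in for device memory here; a GPU version would presumably allocate the two buffers with `cudaMalloc` instead. Networks with branches or skip connections would need more than two buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Teaism's actual API.
// Two scratch buffers, each as large as the biggest activation, are reused
// by every layer during the backward pass of a sequential network.
class SharedActivationDiff {
 public:
  // activation_sizes[i] is the element count of activation i
  // (activation 0 is the network input, the last one is the final output).
  explicit SharedActivationDiff(const std::vector<std::size_t>& activation_sizes) {
    std::size_t max_size = 0;
    for (std::size_t s : activation_sizes) max_size = std::max(max_size, s);
    buffers_[0].assign(max_size, 0.0f);
    buffers_[1].assign(max_size, 0.0f);
  }

  // Buffer for the gradient w.r.t. activation `index`. Adjacent activations
  // map to different buffers, so a layer can read its output gradient while
  // writing its input gradient.
  float* diff(std::size_t index) { return buffers_[index % 2].data(); }

  // Elements per buffer, and elements held in total.
  std::size_t capacity() const { return buffers_[0].size(); }
  std::size_t total_elements() const { return 2 * capacity(); }

 private:
  std::vector<float> buffers_[2];
};

// Elements needed if every activation owned its own gradient buffer.
std::size_t PerLayerElements(const std::vector<std::size_t>& sizes) {
  std::size_t total = 0;
  for (std::size_t s : sizes) total += s;
  return total;
}
```

With activation sizes such as `{1000, 4000, 4000, 2000, 2000, 10}`, the shared holder keeps 8000 floats where per-activation buffers would keep 13010. The saving grows with network depth, because the shared cost depends only on the largest activation.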