pytorch/pytorch

option to save to disk when calling save_for_backward()

mattorourke17 opened this issue · 6 comments

🚀 Feature

When calling save_for_backward inside a built-in or custom torch.autograd.Function, it would be nice to have the option to store these tensors on disk rather than in memory.

Motivation

Some operations, such as matrix decompositions, need to store multiple intermediate tensors via the save_for_backward(*tensors) method each time they are called. In applications where these operations are called many times (e.g. tensor networks) and the matrices are medium to large, autograd's memory consumption can quickly become prohibitive. In some cases, simply saving/loading these intermediate tensors to/from disk would be an ideal solution. If the forward() call of each decomposition is very time consuming, the additional computation incurred by checkpointing (an alternative way to reduce memory usage) could far exceed the time it takes to simply reload the tensors from disk.

Pitch

Adding a kwarg to save_for_backward that allows disk storage would let custom Functions use this feature. Exposing this option to the user for built-in Functions seems like it would require more careful design consideration. Ideally, one would be able to specify which function calls use disk storage and which use memory, so that backward() for operations on small tensors can still be evaluated efficiently with no I/O overhead.
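For illustration only, a minimal sketch of what such a kwarg could look like in a custom Function; the to_disk keyword shown here is hypothetical and is not part of the PyTorch API:

  import torch

  class Square(torch.autograd.Function):
      @staticmethod
      def forward(ctx, x):
          # Hypothetical kwarg: ask autograd to keep the saved tensor on disk
          # instead of in memory. `to_disk` is illustrative only.
          ctx.save_for_backward(x, to_disk=True)
          return x * x

      @staticmethod
      def backward(ctx, grad_out):
          (x,) = ctx.saved_tensors  # would transparently reload from disk
          return 2 * x * grad_out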

cc @ezyang @ssnl @albanD @zou3519 @gqchen

Some quick notes:

  1. This wouldn't work with double backwards, as we can't conveniently serialize the autograd graph itself
  2. Here's what we save for backwards:
  // The tensor data saved for backward.
  at::Tensor data_;

  // The gradient function associated with this node. If has_grad_fn
  // is false, then this is a leaf node. Note that the grad_fn is not saved if
  // it would create a circular reference. In that case, the grad_fn must be
  // passed in to the unpack function when reconstructing the Variable.
  std::shared_ptr<Node> grad_fn_;
  // Weak version of grad_fn_ that prevents leaks in rebase_history() for
  // inplace views.
  std::weak_ptr<Node> weak_grad_fn_;
  // Gradient accumulator of the variable, if it is a leaf.
  std::weak_ptr<Node> grad_accumulator_;
  // Version counter of the variable; compared against saved_version_ at
  // unpack time to detect in-place modifications after saving.
  c10::VariableVersion version_counter_;

  uint32_t saved_version_ = 0;
  // Which output of grad_fn_ this variable was.
  uint32_t output_nr_ = 0;
  bool was_default_constructed_ = true;
  bool requires_grad_ = false;
  bool has_grad_fn_ = false;
  bool is_inplace_view_ = false;

So except for the Node bits and the version counters, it does sound possible to save this.

@ezyang we don't really need to save all these fields. The problem, if I understand it properly, is just that data_ is too big.
We could make SavedVariables where data_ is either a Tensor or a Tensor serialized to a file.

Also, if we keep the rest of the struct intact, it won't be a problem for double backward, right?

Ah yes, I think that would probably work.

It sounds like we need to come up with an API for this.

For custom functions, there is a fairly simple workaround (write/read from disk yourself; a sketch follows below). That won't work with double backwards, but that's fairly uncommon.
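A minimal sketch of that workaround, writing the tensor needed for backward to a temporary file in forward() and reloading it in backward(); the Function and file handling here are illustrative, not part of PyTorch:

  import os
  import tempfile
  import torch

  class DiskSavedSquare(torch.autograd.Function):
      @staticmethod
      def forward(ctx, x):
          # Serialize the tensor needed for backward to a temporary file
          # instead of calling ctx.save_for_backward(x).
          fd, path = tempfile.mkstemp(suffix=".pt")
          os.close(fd)
          torch.save(x, path)
          ctx.saved_path = path
          return x * x

      @staticmethod
      def backward(ctx, grad_out):
          # Reload the saved tensor from disk and clean up the file.
          x = torch.load(ctx.saved_path)
          os.remove(ctx.saved_path)
          return 2 * x * grad_out

  x = torch.randn(4096, 4096, requires_grad=True)
  y = DiskSavedSquare.apply(x).sum()
  y.backward()  # gradient computed from the tensor reloaded from disk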

For now, we'll label this as low priority.

Closing this as it is now possible using saved tensor hooks!

See: #62362
And: pytorch/tutorials#1655
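
For reference, a minimal sketch of the disk-offloading pattern with saved tensor hooks, along the lines of the linked tutorial (assumes PyTorch >= 1.10; the pack/unpack helper names are illustrative):

  import os
  import tempfile
  import uuid
  import torch

  def pack_to_disk(tensor):
      # Called when autograd saves a tensor for backward: write it to a
      # temporary file and keep only the path in the graph.
      path = os.path.join(tempfile.gettempdir(), f"saved-{uuid.uuid4().hex}.pt")
      torch.save(tensor, path)
      return path

  def unpack_from_disk(path):
      # Called when backward needs the tensor again: reload it from disk.
      tensor = torch.load(path)
      os.remove(path)
      return tensor

  x = torch.randn(4096, 4096, requires_grad=True)
  with torch.autograd.graph.saved_tensors_hooks(pack_to_disk, unpack_from_disk):
      y = (x * x).sum()  # tensors saved for backward are offloaded to disk
  y.backward()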