ray-project/ray

[Data] Ray Data continues autoscaling even when pipeline is backpressured by iteration


What happened + What you expected to happen

I'm doing training, and my compute config looks like this:
[screenshot: compute config]

My cluster autoscales CPU nodes and eventually GPU nodes to process more data, even though my trainer can't consume data any faster. I expected autoscaling to stop once the pipeline was backpressured by iteration.

Versions / Dependencies

Ray 2.21

Reproduction script

import ray
import numpy as np
import time

def generate_block(row):
    # Each row maps to a 128 MiB block; the input row itself is ignored.
    return {"data": np.zeros((128 * 1024 * 1024,), dtype=np.uint8)}


# 1000 blocks x 128 MiB ≈ 125 GiB of data in total.
ds = ray.data.range(1000, override_num_blocks=1000).map(generate_block)

# Consume slowly so iteration backpressures the pipeline.
for batch in ds.iter_batches(batch_size=None):
    time.sleep(5)
[screenshots: Ray dashboard showing the cluster autoscaling CPU and GPU nodes during the run]

Issue Severity

Medium: It is a significant difficulty but I can work around it.