ray-project/llm-applications

All cluster resources being claimed by actors?

Chuukwudi opened this issue

In the notebook, calling

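```python
from ray.data import ActorPoolStrategy

# Embed chunks (chunks_ds and EmbedChunks come from earlier notebook cells)
embedding_model_name = "thenlper/gte-base"
embedded_chunks = chunks_ds.map_batches(
    EmbedChunks,
    fn_constructor_kwargs={"model_name": embedding_model_name},
    batch_size=100,
    num_gpus=1,                         # each actor reserves one GPU
    compute=ActorPoolStrategy(size=2))  # pool of two actors

# Sample
sample = embedded_chunks.take(1)
```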

results in:

```
======== Autoscaler status: 2023-09-19 10:15:05.945390 ========
Node status
---------------------------------------------------------------
Healthy:
 1 node_39e554d28e4f63b9d3360ffdf267014a901a29d1601c039967717f26
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 1.0/32.0 CPU
 1.0/1.0 GPU
 0B/10.09GiB memory
 11.70MiB/5.05GiB object_store_memory

Demands:
 {'CPU': 1.0, 'GPU': 1.0}: 1+ pending tasks/actors
(autoscaler +2m17s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0, 'GPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
(autoscaler +2m52s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0, 'GPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
```
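The status block already shows why the request is stuck: the node has 1.0 GPU in total, 1.0 of it is in use, and the pending demand is `{'CPU': 1.0, 'GPU': 1.0}`. The following is a minimal sketch, assuming the plain-number `Usage` line format shown above. It parses those lines and lists which demanded resources exceed what is currently free. The function names `parse_numeric_usage` and `unschedulable` are hypothetical illustrations, not Ray APIs.

```python
import re

# Usage lines copied from the autoscaler status above
USAGE = """\
 1.0/32.0 CPU
 1.0/1.0 GPU
 0B/10.09GiB memory
 11.70MiB/5.05GiB object_store_memory
"""

def parse_numeric_usage(text):
    """Return {resource: (used, total)} for plain-number lines like '1.0/32.0 CPU'.

    Lines with byte units (B, MiB, GiB) do not match the pattern and are skipped.
    """
    usage = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*([\d.]+)/([\d.]+)\s+(\w+)", line)
        if m:
            usage[m.group(3)] = (float(m.group(1)), float(m.group(2)))
    return usage

def unschedulable(demand, usage):
    """List the resources in `demand` that need more than is currently free."""
    short = []
    for name, amount in demand.items():
        used, total = usage.get(name, (0.0, 0.0))
        if total - used < amount:
            short.append(name)
    return short

usage = parse_numeric_usage(USAGE)
demand = {"CPU": 1.0, "GPU": 1.0}  # the pending request from the warning
print(usage)                        # {'CPU': (1.0, 32.0), 'GPU': (1.0, 1.0)}
print(unschedulable(demand, usage)) # ['GPU']
```

CPU is plentiful (31 of 32 free). The only resource that blocks the request is the single GPU, which is already claimed.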

Any solution?
I have tried changing the ActorPoolStrategy size to 1 and reducing batch_size, but I get the same result.
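The GPU arithmetic behind both attempts can be sketched as follows. The sketch assumes that every actor in the pool holds its own `num_gpus` reservation for as long as it is alive. With `size=2` and `num_gpus=1`, the pool needs two GPUs on a one-GPU node. With `size=1`, it fits only if nothing else already holds the GPU. The status above shows `1.0/1.0 GPU` in use while another GPU request is still pending. One possible cause, not confirmed in this issue, is a GPU actor left over from an earlier run in the same session. `batch_size` does not appear in this calculation, because it controls rows per call rather than reserved resources. The helpers `pool_gpu_demand` and `fits` are hypothetical and not part of Ray.

```python
def pool_gpu_demand(pool_size, gpus_per_actor):
    """GPUs an actor pool reserves: every actor holds its own num_gpus."""
    return pool_size * gpus_per_actor

def fits(pool_size, gpus_per_actor, total_gpus, gpus_in_use=0.0):
    """True if the whole pool can be placed on the GPUs that are still free."""
    return pool_gpu_demand(pool_size, gpus_per_actor) <= total_gpus - gpus_in_use

# Numbers from the issue: one GPU on the node, num_gpus=1 per actor.
print(fits(pool_size=2, gpus_per_actor=1, total_gpus=1.0))                   # False: needs 2 GPUs
print(fits(pool_size=1, gpus_per_actor=1, total_gpus=1.0))                   # True on an idle GPU
print(fits(pool_size=1, gpus_per_actor=1, total_gpus=1.0, gpus_in_use=1.0))  # False if the GPU is already held
```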