AttributeError: 'DistributedArray' object has no attribute 'uuid'
chaokunyang opened this issue · 0 comments
chaokunyang commented
Please describe the bug
Calling DistributedArray.block_until_ready(), as one would on a JAX DeviceArray,
makes Alpa raise an AttributeError:
2023-03-22 23:41:18,342 ERROR tensor_computing_v1.py:87 -- 'DistributedArray' object has no attribute 'uuid'
Traceback (most recent call last):
File "/home/admin/ray-pack/tmp/job/05000080/package/tensor_computing_v1.py", line 85, in main
alpa_compute()
File "/home/admin/ray-pack/tmp/job/05000080/package/tensor_computing_v1.py", line 60, in alpa_compute
results = timeit.repeat(
File "/home/admin/micromamba/envs/alpa/lib/python3.8/timeit.py", line 238, in repeat
return Timer(stmt, setup, timer, globals).repeat(repeat, number)
File "/home/admin/micromamba/envs/alpa/lib/python3.8/timeit.py", line 205, in repeat
t = self.timeit(number)
File "/home/admin/micromamba/envs/alpa/lib/python3.8/timeit.py", line 177, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/home/admin/ray-pack/tmp/job/05000080/package/tensor_computing_v1.py", line 56, in execute
result.block_until_ready()
File "/home/admin/ray-pack/tmp/job/05000080/pyenv/lib/python3.8/site-packages/alpa/device_mesh.py", line 1536, in block_until_ready
self.device_mesh.block_until_ready_remote_buffers(self.uuid)
AttributeError: 'DistributedArray' object has no attribute 'uuid'
Please describe the expected behavior
System information and environment
- OS Platform and Distribution: Linux REPL7 docker
- Python version: 3.8.16
- CUDA version: 11.2
- NCCL version: nccl-2.14.3.1
- cupy version: 11.6.0
- GPU model and memory: A10
- Alpa version: master
- TensorFlow version: not installed
- JAX version: jaxlib-0.3.22.cuda112.cudnn810-cp38-cp38
To Reproduce
Screenshots
Code snippet to reproduce the problem
import alpa
import jax.numpy as np

def matrix_compute(t):
    # Pairwise cosine similarity between the rows of t.
    t_transposed = np.transpose(t)
    dot_matrix = np.dot(t, t_transposed)
    v1_row_norm = np.linalg.norm(t, axis=1).reshape(-1, 1)
    v2_col_norm = np.linalg.norm(t_transposed, axis=0).reshape(1, -1)
    norm_matrix = np.dot(v1_row_norm, v2_col_norm)
    res = dot_matrix / norm_matrix
    # Replace -inf entries with 0.
    res = np.where(np.isneginf(res), 0, res)
    return res

matrix_compute_jit = alpa.parallelize(matrix_compute)
# vector_matrix is the input array; its definition is not included in this report.
result = matrix_compute_jit(vector_matrix)
result.block_until_ready()  # raises the AttributeError shown in the traceback
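Until the missing attribute is fixed, a possible stopgap is to fall back to a host copy when block_until_ready() fails. The sketch below is not part of Alpa: wait_for_result is a hypothetical helper, and it assumes that the returned array can be converted with numpy.asarray (that is, that it implements __array__), which I have not verified for every Alpa array type.

```python
import numpy as np


def wait_for_result(array):
    """Block until `array` has been computed, then return it.

    Hypothetical helper (not an Alpa API). It first tries the normal
    block_until_ready() call; if that raises AttributeError, as in the
    traceback in this report, it copies the array to the host instead.
    Assumption: the array supports numpy.asarray() via __array__.
    """
    try:
        array.block_until_ready()
    except AttributeError:
        # Copying to host memory cannot complete before the computation
        # has finished, so it acts as a (more expensive) barrier.
        # Note: this also hides any unrelated AttributeError.
        np.asarray(array)
    return array
```

In the snippet above this would replace the last line with wait_for_result(result). When the fallback path is taken, timeit measurements will include the device-to-host transfer, so the numbers are not directly comparable to a plain block_until_ready().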
Additional information