Single GPU bug with python -m chameleon.miniviewer
MaureenZOU opened this issue · 6 comments
One trick for running with a single GPU is to guard the following two broadcast calls in chameleon.py with `if world_size > 1:`; otherwise it hangs forever.
if world_size > 1:
    dist.broadcast_object_list(to_continue, src=0)
if world_size > 1:
    dist.broadcast_object_list(req, src=0)
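The guard pattern above can be sketched in plain Python without torch installed. The helper name and the stubbed broadcast function below are illustrative, not part of the chameleon codebase; the point is simply that a collective call must be skipped when only one process is running, since it would otherwise block waiting for peers:

```python
def broadcast_if_distributed(world_size, broadcast_fn, obj_list):
    # Only broadcast when more than one process participates;
    # with a single GPU the collective would block forever
    # waiting for other ranks that never join.
    if world_size > 1:
        broadcast_fn(obj_list, src=0)
    return obj_list

# Single-process case: the broadcast is skipped, so nothing hangs.
result = broadcast_if_distributed(1, lambda objs, src: None, ["req"])
# result -> ["req"]
```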
Hi,
When using a single GPU, I still get a hang even after adding these two lines, for inference with both the 30B and 7B models.
On loading the model, world_size comes back as 0 and no worker is started.
Such mysterious code in this project :(
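For context on the `world_size == 0` symptom: torch-style launchers usually communicate the world size through the `WORLD_SIZE` environment variable, which is unset when running plain `python -m ...` instead of `torchrun`. A hedged sketch of a fallback (the helper name is hypothetical, not from the repo):

```python
import os

def get_world_size():
    # Fall back to 1 when the launcher didn't set WORLD_SIZE
    # (e.g. plain `python -m ...` instead of torchrun). A value
    # of 0 would mean no worker processes get started at all,
    # so treat it as single-process too.
    return int(os.environ.get("WORLD_SIZE", "1")) or 1
```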
Interesting. You may want to check your CUDA environment; my problem was fixed by adding those two lines.
I hard-coded world_size to 1, and then got this error:
File "/data/app/ai/chameleon/chameleon/inference/loader.py", line 52, in load_model
    with open(src_dir / "params.json", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/app/ai/models/chameleon/models/30b/params.json'
:(
It seems you haven't downloaded the checkpoint correctly. Could you please check the folder?
BTW, I am running the miniviewer.
Thanks, I had missed consolidated.pth. But its size is also huge, oops.
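A quick way to check the folder for the two files mentioned in this thread (`params.json` from the traceback and `consolidated.pth` from the fix). This helper is a sketch for verifying a download, not part of the repo:

```python
from pathlib import Path

def missing_checkpoint_files(src_dir):
    # Report which of the expected checkpoint files are absent
    # from the model directory, so a partial download is caught
    # before the loader raises FileNotFoundError.
    src_dir = Path(src_dir)
    expected = ("params.json", "consolidated.pth")
    return [name for name in expected if not (src_dir / name).exists()]
```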