maitrix-org/llm-reasoners

TypeError: Too few parameters for <class 'reasoners.base.WorldModel'>; actual 2, expected 3

kingdomad opened this issue · 7 comments

  File "/home/dev/PycharmProjects/llm-reasoners/examples/rap_gsm8k/world_model.py", line 29, in <module>
    class GSM8kWorldModel(WorldModel[GSM8kState, GSM8kAction]):
  File "/home/dev/anaconda3/envs/llm/lib/python3.10/typing.py", line 312, in inner
    return func(*args, **kwds)
  File "/home/dev/anaconda3/envs/llm/lib/python3.10/typing.py", line 1345, in __class_getitem__
    _check_generic(cls, params, len(cls.__parameters__))
  File "/home/dev/anaconda3/envs/llm/lib/python3.10/site-packages/typing_extensions.py", line 165, in _check_generic
    raise TypeError(f"Too {'many' if alen > elen else 'few'} parameters for {cls};"
TypeError: Too few parameters for <class 'reasoners.base.WorldModel'>; actual 2, expected 3
Ber666 commented

Hi, we fixed this bug in the last update. Could you try running the latest code? Thanks!

It is ok now.

Thanks for the amazing repo.

When I execute the command:

--base_lm llama-2 --llama_2_ckpts '/mlsteam/data/LLM/model_cache/models--taide--b.11.0.0/snapshots/4d6209213acade66360d4972cccd7de9674fe6ac' --n_iters 1 --temperature 0.0 --depth_limit 5 --n_confidence 1 --n_action 1 | tee least-to-most.log

I get the same error:

Traceback (most recent call last):
  File "/mlsteam/lab/llm-reasoners/examples/rap_gsm8k/inference.py", line 15, in <module>
    def node_visualizer(x: MCTSNode[GSM8kState, GSM8kAction]):
  File "/usr/lib/python3.10/typing.py", line 312, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3.10/typing.py", line 1345, in __class_getitem__
    _check_generic(cls, params, len(cls.__parameters__))
  File "/usr/local/lib/python3.10/dist-packages/typing_extensions.py", line 167, in _check_generic
    raise TypeError(f"Too {'many' if alen > elen else 'few'} parameters for {cls};"
TypeError: Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3

Hi, thanks for letting us know about the problem. We just updated the code. Could you try it?

These type parameters don't really matter, so if you encounter similar errors in the future, simply deleting them should work.
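
For reference, a minimal sketch of both variants, using the class from examples/rap_gsm8k/world_model.py; the placeholder state/action types and the str example type below are assumptions for illustration, not taken from the repo.

from reasoners.base import WorldModel  # generic base class seen in the traceback above

GSM8kState = list   # placeholder state type, for illustration only
GSM8kAction = str   # placeholder action type, for illustration only

# Option 1: delete the type parameters entirely; they are only hints,
# so runtime behaviour is unchanged.
class GSM8kWorldModel(WorldModel):
    ...

# Option 2: pass all three parameters. "actual 2, expected 3" means the base
# class declares a third type variable (presumably the example/input type).
class GSM8kWorldModelTyped(WorldModel[GSM8kState, GSM8kAction, str]):
    ...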

Thanks for your reply.
I downloaded the new repository and tried to run the GSM8k example again:

CUDA_VISIBLE_DEVICES=0 torchrun --nproc-per-node 2 --master-port 6676 inference.py --base_lm llama-2 --llama_2_ckpts models/ --llama_size 13B

The original bug has been fixed, but I seem to have encountered a new problem:

[2024-03-14 16:19:11,917] torch.distributed.run: [WARNING] 
[2024-03-14 16:19:11,917] torch.distributed.run: [WARNING] *****************************************
[2024-03-14 16:19:11,917] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
[2024-03-14 16:19:11,917] torch.distributed.run: [WARNING] *****************************************
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llm-reasoners/inference.py", line 149, in <module>
    fire.Fire(main)
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mlsteam/data/LLM/llm-reasoners/inference.py", line 129, in main
    base_model = Llama2Model(llama_2_ckpts, llama_size, max_batch_size=batch_size)
  File "/mlsteam/data/LLM/llm-reasoners/reasoners/lm/llama_2_model.py", line 79, in __init__
    self.model, self.tokenizer = self.build(os.path.join(path, f"llama-2-{size.lower()}"), os.path.join(path, "tokenizer.model"),
  File "/mlsteam/data/LLM/llm-reasoners/reasoners/lm/llama_2_model.py", line 42, in build
    torch.cuda.set_device(local_rank)
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/torch/cuda/__init__.py", line 408, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

models/llama-2-13b
models/tokenizer.model
> initializing model parallel with size 2
> initializing ddp with size 1
> initializing pipeline with size 1
[2024-03-14 16:19:21,933] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 8725 closing signal SIGTERM
[2024-03-14 16:19:22,098] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 8726) of binary: /root/anaconda3/envs/reasoners/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/reasoners/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/reasoners/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-14_16:19:21
  host      : 2b3f447818a7
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 8726)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Hi, it's hard to say what the problem is from this error message, since it happens at the very beginning of initialization. Maybe you could first try whether you can run some inference with Meta's llama repo?
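
A quicker sanity check than a full llama run is a tiny script launched with the same torchrun flags, to confirm every rank can actually bind a GPU. This is only a sketch under one assumption: with CUDA_VISIBLE_DEVICES=0 only a single device is visible, so a second process started by --nproc-per-node 2 would call torch.cuda.set_device(1) and fail with exactly this "invalid device ordinal" error.

# check_devices.py: minimal sketch, launched the same way as inference.py, e.g.
#   CUDA_VISIBLE_DEVICES=0 torchrun --nproc-per-node 2 check_devices.py
import os
import torch

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by torchrun
visible = torch.cuda.device_count()
print(f"local_rank={local_rank}, visible GPUs={visible}")

if local_rank >= visible:
    # More ranks than visible devices: this is what produces
    # "CUDA error: invalid device ordinal" in torch.cuda.set_device.
    raise SystemExit(f"rank {local_rank} has no GPU to bind to")

torch.cuda.set_device(local_rank)
print(f"rank {local_rank} bound to {torch.cuda.get_device_name(local_rank)}")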

Thanks for your advice. I was able to run Meta's llama repo successfully.

However, when I run this command:

torchrun --nproc-per-node 1 --master-port 6676 inference.py --base_lm llama-2 --llama_2_ckpts /mlsteam/data/LLM/llama/ --llama_size 7B

I still get the following error.

gsm8k:   0%|                                                                                | 0/1319 [00:00<?, ?it/s/mlsteam/data/LLM/llm-reasoners/reasoners/lm/llama_2_model.py:120: UserWarning: the eos_token '\n' is encoded into [29871, 13] with length != 1, using 13 as the eos_token_id
  warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '
gsm8k:   0%|                                                                                | 0/1319 [04:34<?, ?it/s]
Traceback (most recent call last):                                                                                   
  File "/mlsteam/data/LLM/llm-reasoners/inference.py", line 150, in <module>
    fire.Fire(main)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mlsteam/data/LLM/llm-reasoners/inference.py", line 141, in main
    rap_gsm8k(base_model=base_model,
  File "/mlsteam/data/LLM/llm-reasoners/inference.py", line 69, in rap_gsm8k
    accuracy = evaluator.evaluate(reasoner, num_shot=4, resume=resume, log_dir=log_dir)
  File "/mlsteam/data/LLM/llm-reasoners/reasoners/base.py", line 211, in evaluate
    algo_output = reasoner(self.input_processor(example),
  File "/mlsteam/data/LLM/llm-reasoners/reasoners/base.py", line 160, in __call__
    return self.search_algo(self.world_model, self.search_config, **kwargs)
  File "/mlsteam/data/LLM/llm-reasoners/reasoners/algorithm/mcts.py", line 332, in __call__
    aggregated_result=self.aggregator(result.tree_state),
  File "/mlsteam/data/LLM/llm-reasoners/reasoners/algorithm/mcts.py", line 83, in __call__
    def visit(cur: MCTSNode[State, Action]):
  File "/usr/lib/python3.10/typing.py", line 312, in inner
    return func(*args, **kwds)
  File "/usr/lib/python3.10/typing.py", line 1345, in __class_getitem__
    _check_generic(cls, params, len(cls.__parameters__))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/typing_extensions.py", line 167, in _check_generic
    raise TypeError(f"Too {'many' if alen > elen else 'few'} parameters for {cls};"
TypeError: Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3
[2024-03-19 09:07:15,386] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1654) of binary: /mlsteam/data/LLM/llama/venv/bin/python
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-19_09:07:15
  host      : 4dfed43308fb
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1654)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
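
Following the earlier advice about type parameters, this last traceback points into the library itself (the visit helper of the MCTS aggregator in reasoners/algorithm/mcts.py), so until the repo is updated the installed copy can be patched the same way. A minimal sketch of the edit, with the surrounding aggregator code omitted:

from reasoners.algorithm.mcts import MCTSNode

# Before (raises "Too few parameters ... actual 2, expected 3" under
# typing_extensions, as in the traceback above):
#     def visit(cur: MCTSNode[State, Action]): ...

# After: drop the subscription; the annotation is only a hint, so the
# aggregator's behaviour is unchanged.
def visit(cur: MCTSNode):
    ...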