modelscope/FunASR

Command-line execution error


Notice: In order to resolve issues more efficiently, please raise issue following the template.

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run cmd 'funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=/tmp/SoundLip.wav'
  2. See error
...
  File "/Users/mars/jobs/blue-associator/venv/lib/python3.12/site-packages/funasr/auto/auto_model.py", line 305, in inference
    res = model.inference(**batch, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mars/jobs/blue-associator/venv/lib/python3.12/site-packages/funasr/models/fsmn_vad_streaming/model.py", line 690, in inference
    audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected Tensor as element 1 in argument 0, but got str
  0%|          | 0/1 [00:00<?, ?it/s]

Code sample

The command line above is taken from the example in the project README; the input is a local wav recording.
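
For reference, the same pipeline can be invoked from Python rather than the CLI. The sketch below follows the README's AutoModel quick-start (it is a minimal sketch added for context, not code from this report):

from funasr import AutoModel

# Same models as in the failing command line
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
)

# /tmp/SoundLip.wav is the reporter's local recording
res = model.generate(input="/tmp/SoundLip.wav")
print(res[0]["text"])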

Expected behavior

Environment

  • OS (e.g., Linux): macOS
  • FunASR Version (e.g., 1.0.0): 1.1.5
  • ModelScope Version (e.g., 1.11.0): 1.16.1
  • PyTorch Version (e.g., 2.0.0): 2.3.1
  • How you installed funasr (pip, source): pip
  • Python version: 3.12.0
  • GPU (e.g., V100M32): MPS
  • CUDA/cuDNN version (e.g., cuda11.7):
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
  • Any other relevant information:

Additional context

Please check your wav file.
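
A quick way to do that check (a minimal sketch assuming the soundfile package is available; not part of the original reply) is to print the file's sample rate, shape, and dtype:

import soundfile as sf

# The 16k models in this pipeline expect 16 kHz mono PCM input
data, sample_rate = sf.read("/tmp/SoundLip.wav")
print("sample rate:", sample_rate)
print("shape:", data.shape)   # (num_samples,) for mono, (num_samples, num_channels) otherwise
print("dtype:", data.dtype)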

This wav file is processed without problems by the following Python script:

import sys

from funasr import AutoModel

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
)

res = model.generate(
    input=sys.argv[1],
    cache={},
    language="auto",  # or "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge segments produced by VAD
    merge_length_s=15,
)

# Write the transcript to the second argument if given, otherwise print it
if len(sys.argv) >= 3:
    with open(sys.argv[2], "w+") as f:
        f.write(res[0]["text"])
else:
    print(res[0]["text"])
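
The script is run as, for example, python sensevoice_transcribe.py /tmp/SoundLip.wav out.txt (script and output file names are placeholders); it succeeds on the same recording that fails through the funasr CLI.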

Same problem, any solution?

docker pull modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.18.0

Then inside the container, run

funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav

It throws the following error:

funasr version: 1.1.6.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.1.6
[2024-09-11 17:56:09,280][root][INFO] - download models from model hub: ms
2024-09-11 17:56:10,420 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
[2024-09-11 17:56:13,379][root][INFO] - Loading pretrained params from /mnt/workspace/.cache/modelscope/hub/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-09-11 17:56:13,386][root][INFO] - ckpt: /mnt/workspace/.cache/modelscope/hub/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-09-11 17:56:13,926][root][INFO] - scope_map: ['module.', 'None']
[2024-09-11 17:56:13,926][root][INFO] - excludes: None
[2024-09-11 17:56:14,073][root][INFO] - Loading ckpt: /mnt/workspace/.cache/modelscope/hub/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt, status: <All keys matched successfully>
[2024-09-11 17:56:14,085][root][INFO] - Building VAD model.
[2024-09-11 17:56:14,086][root][INFO] - download models from model hub: ms
2024-09-11 17:56:14,371 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
[2024-09-11 17:56:14,754][root][INFO] - Loading pretrained params from /mnt/workspace/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
[2024-09-11 17:56:14,755][root][INFO] - ckpt: /mnt/workspace/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
[2024-09-11 17:56:14,759][root][INFO] - scope_map: ['module.', 'None']
[2024-09-11 17:56:14,759][root][INFO] - excludes: None
[2024-09-11 17:56:14,761][root][INFO] - Loading ckpt: /mnt/workspace/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt, status: <All keys matched successfully>
[2024-09-11 17:56:14,762][root][INFO] - Building punc model.
[2024-09-11 17:56:14,762][root][INFO] - download models from model hub: ms
2024-09-11 17:56:15,163 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Building prefix dict from the default dictionary ...
[2024-09-11 17:56:18,147][jieba][DEBUG] - Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2024-09-11 17:56:18,147][jieba][DEBUG] - Loading model from cache /tmp/jieba.cache
Loading model cost 0.806 seconds.
[2024-09-11 17:56:18,953][jieba][DEBUG] - Loading model cost 0.806 seconds.
Prefix dict has been built successfully.
[2024-09-11 17:56:18,953][jieba][DEBUG] - Prefix dict has been built successfully.
[2024-09-11 17:56:45,639][root][INFO] - Loading pretrained params from /mnt/workspace/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large/model.pt
[2024-09-11 17:56:45,640][root][INFO] - ckpt: /mnt/workspace/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large/model.pt
[2024-09-11 17:56:46,113][root][INFO] - scope_map: ['module.', 'None']
[2024-09-11 17:56:46,113][root][INFO] - excludes: None
[2024-09-11 17:56:46,246][root][INFO] - Loading ckpt: /mnt/workspace/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large/model.pt, status: <All keys matched successfully>
  0%|                                                                                                                                                                                 | 0/1 [00:00<?, ?it/s]Error executing job with overrides: ['++model=paraformer-zh', '++vad_model=fsmn-vad', '++punc_model=ct-punc', '++input=asr_example_zh.wav']
Traceback (most recent call last):
  File "/usr/local/bin/funasr", line 8, in <module>
    sys.exit(main_hydra())
  File "/usr/local/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/usr/local/lib/python3.10/site-packages/funasr/bin/inference.py", line 25, in main_hydra
    res = model.generate(input=kwargs["input"])
  File "/usr/local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 263, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 336, in inference_with_vad
    res = self.inference(
  File "/usr/local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 302, in inference
    res = model.inference(**batch, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/funasr/models/fsmn_vad_streaming/model.py", line 690, in inference
    audio_sample = torch.cat((cache["prev_samples"], audio_sample_list[0]))
TypeError: expected Tensor as element 1 in argument 0, but got str
  0%|          | 0/1 [00:00<?, ?it/s]
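
In both reports the failure is the same torch.cat call at fsmn_vad_streaming/model.py line 690, and the message says element 1 of the tuple, i.e. audio_sample_list[0], is a str, which suggests the input path reaches the VAD stage without being decoded into a waveform. The snippet below (a standalone sketch, not FunASR code) reproduces the identical TypeError:

import torch

# Mixing a tensor with a plain string in torch.cat raises:
# "TypeError: expected Tensor as element 1 in argument 0, but got str"
torch.cat((torch.zeros(4), "asr_example_zh.wav"))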