modelscope/FunASR

How can model efficiency be improved in real-world applications?

Opened this issue · 3 comments

mzgcz commented


❓ Questions and Help

In practical applications, how can the efficiency of the online (streaming) model be improved?
For language models, inference efficiency can be raised through batched inference (increasing the batch size); multiple model instances can be deployed to handle concurrent inference requests; and TensorRT can be used to speed up inference.
For FunASR streaming models, which of these measures are feasible, and are there better recommendations?
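As background on the multi-instance option mentioned above: one generic pattern is to pre-load one model instance per worker and lend instances to concurrent requests from a pool. The sketch below is only an illustration of that pattern; `DummyRecognizer` and all other names are placeholder assumptions, not FunASR APIs:

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

# Hypothetical stand-in for a loaded streaming ASR model instance.
class DummyRecognizer:
    def __init__(self, instance_id):
        self.instance_id = instance_id

    def transcribe(self, audio_chunk):
        # A real instance would run streaming inference here.
        return f"instance {self.instance_id}: {len(audio_chunk)} samples"

NUM_INSTANCES = 4

# Pre-load one instance per worker and hand them out via a queue,
# so each concurrent request borrows a dedicated instance.
instances = Queue()
for i in range(NUM_INSTANCES):
    instances.put(DummyRecognizer(i))

def handle_request(audio_chunk):
    rec = instances.get()          # borrow an instance (blocks if all are busy)
    try:
        return rec.transcribe(audio_chunk)
    finally:
        instances.put(rec)         # return it to the pool

with ThreadPoolExecutor(max_workers=NUM_INSTANCES) as pool:
    chunks = [[0.0] * 960 * n for n in (1, 2, 3)]  # fake audio chunks
    results = list(pool.map(handle_request, chunks))
    for r in results:
        print(r)
```

The queue ensures no two requests share a model instance concurrently, which matters for streaming models that keep per-session decoder state.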


In funasr_wss_client_queue.py, batch_size is simply the chunk_size; you can just adjust it directly.

Why? Isn't chunk-size [0, 10, 5]? Is the first dimension the batch_size?
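For reference, the chunk_size list in FunASR's streaming examples appears to be a latency configuration rather than a batch dimension: in the README examples, [0, 10, 5] denotes 600 ms chunks (10 × 60 ms) with 300 ms of lookahead, and the first element is always 0. The sketch below shows that interpretation; the constants and the helper function are illustrative assumptions, not FunASR API:

```python
# Hedged sketch: interpreting the streaming chunk_size list, based on the
# examples in the FunASR README (e.g. [0, 10, 5] -> 600 ms chunks, 300 ms
# lookahead; [0, 8, 4] -> 480 ms). None of the three values is a batch size.

FRAME_MS = 60          # each chunk_size unit corresponds to 60 ms of audio
SAMPLE_RATE = 16000    # 16 kHz audio, so 60 ms == 960 samples

def chunk_config(chunk_size):
    """Return (stride_in_samples, chunk_ms, lookahead_ms) for a chunk_size list."""
    _, current, lookahead = chunk_size
    stride = current * FRAME_MS * SAMPLE_RATE // 1000  # samples fed per step
    return stride, current * FRAME_MS, lookahead * FRAME_MS

print(chunk_config([0, 10, 5]))  # (9600, 600, 300): 600 ms chunks, 300 ms lookahead
print(chunk_config([0, 8, 4]))   # (7680, 480, 240): 480 ms chunks, 240 ms lookahead
```

Under this reading, raising the middle value trades latency for throughput (larger chunks per inference step), which is consistent with the earlier reply that tuning it behaves like tuning a batch size.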