How can model efficiency be improved in real-world applications?
Opened this issue · 3 comments
mzgcz commented
Notice: To help us resolve issues more efficiently, please raise issues following the template and provide details.
❓ Questions and Help
In a real deployment, how can the efficiency of the online (streaming) model be improved?
For language models, inference efficiency can be raised with batched inference (a larger batch size); multiple model instances can be used to handle concurrent inference requests; and TensorRT can be used to speed up inference.
For the FunASR online model, which of these approaches are feasible, and are there better recommendations?
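One of the approaches mentioned above, serving concurrent requests with multiple pre-loaded model instances, can be sketched with only the standard library. Note this is a generic illustration, not FunASR's API: `load_model` and `infer` are hypothetical placeholders standing in for real model loading and inference.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

# Hypothetical sketch: a pool of pre-loaded model instances served by
# worker threads, so concurrent requests don't contend for one model.
NUM_INSTANCES = 2

def load_model(instance_id):
    # Placeholder: in practice this would load a streaming ASR model.
    return f"model-{instance_id}"

def infer(model, audio_chunk):
    # Placeholder inference: records which instance handled the chunk.
    return f"{model} processed {audio_chunk}"

# One model instance per worker, handed out via a thread-safe queue.
instances = queue.Queue()
for i in range(NUM_INSTANCES):
    instances.put(load_model(i))

def handle_request(audio_chunk):
    model = instances.get()        # borrow an idle instance
    try:
        return infer(model, audio_chunk)
    finally:
        instances.put(model)       # return it to the pool

with ThreadPoolExecutor(max_workers=NUM_INSTANCES) as pool:
    results = list(pool.map(handle_request, ["req0", "req1", "req2", "req3"]))

print(len(results))
```

The queue ensures each instance processes at most one request at a time, while the executor caps concurrency at the number of instances.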
AliceShen122 commented
+1
hefeng-0411 commented
In funasr_wss_client_queue.py, batch_size is simply chunk_size; just adjust it directly.
AliceShen122 commented
In funasr_wss_client_queue.py, batch_size is simply chunk_size; just adjust it directly.

Why? Isn't chunk-size [0, 10, 5]? Is the first dimension the batch_size?
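For context on the [0, 10, 5] value: in FunASR's public streaming demo, chunk_size configures latency rather than batching. The middle value is the chunk length in 60 ms frames (so 10 means a 600 ms chunk), and the waveform stride per streaming step follows from it at 16 kHz. A small sketch of that arithmetic, with values taken from the demo and treated as illustrative:

```python
# chunk_size = [0, 10, 5] as in FunASR's streaming demo: the middle value
# is the chunk length measured in 60 ms frames.
chunk_size = [0, 10, 5]

SAMPLE_RATE = 16000                                   # 16 kHz audio
FRAME_MS = 60                                         # one frame unit is 60 ms
samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000    # 960 samples per frame

# Stride (in samples) used to slice the waveform for each streaming step.
chunk_stride = chunk_size[1] * samples_per_frame

chunk_ms = chunk_size[1] * FRAME_MS
print(chunk_stride, chunk_ms)  # → 9600 600
```

So changing chunk_size[1] trades latency against per-step computation; it is not a batch dimension.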