Memory leak in the fsmn-vad model (FSMN语音端点检测-中文-通用-16k)

Question

Closed this issue 2 months ago · 0 comments

🐛 Bug

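The model FSMN语音端点检测-中文-通用-16k (https://modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/) leaks memory when it processes streaming audio from a microphone: the cache keeps growing for as long as the stream runs. Looking closer, the growth comes from cache['stats'].decibel. In the source code, self.decibel is defined as [] with no size limit, and new values are appended to it continuously, which eventually results in a memory leak.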
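The growth pattern described above can be illustrated with a minimal sketch. This is not FunASR's code: the classes `UnboundedStats` and `BoundedStats`, the `push` method, the 1000-frame limit, and the figure of 60 frames per chunk (600 ms at an assumed 10 ms frame shift) are all hypothetical and only show the difference between a plain list and a size-limited buffer.

```python
from collections import deque


class UnboundedStats:
    """Hypothetical stand-in for the stats object: decibel is a plain list."""

    def __init__(self):
        self.decibel = []

    def push(self, values):
        # Every chunk appends its per-frame values; nothing is ever removed.
        self.decibel.extend(values)


class BoundedStats:
    """Same interface, but only the most recent max_frames values are kept."""

    def __init__(self, max_frames):
        self.decibel = deque(maxlen=max_frames)

    def push(self, values):
        # deque(maxlen=...) silently discards the oldest items when full.
        self.decibel.extend(values)


FRAMES_PER_CHUNK = 60  # assumption: 600 ms chunk, 10 ms frame shift

unbounded = UnboundedStats()
bounded = BoundedStats(max_frames=1000)  # hypothetical limit

for _ in range(100):  # simulate 100 streamed chunks
    chunk = [0.0] * FRAMES_PER_CHUNK
    unbounded.push(chunk)
    bounded.push(chunk)

print(len(unbounded.decibel))  # 6000
print(len(bounded.decibel))    # 1000
```

Whether a bounded buffer would be a safe fix in FunASR depends on how the model indexes decibel. If it looks values up by absolute frame number, dropping old entries would also require tracking an offset.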

To Reproduce

`from funasr import AutoModel
import pyaudio
from pympler import asizeof

chunk_size = 600  # chunk length in ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
RATE = 16000  # sample rate in Hz
chunk_stride = int(chunk_size * RATE / 1000)  # samples per chunk
FORMAT = pyaudio.paInt16  # 16-bit format
CHANNELS = 1  # Mono

# Open the default microphone as a streaming input.
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True,
                frames_per_buffer=chunk_stride)

# The same cache dict is passed to every call, so VAD state persists across chunks.
cache = {}
while True:
    speech_chunk = stream.read(chunk_stride)
    res = model.generate(input=speech_chunk, cache=cache, is_final=False, chunk_size=chunk_size)
    if len(res[0]["value"]):
        print(res)
    # Report the deep size of the cached stats after every chunk.
    print(f"stats:{asizeof.asizeof(cache['stats'])},"
          f"stats.data_buf:{asizeof.asizeof(cache['stats'].data_buf)},"
          f"stats.data_buf_all:{asizeof.asizeof(cache['stats'].data_buf_all)},"
          f"stats.decibel:{asizeof.asizeof(cache['stats'].decibel)}")
`

rtf_avg: 0.014: 100%|██████████| 1/1 [00:00<00:00, 113.59it/s]
stats:30088,stats.data_buf:184,stats.data_buf_all:184,stats.decibel:25384
rtf_avg: 0.018: 100%|██████████| 1/1 [00:00<00:00, 83.16it/s]
stats:32504,stats.data_buf:184,stats.data_buf_all:184,stats.decibel:27800
rtf_avg: 0.016: 100%|██████████| 1/1 [00:00<00:00, 100.11it/s]
stats:34040,stats.data_buf:184,stats.data_buf_all:184,stats.decibel:29336
rtf_avg: 0.018: 100%|██████████| 1/1 [00:00<00:00, 88.05it/s]
stats:36568,stats.data_buf:184,stats.data_buf_all:184,stats.decibel:31864
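To put the logged numbers in perspective, the sketch below estimates the growth rate of stats.decibel from the four readings in the output. It assumes the readings come from consecutive 600 ms chunks and that the microphone delivers audio in real time. The per-chunk deltas vary because asizeof also counts list over-allocation, so the result is a rough estimate only.

```python
# stats.decibel sizes in bytes, copied from the four log lines in the output
decibel_sizes = [25384, 27800, 29336, 31864]
CHUNK_MS = 600  # chunk_size used in the reproduction script

# Growth between consecutive readings (assumed to be consecutive chunks)
deltas = [b - a for a, b in zip(decibel_sizes, decibel_sizes[1:])]
avg_per_chunk = sum(deltas) / len(deltas)

# Extrapolate to one hour of continuous streaming
bytes_per_hour = avg_per_chunk * 1000 / CHUNK_MS * 3600

print(deltas)                           # [2416, 1536, 2528]
print(avg_per_chunk)                    # 2160.0
print(round(bytes_per_hour / 1e6, 2))   # 12.96 (MB per hour)
```

Under these assumptions the list grows by roughly 13 MB per hour of streaming, while data_buf and data_buf_all stay constant at 184 bytes.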

Environment

OS (e.g., Linux): Windows
FunASR Version (e.g., 1.0.0): 1.1.6