OpenBMB/BMInf

[BUG] Possible GPU memory leak, and errors when requests arrive too quickly

changleilei opened this issue · 1 comment

Describe the bug

How do I load the CPM1 model from a local checkpoint? Currently I do the following:
1. Build the model

```python
model = GPT2Model(num_layers=args.num_layers,
                  vocab_size=args.vocab_size,
                  hidden_size=args.hidden_size,
                  num_attention_heads=args.num_attention_heads,
                  embedding_dropout_prob=args.hidden_dropout,
                  attention_dropout_prob=args.attention_dropout,
                  output_dropout_prob=args.hidden_dropout,
                  max_sequence_length=args.max_position_embeddings,
                  checkpoint_activations=args.checkpoint_activations,
                  checkpoint_num_layers=args.checkpoint_num_layers,
                  parallel_output=args.parallel_output)
```

The model code comes from here.
2. Load the state dict

Load the state_dict from the local checkpoint.

3. Wrap the model with BMInf

```python
model = bminf.wrapper(model)
```

Expected behavior

Screenshots

GPU memory usage before a request:
image
GPU memory usage after a request:
image

Errors are also raised when requests come in too quickly.
image

Other:
How do I wrap a model loaded from transformers? I couldn't follow the implementation in the examples.
Environment:

apex 0.1
bminf 2.0.0
deepspeed 0.3.15

  1. What the screenshots show is probably not a GPU memory leak. PyTorch's built-in caching allocator grabs extra GPU memory while it is free in order to speed up execution, so although the reported memory usage goes up, PyTorch has simply not released the idle space back to the system.
  2. BMInf does not support concurrent calls.
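Since BMInf does not support concurrent calls, one workaround on the caller's side is to serialize all inference requests through a single lock. A minimal sketch in plain Python, where `generate` is a hypothetical stand-in for your actual BMInf inference call (it is not part of the BMInf API):

```python
import threading

# One lock shared by all request handlers: only a single
# inference call may run at any moment.
_infer_lock = threading.Lock()

def generate(prompt):
    # Placeholder for the real (non-thread-safe) BMInf inference call.
    return prompt[::-1]

def safe_generate(prompt):
    # Serialize access so concurrent requests never overlap
    # inside the model, which BMInf does not support.
    with _infer_lock:
        return generate(prompt)

# Even when invoked from many threads, the model only ever
# sees one request at a time.
results = []
threads = [threading.Thread(target=lambda p=p: results.append(safe_generate(p)))
           for p in ["abc", "xyz"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the server framework is async or multi-process rather than multi-threaded, the same idea applies, but the lock must be replaced with the matching primitive (e.g. an `asyncio.Lock` or a single worker process).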