intel/neural-speed

Add support for phi-3-mini-128k model

Closed this issue · 4 comments

Please add support for the phi-3-mini-128k (128k context length) model in neural-speed.

It's in our plan.
Thanks

Thanks. Since phi-3 support has been merged, I will close this issue. But I have another question and don't want to create a separate issue, so I'm asking here.

According to https://github.com/intel/neural-speed/tree/main/neural_speed/core#fastest-configuration-for-cpus , for ISAs both newer than AVX512F and older than it (e.g., AVX2), int8 is the fastest configuration, but for AVX512F itself fp32 is the fastest. Why is that? Also, does int8 compute lead to lower memory usage compared to fp32, or is the memory usage equal for the same quantization type?

@bil-ash Hi, AVX512F here means devices without AVX512_VNNI, and I haven't implemented u8s8 and s8s8 kernels for plain AVX512F, so it's better to use fp32 for computation on those devices. AVX2 devices without AVX_VNNI do have u8s8 & s8s8 kernels as a fallback.
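To make the dispatch logic above concrete, here is a small sketch (a hypothetical helper, not part of neural-speed's API) that encodes the rule described: int8 compute is preferred when VNNI instructions are available or on plain AVX2 (via the u8s8/s8s8 fallback kernels), while AVX512F without VNNI falls back to fp32.

```python
# Hypothetical helper illustrating the dispatch rule described above.
# Flag names follow the Linux /proc/cpuinfo convention.

def recommended_compute_dtype(cpu_flags: set[str]) -> str:
    """Pick a compute dtype from a set of CPU feature flags."""
    if "avx512_vnni" in cpu_flags or "avx_vnni" in cpu_flags:
        return "int8"  # native VNNI int8 dot-product instructions
    if "avx512f" in cpu_flags:
        return "fp32"  # no u8s8/s8s8 kernels for plain AVX512F
    if "avx2" in cpu_flags:
        return "int8"  # AVX2 has u8s8/s8s8 fallback kernels
    return "fp32"      # conservative default for older ISAs

# On Linux, the flag set can be read with e.g.:
# flags = set(open("/proc/cpuinfo").read().split())
```

This mirrors why the "fastest configuration" table lists fp32 only for the AVX512F (no VNNI) tier.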

Okay, understood.