Intro

UPMEM_GPT is an implementation of the GPT2-XL model on the UPMEM system. GPT2-XL is a model with 1.6B parameters. More information can be found on the official GitHub page: https://github.com/openai/gpt-2

Prerequisites

You need to change the file path in "/UPMEM_GPT2/define.h" to match your specific settings; please refer to the figure below. (The default path is "/home/kaist_icn/wuxiangyu/upload/GPT2/data".)
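As an illustration, the setting in "define.h" might look like the following sketch; the macro name DATA_PATH is an assumption, so adapt it to the actual identifier used in the file:

```c
/* Illustrative only: the macro name DATA_PATH is an assumption; check
 * define.h for the identifier the repo actually uses. */
#define DATA_PATH "/home/kaist_icn/wuxiangyu/upload/GPT2/data"
```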

[figure: file path setting in define.h]

Usage

To use it, please configure the parameters in "/UPMEM_GPT2/common.h". "common.h" contains the model information as well as your input and output sequence lengths.

You can freely configure the number of input tokens and the number of output tokens to generate by changing sequence_length and generation_length in the file; a sketch of these settings follows the figure below.

[figure: sequence_length and generation_length settings in common.h]
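For reference, a minimal sketch of what these settings could look like; the exact form and the example values here are assumptions, not the repo's actual contents:

```c
/* Illustrative only: shown as macros here, with example values;
 * the actual declarations in common.h may differ. */
#define sequence_length   64   /* number of input tokens fed to the model */
#define generation_length 16   /* number of output tokens to generate */
```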

You can also decide how many DPUs are used to process each head in the multi-head attention block by configuring NR_DPUS; an illustrative setting is sketched after the figure below.

[figure: NR_DPUS setting]
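As a sketch only (the actual definition in the repo may differ), assigning 4 DPUs to each head could look like this; note that GPT2-XL has 25 attention heads per layer, so the total number of DPUs in use scales with this value:

```c
/* Illustrative only: 4 DPUs assigned to each attention head.
 * With 25 heads per layer in GPT2-XL, the total DPU count grows accordingly. */
#define NR_DPUS 4
```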

After configuring all the parameters, run ./result.sh in the terminal to try it with synthetic input.

For now, even though the implementation is provided with trained model weights, the input tokens are generated randomly; the part that maps words into numerical tokens (the tokenizer) is not provided yet.
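As a minimal sketch of what generating synthetic input could look like (the repo's actual code may differ), random token IDs are simply drawn from GPT-2's vocabulary of 50257 entries:

```c
/* Illustrative only: fill a buffer with random token IDs from GPT-2's
 * 50257-entry vocabulary; the repo's actual generation code may differ. */
#include <stdint.h>
#include <stdlib.h>

#define VOCAB_SIZE 50257

static void make_synthetic_tokens(uint32_t *tokens, int sequence_length) {
    for (int i = 0; i < sequence_length; i++)
        tokens[i] = (uint32_t)(rand() % VOCAB_SIZE);
}
```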

Results

The GPT2 latency is automatically printed to a file named result, as the figure below shows. CPU-DPU indicates the latency of transferring input data from the CPU to the DPUs, kernel indicates the DPU program's running latency, and DPU-CPU indicates the latency of getting the results back to the CPU. A detailed latency breakdown is also included in the file, covering the summarization and generation stages as well as the attention and feed-forward blocks.

[figure: example latency output in the result file]

Customization

If you want to try another model, change the model configuration in the common.h file accordingly.
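For example, switching within the GPT-2 family means updating the layer count, head count, and embedding size. The macro names below are assumptions (they may not match common.h), but the values are the published GPT-2 configurations:

```c
/* Illustrative only: macro names are assumptions; values are the
 * published GPT-2 family configurations. */

/* GPT2-XL (default): 48 layers, 25 heads, embedding size 1600 */
#define N_LAYER 48
#define N_HEAD  25
#define N_EMBD  1600

/* GPT2-Large would instead be: 36 layers, 20 heads, embedding size 1280 */
```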

Contact

Please email me at wuxiangyu@kaist.ac.kr if you have any problems using it.