arc-research-lab/CHARM

Some questions about the paper and code of CHARM

Thank you for open-sourcing the CHARM project. This is an excellent GEMM accelerator with outstanding performance. While reading the CHARM paper and code, we came up with a few questions and would really appreciate it if you could answer them.

Firstly, some questions about the paper:

  1. In the last paragraph of Section 4.2, the paper says "... so that a tile of LHS with size (X x A x TI) x (Y x B x TK) can be reused on-chip for (Z x TJ) times". Why is it (Z x TJ) times rather than (Z x C x TJ) times?
  2. In the "1st Step: Workload Assignment" part of Section 5.4, the sentence "... mapping an application with n kernels to num accs suffers..." mentions the number of accelerators for the first time. Is this a user-provided input parameter or an output of CDSE?
  3. Also in the 1st step, it would help to explain in more detail how the time complexity is reduced to C(n-1, num-1). Where does this expression come from?

Then, regarding the code on GitHub:

  1. In the input.cfg file, KRL_TYPE can only be 0 or 1. However, src_gen/AIE_ArrGen/gen_graph.sh (line 89) and src_gen/Kernel_Gen/gen_grah.sh (line 73) check whether kernel_type == "int32". If the kernel type can only be 0 or 1, why is the comparison against int32 needed?
  2. After running code generation, what is the purpose of these three files: mm_graph_x3_type1.h, mm_graph_x3_type0.h, and mm_graph_x3_col.h? They are not included in the top function or anywhere else.
  3. There appears to be no data set for testing. Could you provide one for a demo?
  4. Our platform is the VCK5000. When we compile the project by following the instructions, we get some errors. Are any platform-specific modifications or additional instructions needed to run the project on the VCK5000?
  5. After searching the CHARM repo, we found only the CACG (code generation) part of the CHARM framework, not the design-search parts (e.g., CDSE, CDAC, and CRTS). Could you please point us to the source code for these parts? Additionally, how do we obtain the parameters in input.cfg? How do we generate different accs for different MMs? How do we generate accs for non-MM functions? For example, example/BERT contains files for different sizes, such as mm_graph_large.h, mm_graph_small.h, and dma_large.h, but none of the sh files in src_gen generates a file ending in _large.h or _small.h. It would be great if you could shed some light on these parts.

Thank you very much in advance. We look forward to your reply!

Thank you for your feedback!

For the questions about the paper:

  1. This is indeed a typo; it should be (Z x C x TJ) times. Thank you for pointing this out.
  2. The number of accelerators is a hyperparameter. CDAC automatically explores the different settings.
  3. First, to make sure we are on the same page, C(n-1, num-1) means choosing (num-1) out of (n-1). There are "n" kernels that need to be mapped to "num" accelerators, so we insert (num-1) dividers into the (n-1) gaps between adjacent kernels, which amounts to choosing (num-1) out of (n-1).
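
The divider argument in item 3 can be checked by brute force. The following is a minimal sketch, assuming the kernels keep a fixed order and each accelerator receives one contiguous, non-empty run of kernels. The function name `contiguous_assignments` is hypothetical and is not part of the CHARM code.

```python
from itertools import combinations
from math import comb


def contiguous_assignments(n, num):
    """Yield every split of kernels 0..n-1 into `num` contiguous, non-empty groups.

    Assumption (not taken from the CHARM source): kernel order is fixed, so a
    split is fully determined by which (num - 1) of the (n - 1) gaps between
    adjacent kernels hold a divider.
    """
    kernels = list(range(n))
    # Gap i (1 <= i <= n-1) sits between kernel i-1 and kernel i.
    for cuts in combinations(range(1, n), num - 1):
        bounds = (0, *cuts, n)
        yield [kernels[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    n, num = 5, 3
    splits = list(contiguous_assignments(n, num))
    for split in splits:
        print(split)
    # Number of enumerated splits next to the closed form C(n-1, num-1).
    print(len(splits), comb(n - 1, num - 1))
```

Each printed split lists which kernels go to accelerator 0, 1, and so on; the final line shows that the enumeration count agrees with C(n-1, num-1).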

Thank you for your reply!

One more question on #3: I understand that this maps "n" kernels to "num" accs, so it amounts to choosing (num-1) from (n-1). But this assumes that the "n" kernels are indistinguishable from each other. What if they are distinguishable? Based on my understanding, the number of ways to map "n" different kernels to "num" accs is num!S(n, num) instead of C(n-1, num-1). So my question is: why do we need to assume that the kernels are indistinguishable?
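
To make the gap between the two counts concrete, here is a small sketch that evaluates both formulas. It assumes that num!S(n, num) denotes num! times the Stirling number of the second kind, i.e. the number of ways to map n distinguishable kernels onto num distinguishable accelerators with none left empty. All helper names are hypothetical.

```python
from itertools import product
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind, S(n, k) = k*S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def unrestricted_mappings(n, num):
    """num! * S(n, num): distinguishable kernels, no accelerator left empty."""
    return factorial(num) * stirling2(n, num)


def brute_force_mappings(n, num):
    """Cross-check by enumerating every kernel -> accelerator assignment."""
    return sum(
        1 for m in product(range(num), repeat=n) if len(set(m)) == num
    )


if __name__ == "__main__":
    for n, num in [(4, 2), (5, 3), (6, 3)]:
        print(
            n,
            num,
            comb(n - 1, num - 1),           # order-preserving, contiguous splits
            unrestricted_mappings(n, num),  # any kernel on any accelerator
            brute_force_mappings(n, num),   # should equal the previous column
        )
```

For n = 5 and num = 3, the contiguous count is 6 while the unrestricted count is 150, so the two formulas describe search spaces of very different sizes.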

Also, just a kind reminder: if possible, we are still looking forward to your clarification on the second part of our question (about the code on GitHub).

Thank you again for your patience!

@hongzhengTian @JinmingZhuang Regarding the code, we have some manuscripts that are still being reviewed. We will let you know when this part is released.