ROCm/Tensile

tensile_client throwing std::out_of_range

elliottbinder opened this issue · 3 comments

I followed the quick example with a modified version of the rocblas_sgemm_asm_only.yaml configuration, and after modifying BenchmarkProblems.py [1] and SolutionsStructs.py [2], I was able to get all of the expected directories as output.

rocblas_sgemm_asm_only.yaml.txt

But when I try to run the client, I get

./0_Build/client/tensile_client --problem-size=10240,10240,1024 --library-file=3_LibraryLogic/vega10_Cij_Aik_Bkj_SB.yaml --code-object=4_LibraryClient/library/Kernels.so-000-gfx900.hsaco --code-object=4_LibraryClient/library/TensileLibrary_gfx900.co --problem-identifier=Cij_Aik_Bkj
terminate called after throwing an instance of 'std::runtime_error'
  what():  Contraction identifier (Cij_Aik_Bkj) must start with 'Contraction_'.
Aborted (core dumped)

After modifying the problem identifier, I get

./0_Build/client/tensile_client --problem-size=10240,10240,1024 --library-file=3_LibraryLogic/vega10_Cij_Aik_Bkj_SB.yaml --code-object=4_LibraryClient/library/Kernels.so-000-gfx900.hsaco --code-object=4_LibraryClient/library/TensileLibrary_gfx900.co --problem-identifier=Contraction_Cij_Aik_Bkj
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::at: __n (which is 18446744073709551614) >= this->size() (which is 23)
Aborted (core dumped)
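
A note on the numbers in that message: 18446744073709551614 is 2^64 - 2, the value a 64-bit unsigned index holds after wrapping around from -2, and 23 is exactly the length of the string Contraction_Cij_Aik_Bkj. That suggests the exception is raised while the client indexes into the problem identifier itself, although nothing in this thread confirms which code path does so. A quick check of the arithmetic, as a sketch in Python:

```python
# Values copied from the std::out_of_range message above.
reported_index = 18446744073709551614
reported_size = 23

# The identifier passed via --problem-identifier in the failing run.
identifier = "Contraction_Cij_Aik_Bkj"

# Reinterpret the unsigned 64-bit value as a signed 64-bit value.
if reported_index >= 2**63:
    signed_index = reported_index - 2**64
else:
    signed_index = reported_index

print(signed_index)                      # -2
print(len(identifier) == reported_size)  # True
```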

How can I use the kernel that was found during the tuning process?

Footnotes

  1. I wrapped benchmarkStep.benchmarkParameters into a list for the call to constructForkPermutations() around line 235.

  2. I commented out the checks on state["ProblemType"]["StridedBatched"] and others around line 1863 because state["ProblemType"] equaled Cij_Aik_Bkj_SB and was not a dictionary.

See my comment on #1282. Try again with JoinParameters set to null in your config (and revert the other changes you made to get to this point), then let me know whether this fixes the error.

As far as I can tell, removing JoinParameters from the config file makes the two edits I mentioned in this issue unnecessary (it does not address the rocm-smi issue mentioned in #1282), but I'm still getting the out_of_range exception.

Ah, I missed your output file in #1282. Setting PinClocks (in the global parameters of the config file) to false should fix this.

Taking a closer look at your client invocation, --library-file should point not to the library logic file generated by Tensile in 3_LibraryLogic, but to the TensileLibrary.dat/yaml located in 4_LibraryClient/library. Did you create this tensile_client invocation yourself, or is it taken from Tensile's output somewhere?

How can I use the kernel that was found during the tuning process?

If you simply wish to run the winning kernels, Tensile does this for you after building the client library. The invocation looks something like
./<pwd>/0_Build/client/tensile_client --config-file <pwd>/4_LibraryClient/source/ClientParameters_Cijk_Ailk_Bljk_SB.ini --best-solution 1

where <pwd> is the path to the output directory, i.e. the second argument to Tensile.
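
As a rough sketch, the invocation above can be assembled from the output directory with a few lines of Python. The helper name best_solution_commands is hypothetical, and the directory layout (0_Build/client/tensile_client and 4_LibraryClient/source/ClientParameters_*.ini) is assumed from the paths quoted in this thread rather than taken from Tensile's documentation. The script only prints the commands; it does not run them.

```python
import glob
import os
import shlex


def best_solution_commands(output_dir):
    """Build one tensile_client command per ClientParameters_*.ini file.

    Hypothetical helper: the layout below is assumed from the paths
    mentioned in this thread, not from Tensile's documentation.
    """
    client = os.path.join(output_dir, "0_Build", "client", "tensile_client")
    pattern = os.path.join(
        output_dir, "4_LibraryClient", "source", "ClientParameters_*.ini"
    )
    commands = []
    for ini in sorted(glob.glob(pattern)):
        # Same flags as the invocation quoted above.
        commands.append([client, "--config-file", ini, "--best-solution", "1"])
    return commands


# Replace "." with the Tensile output directory (the second argument to Tensile).
for cmd in best_solution_commands("."):
    print(shlex.join(cmd))
```

If no ClientParameters_*.ini file is found, nothing is printed, which is a hint that the client library step did not complete.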

I don't have access to a vega10 machine to test on, but the config you posted with the changes I suggested (PinClocks: false and removing JoinParameters) works on vega20 on the master branch for me.