NervanaSystems/maxas

"It is illegal to set a Read-After-Write dependency on a memory store op", on unmodified sass

Opened this issue · 12 comments

using microbench, with .cu modified to be simply:

extern "C" __global__ void  microbench(int *out, int *clocks, int *in)
{
  out[0] = 7.0f;
  out[2] = 5.0f;
  clocks[1] = 9.0f;
}

.cpp is modified slightly, since I'm on a 5.0, so I hacked microbench.cpp to accept this. It could be this is root cause for the issue in this issue?

        if (major >= 5 && minor >= 0)
        {

Then I do:

set -e

nvcc -l cuda -o microbench microbench.cpp
nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass

maxas.pl -i microbench.sass microbench.cubin
./microbench

... however it fails on the maxas.pl -i line with:

It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;

The .sass generated by maxas.pl -i looks like:

--:-:-:-:6      MOV R1, c[0x0][0x20];
--:-:-:-:1      MOV R0, param_1[0];
--:-:-:-:1      MOV R2, param_0[0];
--:-:-:-:4      MOV R3, param_0[1];
--:-:-:-:1      IADD32I R4.CC, R0, 0x4;
--:-:-:-:2      MOV32I R0, 0x7;
20:3:1:Y:7      IADD.X R5, RZ, param_1[1];
--:1:1:Y:4      STG.E [R4], R6;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;

Is this:

  • because I'm using 5.0 system, and maxas only supports 5.2?
  • a bug in maxas.pl -e (since I cant see the 9.0f constant anywhere in the sass?, I'm immediately suspicious?)
  • because I'm using not the latest versoin of maxas, which is eg in neon repo?
  • something I'm doing wrong (other than the cc5.0 thing) ?
  • something else?

Update: the sass generated by cuobjdump --dump-sass does include the 0x9 constant, looks fairly different:

    code for sm_50
        Function : microbench
    .headerflags    @"EF_CUDA_SM50 EF_CUDA_PTX_SM(EF_CUDA_SM50)"
                                                                           /* 0x001fc400fe2007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20];                   /* 0x4c98078000870001 */
        /*0010*/                   MOV R0, c[0x0][0x148];                  /* 0x4c98078005270000 */
        /*0018*/                   MOV R2, c[0x0][0x140];                  /* 0x4c98078005070002 */
                                                                           /* 0x001fc800fe2007f4 */
        /*0028*/                   MOV R3, c[0x0][0x144];                  /* 0x4c98078005170003 */
        /*0030*/                   IADD32I R4.CC, R0, 0x4;                 /* 0x1c10000000470004 */
        /*0038*/                   MOV32I R0, 0x7;                         /* 0x010000000077f000 */
                                                                           /* 0x001fc0001e4007f0 */
        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */
        /*0068*/                   STG.E [R2+0x8], R7;        }            /* 0x0007c400fc4000f1 */
                                                                           /* 0xeedc200000870207 */
        /*0070*/                   IADD.X R5, RZ, c[0x0][0x14c];           /* 0x4c1008000537ff05 */
        /*0078*/                   STG.E [R4], R6;                         /* 0xeedc200000070406 */
                                                                           /* 0x001f8000ffe007ff */
        /*0088*/                   EXIT;                                   /* 0xe30000000007000f */
        /*0090*/                   BRA 0x90;                               /* 0xe2400fffff87000f */
        /*0098*/                   NOP;                                    /* 0x50b0000000070f00 */
                                                                           /* 0x001f8000fc0007e0 */
        /*00a8*/                   NOP;                                    /* 0x50b0000000070f00 */
        /*00b0*/                   NOP;                                    /* 0x50b0000000070f00 */
        /*00b8*/                   NOP;                                    /* 0x50b0000000070f00 */
        ...........................

I'm tentatively convinced it's something to do with the 5.0 thing?

Oh... I guess that the 5.0 code is interpreted as control codes, and being absorbed into 20:3:1:Y:7, rather than becoming:

        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;  

Fair analysis? Surmountable? Only option is either to rewrite maxas to support 5.0 or get a 5.2 card?

(by the way, in case useful, I get the following warnings when running maxas:

Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?}/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 239.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?<predNot>\!)?P(?<predNum>[0-6]) )?DEPBAR(?^: { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?});/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 275.

Maybe it is a perl versioning issue? I'm using:

$ perl --version

This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi
(with 58 registered patches, see perl -V for more detail)

Seems maybe an issue with the regex on line 1521 of MaxAsGrammar.pm?

Hack this file to put some prints:

    print "psl check match\n";
    if ($line =~ m"^\s+/\*(?<num>[0-9a-f]+)\*/\s+$InstRe\s+/\* (?<code>0x[0-9a-f]+)"o)
    {
        print "   ... matched!\n";

Output:

psl check match
   ... matched!
   ctrl inst:
$VAR1 = {
          'code' => '5519169589765144579',
          'ins' => 'MOV R3, c[0x0][0x144];',
          'inst' => 'MOV R3, c[0x0][0x144];',
          'num' => 40,
          'op' => 'MOV',
          'pred' => undef
        };
ctrl 2033
  ctrl line         /*0030*/                   IADD32I R4.CC, R0, 0x4;                 /* 0x1c10000000470004 */

line before processSassLine         /*0030*/                   IADD32I R4.CC, R0, 0x4;                 /* 0x1c10000000470004 */

psl check match
   ... matched!
   ctrl inst:
$VAR1 = {
          'code' => '2022116232694005764',
          'ins' => 'IADD32I R4.CC, R0, 0x4;',
          'inst' => 'IADD32I R4.CC, R0, 0x4;',
          'num' => 48,
          'op' => 'IADD32I',
          'pred' => undef
        };
ctrl 2034
  ctrl line         /*0038*/                   MOV32I R0, 0x7;                         /* 0x010000000077f000 */

line before processSassLine         /*0038*/                   MOV32I R0, 0x7;                         /* 0x010000000077f000 */

psl check match
   ... matched!
   ctrl inst:
$VAR1 = {
          'code' => '72057594045788160',
          'ins' => 'MOV32I R0, 0x7;',
          'inst' => 'MOV32I R0, 0x7;',
          'num' => 56,
          'op' => 'MOV32I',
          'pred' => undef
        };
# line                                                                            /* 0x001fc0001e4007f0 */

ctrl 2032
  ctrl line         /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */

line before processSassLine         /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */

psl check match
   no match
ctrl 242
  ctrl line         /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */

line before processSassLine         /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */

psl check match
   no match
ctrl 2032
  ctrl line         /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */

Repeated displays of no match suggest the regex is failing for some reason?

Oh... I reckon it doesnt like the { and } signs in the cuobjdump output:

        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */
        /*0068*/                   STG.E [R2+0x8], R7;        }            /* 0x0007c400fc4000f1 */

Adding

            $line =~ s/{//g;
            $line =~ s/}//g;

to MaxAs.pl, just before my $inst = processSassLine($line) or next CTRL; makes the generated .sass now contain the missing instructions:

--:-:-:-:6      MOV R1, c[0x0][0x20];
--:-:-:-:1      MOV R0, param_1[0];
--:-:-:-:1      MOV R2, param_0[0];
--:-:-:-:4      MOV R3, param_0[1];
--:-:-:-:1      IADD32I R4.CC, R0, 0x4;
--:-:-:-:2      MOV32I R0, 0x7;
--:-:-:-:0      MOV32I R7, 0x5;
--:1:-:-:2      STG.E [R2], R0;
--:-:-:-:0      MOV32I R6, 0x9;
20:3:1:Y:7      IADD.X R5, RZ, param_1[1];
--:1:1:Y:4      STG.E [R4], R6;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;

... but same error about STG.E as before:

It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;

I'll submit a PR for the braces though

Mmmmm, this is interesting. There should be one control code followed by three operations right? But for the short example, with the parentheses in hte output, this rule seems to be broken temporarily:

cuobjdump --dump-sass output:

                                                                           /* 0x001fc0001e4007f0 */
        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */
        /*0068*/                   STG.E [R2+0x8], R7;        }            /* 0x0007c400fc4000f1 */
                                                                           /* 0xeedc200000870207 */
        /*0070*/                   IADD.X R5, RZ, c[0x0][0x14c];           /* 0x4c1008000537ff05 */
        /*0078*/                   STG.E [R4], R6;                         /* 0xeedc200000070406 */
                                                                           /* 0x001f8000ffe007ff */
        /*0088*/                   EXIT;                                   /* 0xe30000000007000f */
        /*0090*/                   BRA 0x90;                               /* 0xe2400fffff87000f */
        /*0098*/                   NOP;                                    /* 0x50b0000000070f00 */

4 contiguous non-control lines, hten a control, then only 2 control-lines. Maybe the parenthesis sort of moves around the contorl codes, so that the control code after the block of 4 actually applies retroactively to the previous line???

Hmmm... except that... 0xeedc200000870207 looks not like a control code:

$ python ../printcode.py 0xeedc200000870207
stall 7 thisyield 0 write 0 read 2 watdb 32
32:2:0:0:7
stall 4 thisyield 0 write 0 read 0 watdb 0
0:0:0:0:4
stall 8 thisyield 0 write 0 read 7 watdb 54
54:7:0:0:8

... but if we use the code from the previous line instead, ie 0x0007c400fc4000f1, that looks remarkably control-code like, and stays consistent with the 1:3 rule:

$ python ../printcode.py 0x0007c400fc4000f1
stall 1 thisyield 1 write 7 read 0 watdb 0
0:0:7:1:1
stall 2 thisyield 0 write 7 read 7 watdb 0
0:7:7:0:2
stall 1 thisyield 1 write 7 read 1 watdb 0
0:1:7:1:1

Hypothesis: in the presence of parantheses one operation and a control code can be swapped in the human-readable assembler output, in the center column, but continue in the same 1:3 order as normal, in the code output in the right-hand column.

To fix this, I was planning on using the output from nvdisasm -raw and pulling the control codes directly from the binary cubin. But in the meantime I'm just using the nvdisasm from cuda 6.5 (cuobjdump calls nvdisasm internally). You can still use cuda 7.5 you just need to overwrite the new version of that file with the old.

Fixed :-)

ubuntu@peach:~/git/maxas/microbench$ ./run-sass.sh 
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?}/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 239.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?<predNot>\!)?P(?<predNum>[0-6]) )?DEPBAR(?^: { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?});/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 275.
Kernel: microbench, Instructions: 0, Register Count: 8, Bank Conflicts: 0, Reuse: 0.0% (0/1)
Using: Id:0 GeForce 940M (5.0)

b:00 w:000 t:0000 l:00 clocks:00000000 out:00000007
b:00 w:001 t:0032 l:00 clocks:00000000 out:00000000
b:00 w:002 t:0064 l:00 clocks:00000000 out:00000000
b:00 w:003 t:0096 l:00 clocks:00000000 out:00000000
average: 0.000, min 0, max: 0

(Oh, hmmm, might be a slight buggette in the loop definition )

(should be ok now though perhaps?)