including applicable extensions in the opcode syntax

Question


neelgala opened this issue 3 years ago · 13 comments

I have been using this repo as the official source of encodings for our internal design and verification tools. One issue I keep running into is the lack of concrete information about which extension(s) an instruction is applicable under. I want to decode only the instructions that are applicable to a user-defined ISA: if the user specifies RV64IMC, then only instructions under those three extensions should be decoded. The current file-naming convention is somewhat useful, but it does not fully address all the issues. Two of them are described below:

- c.flw should be applicable only when F and C are both implemented. Placing it inside opcodes-rvc confuses the tools, and having a separate opcodes-rv32fc file increases maintenance.
- Instructions like pack are present under multiple extensions (Zbp, Zbf and Zbe). Placing pack into the individual opcodes file for each sub-extension might work, but it does not scale: one has to remember to edit all of those files whenever the instruction changes.

So a file-naming convention alone might not work; the syntax of opcode entries will need to change slightly. The following is a very quick and dirty proposal (which will need refining) of what I think can address the above issues:

Append, at the end of the line and wrapped within | |, a comma-separated list of the extensions under which the encoding is a legal instruction.

Examples

```
c.flw      1..0=0 15..13=3 12=ignore 11..2=ignore |RV32FC|
pack       rd rs1 rs2 31..25=4  14..12=4 6..2=0x0C 1..0=3 |RV32Zbp, RV32Zbf, RV32Zbe, RV64Zbp, RV64Zbf, RV64Zbe|
```

Tools can then use substring matching to identify if that instruction is applicable for the user-defined ISA or not.

A better way of doing this would be to use regexes (less readable, but extremely powerful):

```
c.flw      1..0=0 15..13=3 12=ignore 11..2=ignore |RV(32).*(F).*(C).*|
pack       rd rs1 rs2 31..25=4  14..12=4 6..2=0x0C 1..0=3 |RV(32|64).*(Zbp|Zbf|Zbe).*|
```

The regex will need to follow a few strict guidelines while writing but that should be manageable.
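A minimal Python sketch of how a tool might consume the proposed `|...|` trailer, assuming the regex form above and an expanded, canonically ordered ISA string such as RV32IMAFC. `split_entry` and `is_legal` are hypothetical helper names, not part of riscv-opcodes. Because the regex itself may contain `|` (as in `(32|64)`), the sketch takes everything between the first and the last `|` on the line rather than splitting on every `|`.

```python
import re


def split_entry(line):
    """Split an opcode line into (encoding text, extension regex or None)."""
    # The regex may contain '|' itself, so use the FIRST and LAST '|' as delimiters.
    first = line.find("|")
    last = line.rfind("|")
    if first == -1 or first == last:
        return line.strip(), None
    return line[:first].strip(), line[first + 1:last].strip()


def is_legal(line, isa):
    """Return True if the instruction on `line` applies to the ISA string `isa`."""
    _encoding, spec = split_entry(line)
    if spec is None:
        return True  # hypothetical policy: untagged entries apply to every ISA
    return re.fullmatch(spec, isa) is not None


CFLW = "c.flw      1..0=0 15..13=3 12=ignore 11..2=ignore |RV(32).*(F).*(C).*|"
PACK = "pack       rd rs1 rs2 31..25=4  14..12=4 6..2=0x0C 1..0=3 |RV(32|64).*(Zbp|Zbf|Zbe).*|"

if __name__ == "__main__":
    for isa in ("RV32IMAFC", "RV64IMAFC", "RV64IM_Zbe"):
        print(isa, is_legal(CFLW, isa), is_legal(PACK, isa))
```

Keeping only `split_entry(line)[0]` is the "ignore everything between | |" behavior that existing tools would need. Note that the patterns are order- and case-sensitive: `RV(32).*(F).*(C).*` relies on F appearing before C, and it would not recognize RV32GC unless G is expanded first, which is the kind of strict guideline the regexes would have to follow.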

Pros of the proposal:

- The syntax is quite regex-able and simply adds on to the current syntax. Current tools that depend on this repo just need to ignore everything between | |.
- Only minimal changes to the existing scripts in this repo are needed to generate the current set of artifacts.
- It does not require a strict file-naming convention, which improves scalability.
- The number of files in the repo will decrease, which improves maintainability.

Before I go on to work on a PR for the above, I wanted to get a sense of whether such a change would be welcome/acceptable.

Answer 1 · 2022-01-29T08:38:14.000Z

My immediate reaction is that I prefer a different approach that’s more similar to what we’re currently doing: use the file names to make this distinction, rather than adding metadata to the individual instructions.

For instructions that belong to multiple extensions, we could use the existing @ aliasing scheme when they appear in multiple files, or invent some new prefix that means “I know this is defined elsewhere, but I’m including it here anyway, without explicitly rewriting its operands”.

Regardless, I agree we should solve the problem you’re trying to solve, and your solution is a reasonable approach. I’d like others to weigh in.

Answer 2 · 2022-01-31T14:03:36.000Z

I spent a little more time on the file-based distinction scheme and came up with the following. Let me know your thoughts. x and y below represent extension characters/strings.

- rv_x_y: contains instructions common to both the 32-bit and 64-bit modes when both the x and y extensions are enabled.
- rv32_x_y: contains instructions present only in rv32xy (absent in rv64_x_y, e.g. ???).
- rv64_x_y: contains instructions present only in rv64xy (absent in rv32_x_y, e.g. addw).
- The _y in the above is optional and can be omitted.
- For instructions present in multiple extensions, the instruction encoding must be placed in the first extension when ~~alphabetically~~ canonically ordered. All other extensions can simply include a $import prefix followed by <filename> and <instruction_name> separated by ::. For example, pack would be present in the rv32_zbe file as pack rd rs1 rs2 31..25=4 14..12=4 6..2=0x0C 1..0=3, and the rv32_zbf and rv32_zbp files would have the entry $import rv32_zbe::pack.
- For pseudo ops we use $pseudo_op <filename>::<instruction> <overloaded fields/patterns> to indicate the original instruction that the pseudo op depends on and the fields that need to change. For example, when shfli gets ratified, zip can be represented in rv32_zbkb as: $pseudo_op rv32_zbp::shfli shamtw=15

In the above scheme I am basically reserving $ to indicate that a keyword follows.
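To make the file-based scheme concrete, here is a hedged Python sketch of how a parser might pick the opcode files for an ISA string: a file named rv_x[_y] applies at both XLENs, rv32_x[_y] and rv64_x[_y] apply only at that XLEN, and every extension component in the name must be enabled. The ISA-string parser is deliberately simplified (no version numbers, no G expansion, multi-letter extensions separated by `_`), and the function and file names are illustrative assumptions.

```python
import re


def parse_isa(isa):
    """Simplified parser: 'RV64IMC_Zbe' -> (64, {'i', 'm', 'c', 'zbe'}).

    Assumes single-letter extensions come first and multi-letter ones are
    '_'-separated; ignores version numbers and does not expand 'G'.
    """
    m = re.fullmatch(r"rv(32|64)([a-z]*)((?:_[a-z0-9]+)*)", isa.lower())
    if m is None:
        raise ValueError(f"unsupported ISA string: {isa}")
    xlen = int(m.group(1))
    exts = set(m.group(2))  # each single letter is one extension
    exts.update(part for part in m.group(3).split("_") if part)
    return xlen, exts


def file_applies(filename, xlen, exts):
    """True if rv_x[_y], rv32_x[_y] or rv64_x[_y] is needed for this XLEN/extension set."""
    prefix, *required = filename.split("_")
    if prefix not in ("rv", f"rv{xlen}"):
        return False
    return all(ext in exts for ext in required)


def select_files(filenames, isa):
    xlen, exts = parse_isa(isa)
    return sorted(f for f in filenames if file_applies(f, xlen, exts))


FILES = ["rv_i", "rv32_i", "rv64_i", "rv_m", "rv_c", "rv32_c_f", "rv_f", "rv_zbe"]

if __name__ == "__main__":
    print(select_files(FILES, "RV64IMC"))
    print(select_files(FILES, "RV32IMAFC"))
```

With this layout, RV32I resolves to rv_i plus rv32_i, and a combined file such as rv32_c_f is picked up only when RV32, C and F are all present, which is the c.flw case from the original question.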

The above scheme will still require significant rearrangement of the current repo files. For example, rv32i will move to rv_i, rv64_i will contain the additional 64-bit-mode base instructions, and so on.

Answer 3 · 2022-01-31T14:07:22.000Z

Maybe "canonically" ordered instead of "alphabetically" ordered makes more sense?

Answer 4 · 2022-02-09T17:39:20.000Z

On Tue, Feb 8, 2022 at 11:28 PM Allen Baum wrote:

> That works for me, and it also matches the name; all Zbkb, Zbkc, and Zbkx tests would be in B. This is a bit awkward since they're already in K_unratified, so they would need to be removed from there, and some of those ops aren't actually part of the ratified bitmanip spec. So what happens if they decide that those ops will never be ratified by the bitmanip spec? Nobody is working on it now.

Shouldn't all tests and related support for instructions that are not ratified, have no official effort in flight, and do not even have an associated TG with a charter approved by the TSC be removed (or moved out into some separate "archive" in case they become relevant in some future year)? Ditto for tests etc. related to work that a TG had in progress but dropped from what it put forward for ratification?

Greg


Answer 5 · 2022-02-09T19:00:58.000Z

Sorry, this was added to this thread accidentally. The issue is that the ops in the ratified Zbkx extension are defined only in the *ratified* crypto scalar spec, not in the bitmanip spec. Likewise, half of the ops in the ratified Zbkb extension are defined only in the *ratified* crypto scalar spec, not in the bitmanip spec. And there is no longer a bitmanip TG tasked with adding them (which leaves us scratching our heads over where to put the tests, which have already been written...).



Answer 6 · 2022-02-09T19:26:40.000Z


One can view all this as follows: as of the end of last year, a number of bitmanip extensions were ratified, some by the BitManip group and some by the Crypto group. The details of which TG ratified which Zb* extensions, and when, are ultimately just a historical matter and beside the point. The fact that, for now, these are documented in two separate documents, separate from the Unpriv spec document, also seems beside the point. (Ultimately these will all be combined into an updated Unpriv document.) So it would seem that any "old" unratified bitmanip material should be removed, and both sets of ratified extensions can be grouped together, if you like, in "B". If and when a new TG comes along and ratifies another group of "bitmanip" instructions, those would be added to the "B" group of tests.

Greg


Answer 7 · 2022-02-09T20:24:32.000Z

Yes, that is one approach, but it does mean ripping up and moving a whole bunch of files from the K directories to the B directories in arch-test, and probably elsewhere as well. Not rocket science, but a pain. The other approach is to leave them where they are and count on riscof being able to find them.



Answer 8 · 2022-03-03T10:13:31.000Z

@aswaterman I have gone ahead with the implementation of my proposal and have an initial draft of what the revised repo will look like: https://github.com/incoresemi/riscv-opcodes/tree/restructuring-opcodes. I have yet to fix the parse_opcodes.py file, but before I do that I wanted to get feedback on whether the revised structure is acceptable.

Important points to note:

- I use $import to indicate that an extension is borrowing an instruction from another extension (see zkn for an example).
- I use $pseudo_op to indicate instructions that the spec defines as pseudo ops of standard instructions (again, see rv_pseudo).
- I have also cleaned up compressed instruction support significantly.
- The concept of aliases, i.e. the use of '@', is no longer supported.
- There are a few places, like zbkb, where instructions such as pack/packh are pseudo ops of unratified instructions under zbp/zbe. I have kept those as pseudo ops. Let me know whether they should be treated as standard ops for now.
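A hedged Python sketch of how a parser might resolve `$import file::name` entries against the file that holds the original definition. It assumes each file is available as a list of lines, that a regular entry is `<name> <operands/fields...>`, and that an instruction's encoding lives in exactly one file, so a single-level lookup suffices (chained imports are not handled). `$pseudo_op` lines are simply skipped, on the assumption that their base op is parsed from its own file. `load_instructions` is a hypothetical name.

```python
def load_instructions(files):
    """files: dict mapping filename -> list of opcode lines.

    Returns a dict mapping filename -> {instruction name: encoding text},
    with each '$import src::name' entry replaced by the original definition.
    """
    defined = {}  # (filename, name) -> encoding text of the original definition
    pending = []  # (importing filename, source filename, name)
    for fname, lines in files.items():
        for raw in lines:
            line = raw.strip()
            if not line:
                continue
            if line.startswith("$import"):
                src, name = line.split()[1].split("::")
                pending.append((fname, src, name))
            elif line.startswith("$pseudo_op"):
                continue  # sketch: pseudo ops are skipped; the base op is parsed elsewhere
            else:
                name, rest = line.split(maxsplit=1)
                defined[(fname, name)] = rest

    result = {fname: {} for fname in files}
    for (fname, name), enc in defined.items():
        result[fname][name] = enc
    for fname, src, name in pending:
        if (src, name) not in defined:
            raise KeyError(f"{fname}: $import of undefined {src}::{name}")
        result[fname][name] = defined[(src, name)]  # same text, one source of truth
    return result


FILES = {
    "rv32_zbe": ["pack rd rs1 rs2 31..25=4 14..12=4 6..2=0x0C 1..0=3"],
    "rv32_zbf": ["$import rv32_zbe::pack"],
    "rv32_zbkb": ["$pseudo_op rv32_zbp::shfli shamtw=15"],
}

if __name__ == "__main__":
    print(load_instructions(FILES))
```

Because the importing file stores a reference rather than a copy of the operands, editing pack in rv32_zbe automatically updates every extension that imports it, which addresses the maintenance problem raised in the original question.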

Feedback is highly appreciated; after that I will start working on the Python code.

Answer 9 · 2022-03-03T10:23:17.000Z

And to address Greg's points, we can use rv_*_unratified as a file-naming convention: when an extension gets ratified, its file simply drops the _unratified postfix, so everyone knows what's ratified and what's not.
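A small Python sketch of the proposed postfix convention, assuming file names such as rv_zbp_unratified; the helper names are hypothetical and `str.removesuffix` requires Python 3.9 or later. A file-selection step would also need to strip the postfix before treating the remaining `_` components as extension names.

```python
UNRATIFIED = "_unratified"


def is_ratified(filename):
    """Files without the _unratified postfix hold ratified instructions."""
    return not filename.endswith(UNRATIFIED)


def name_after_ratification(filename):
    """Name the file takes once its extension is ratified: the postfix is dropped."""
    return filename.removesuffix(UNRATIFIED)


def ratified_only(filenames):
    """Keep only the files whose instructions are ratified."""
    return [f for f in filenames if is_ratified(f)]


if __name__ == "__main__":
    files = ["rv_i", "rv_zbkb", "rv_zbp_unratified"]
    print(ratified_only(files))
    print(name_after_ratification("rv_zbp_unratified"))
```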

Also, for the unratified instructions in my current bitmanip draft, I have gone with the extensions defined in the 0.94 draft.

Answer 10 · 2022-03-03T22:20:23.000Z

Yeah, I think this is going in the right direction. And I appreciate that you sought feedback on the design before doing all of the software hacking.

I'd like others who have skin in the riscv-opcodes game to chime in before @neelgala goes off and does a bunch more work.

Answer 11 · 2022-03-07T11:22:29.000Z

@aswaterman I have got most of the scripting work done, but I am having a hard time with pseudo ops.

Let's take the example of slli of the base ISA.
In the current framework for rv32 slli is defined as a pseudo op in opcodes-pseudo. For rv64 slli is defined as a standard op in opcodes-rv64i.

However, in my approach slli appears in both rv32_i and rv64_i, each with its respective valid encoding (bit 25 is zero in the rv32_i version). So someone looking for the instruction encodings for ISA=RV32I would look at rv_i + rv32_i to find the right set of encodings.

Is my approach okay or would you prefer treating slli in rv32_i as a pseudo op of slli in rv64_i ?

Answer 12 · 2022-03-07T14:39:17.000Z

Going over it again, I think you can discard my previous comment: treating slli in rv32_i as a pseudo op of the rv64_i version makes more sense and keeps the scripting work simple. I no longer need to parse pseudo opcodes as long as the corresponding standard op has been parsed. I also checked Spike: the encodings for the pseudo ops (like slli_rv32) are never used. So I think this approach is better.
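The reason the RV32 slli can be treated as a pseudo op of the RV64 form is that the two encodings coincide whenever the shift amount is below 32: the RV64 form carries a 6-bit shamt in bits 25..20, and the RV32 form is the same encoding with bit 25 forced to zero. A minimal Python sketch of that relationship (the function name is illustrative, not from riscv-opcodes or Spike):

```python
OP_IMM = 0b0010011      # major opcode for register-immediate ALU ops
FUNCT3_SLLI = 0b001


def encode_slli(rd, rs1, shamt, xlen=64):
    """Encode SLLI.

    RV64: 6-bit shamt in bits 25..20 (bits 31..26 are zero).
    RV32: the same encoding restricted to shamt < 32, i.e. bit 25 must be 0.
    """
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register index out of range")
    limit = 32 if xlen == 32 else 64
    if not 0 <= shamt < limit:
        raise ValueError(f"shamt {shamt} out of range for RV{xlen}")
    return (shamt << 20) | (rs1 << 15) | (FUNCT3_SLLI << 12) | (rd << 7) | OP_IMM


if __name__ == "__main__":
    # slli x1, x2, 3 encodes identically for RV32 and RV64
    print(hex(encode_slli(1, 2, 3, xlen=32)), hex(encode_slli(1, 2, 3, xlen=64)))
```

Since every legal RV32 slli encoding is also a legal RV64 encoding, a parser only needs the rv64_i definition plus the bit-25 constraint to recover the RV32 variant.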

On the LaTeX front, I see riscv-isa-manual uses the output from this repo for the instruction encoding tables. I wanted to know whether the following is doable:

Can we rearrange the tables to be alphabetically organized, starting with ADD and ending with XORI?
For instructions like ECALL and EBREAK, where the entire encoding is static, can we avoid the vertical bars? Basically, this:
```
|00000000000000000000000001110011 | ECALL
```
instead of
```
|000000000000 | 00000 | 000 | 00000 | 1110011 | ECALL
```
This would make the whole LaTeX-generation code much simpler, and contributors would have one less issue to worry about when adding new instructions.
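A hedged Python sketch of the requested row layout: fixed bits between operand fields are printed as binary cells, and an instruction with no operand fields collapses to a single 32-bit cell. It emits the plain-text rows shown above rather than actual LaTeX, and the `(msb, lsb, label)` field representation is an assumption for illustration.

```python
def encoding_row(name, match, fields):
    """Return a table row for one instruction.

    match  : 32-bit value of the fixed (opcode/funct) bits.
    fields : list of (msb, lsb, label) for variable operand fields, e.g. (11, 7, "rd").
    With no variable fields, the whole encoding becomes one 32-bit cell.
    """
    if not fields:
        return f"|{match:032b} | {name}"
    cells = []
    bit = 31  # highest bit not yet emitted
    for msb, lsb, label in sorted(fields, reverse=True):
        if msb < bit:  # fixed bits sit above this field
            width = bit - msb
            cells.append(format((match >> (msb + 1)) & ((1 << width) - 1), f"0{width}b"))
        cells.append(label)
        bit = lsb - 1
    if bit >= 0:  # trailing fixed bits, typically the opcode
        width = bit + 1
        cells.append(format(match & ((1 << width) - 1), f"0{width}b"))
    return "|" + " | ".join(cells) + " | " + name


R_TYPE = [(24, 20, "rs2"), (19, 15, "rs1"), (11, 7, "rd")]

if __name__ == "__main__":
    rows = [encoding_row("ECALL", 0x00000073, []), encoding_row("ADD", 0x00000033, R_TYPE)]
    for row in rows:
        print(row)
```

Alphabetical ordering, as asked in the first question, would then just be a matter of sorting the instructions by name before generating the rows.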

Answer 13 · 2022-05-03T05:21:40.000Z

Closed in #106.