Clarify PMP behaviour for accesses split across pages

Question


Timmmm opened this issue 5 months ago · 16 comments

Suppose you have a virtual memory access that spans the boundary between two pages that are not mapped to contiguous physical memory. This means you'll be doing two accesses. Suppose each sub-access falls fully within a different PMP region: the first access is fully within PMP region 0, and the second access is fully within PMP region 1 (and both regions have the appropriate permission bits set). Like this:

The spec is not entirely clear whether this should succeed or not. It says:

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails

But it's not clear if it is talking about the overall access (the blue box), or each of the red boxes.

It does say:

Note that a single instruction may generate multiple accesses, which may not be mutually atomic. ... Notably, instructions that reference virtual memory are decomposed into multiple accesses.

But it's not clear if this is talking about multiple accesses due to page table walking, or due to splitting across page boundaries.

Generally it would benefit from defining "access"!

Answer 1 · 2024-04-04T00:14:44.000Z

This means you'll be doing two accesses.

Not necessarily. A memory-access instruction with a misaligned effective address may give rise to multiple accesses, but it also might not. (The page-crossing aspect is a red herring; it's possible and valid to implement page-crossing accesses as a single access.) It's also valid for this situation to give rise to multiple memory accesses, if the implementation so chooses.

And that gets to the heart of your question. If the implementation performs only one access, then the PMP constraint about the entire access fitting within the PMP applies to that access. If the implementation performs multiple accesses, then the PMP constraint applies to each individual access. That is to say, the blue box and the two red boxes are both legal outcomes.
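To make the distinction concrete, here is a minimal sketch of the matching rule quoted from the spec, applied once to an undivided access and once to its two halves. The names `PmpEntry` and `pmp_check` are hypothetical, regions are modelled as plain base/size pairs rather than the real `pmpcfg`/`pmpaddr` encoding, and an access that matches no entry is assumed to fail (roughly the S/U-mode case); none of this comes from the thread itself.

```python
from dataclasses import dataclass


@dataclass
class PmpEntry:
    base: int   # first physical byte covered (inclusive)
    size: int   # number of bytes covered
    perms: str  # subset of "rwx"

    def overlap(self, addr: int, size: int) -> int:
        """Number of bytes of [addr, addr + size) inside this entry."""
        lo = max(addr, self.base)
        hi = min(addr + size, self.base + self.size)
        return max(0, hi - lo)


def pmp_check(entries, addr, size, perm):
    """Check ONE access against the entries, lowest-numbered first."""
    for entry in entries:
        matched = entry.overlap(addr, size)
        if matched == 0:
            continue  # no byte matches this entry; try the next one
        # The first entry matching any byte decides the outcome:
        # it must cover every byte and grant the permission.
        return matched == size and perm in entry.perms
    return False  # assumption: no matching entry means the access fails


entries = [
    PmpEntry(base=0x1000, size=0x1000, perms="rw"),  # PMP entry 0
    PmpEntry(base=0x2000, size=0x1000, perms="rw"),  # PMP entry 1
]

# An 8-byte access at 0x1FFC performed as one access: entry 0 matches
# only 4 of its 8 bytes, so the access fails.
single = pmp_check(entries, 0x1FFC, 8, "r")

# The same bytes performed as two 4-byte accesses: each is checked alone.
pieces = [pmp_check(entries, 0x1FFC, 4, "r"), pmp_check(entries, 0x2000, 4, "r")]

print(single, pieces)  # prints: False [True, True]
```

Under these assumptions the same load either faults or completes depending only on whether the implementation splits it, which is the pair of legal outcomes described above.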

Answer 2 · 2024-04-04T09:45:06.000Z

Ok now I'm even more confused! :-D

So are you saying that in the example the overall physical read/write (both red boxes) could be a "single access" even though they are not contiguous? I always assumed an "access" would always be at least contiguous. What exactly is an "access"?

Answer 3 · 2024-04-04T16:31:48.000Z

No, what Andrew was saying is that an overall access by a load/store instruction may be performed as one memory access or may be broken up (aka decomposed) into pieces and performed as multiple memory accesses. All the bytes of the instruction's access are contiguous, and hence the aggregate of all the decomposed accesses is also contiguous (even though obviously the first byte and last byte of a decomposed series of individual byte accesses are not contiguous wrt each other).

So the point is that a single memory access that straddles two PMP regions will be checked as one access, while individual "piece" accesses may each fall in just one region or the other - and will each be individually checked.

Greg


Answer 4 · 2024-04-04T19:49:43.000Z

The issue here is that the addresses are contiguous in VA space, but not in PA space. So, IFF decomposed, they each individually must pass or fail both PMP checks and MMU checks (regardless if they are contiguous or not).

Load/store instructions can have 8 distinct results based on 3 micro-architectural parameters. From an arch-test perspective, we have to be able to identify each of those 8 cases, and Sail must be able to be configured for those. I'll post a table for those cases shortly.

This table isn't complete; there is a base assumption that the configuration variables are static and don't change during execution based on either address or any other microarchitectural state. If an implementation has dynamic values of those variables, they may fail tests, and it will be the vendor's responsibility to prove that the implementation still meets the architectural spec.

To do that, we need to be able to configure
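As an illustration of "contiguous in VA space, but not in PA space", the following sketch splits a page-crossing access at the page boundary and translates each piece through a toy single-level page table. The helper names, the 4 KiB page size, and the mapping are all assumptions chosen for the example; a real implementation would walk Sv39/Sv48 tables and might not split the access at all.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def split_at_page_boundary(va, size):
    """Split [va, va + size) into pieces that each stay within one page."""
    pieces = []
    while size > 0:
        room = PAGE_SIZE - (va % PAGE_SIZE)  # bytes left in the current page
        chunk = min(size, room)
        pieces.append((va, chunk))
        va += chunk
        size -= chunk
    return pieces


def translate(page_table, va):
    """Toy lookup: virtual page number -> physical page number."""
    vpn, offset = divmod(va, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: adjacent virtual pages land far apart physically.
page_table = {0x10: 0x80000, 0x11: 0x90000}

va_pieces = split_at_page_boundary(0x10FFC, 8)
pa_pieces = [(translate(page_table, va), n) for va, n in va_pieces]

for (va, n), (pa, _) in zip(va_pieces, pa_pieces):
    print(f"VA {va:#x} -> PA {pa:#x} ({n} bytes)")
```

The two virtual pieces are adjacent (0x10FFC and 0x11000), while the physical pieces start at 0x80000FFC and 0x90000000. If the access is decomposed, each physical piece is what the MMU and PMP checks see.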


Answer 5 · 2024-04-04T21:50:13.000Z

I miscounted: there are 5 different results. This is my little table of the legal results of a misaligned access. They depend on 3 DUT parameters:

- HW misalign: is a misaligned access supported in HW (which might be region by region, as defined by a PMA)
- hipri-misalign: does a misaligned access have higher or lower priority than page fault/illegal access
- low_first: if misalign is supported, is the lowest address accessed first, or the highest address, or neither (because accesses aren't split)?

with results in:

- xCAUSE_CSR (errH, errL, err): cause code for errors detected in the low and/or high halves (ASSUMPTION: if not decomposed, only the low address (base+offset) is checked)
- xTVAL CSR (low, high): xTVAL value, the address calculated by the instruction (errL, (base+offset)) or the other half (errH, ((base+offset+access_width-1) mod boundary)), where boundary == min(PMP_granularity, access_width)
- Mem: shows what is stored in memory (if it's a store op and an error is detected)

IF any error is detected, the cause will be the error detected in either the low or high half (as errL and errH indicate), or just a single error if not decomposed (err).

(ASSUMPTION: If the access is not decomposed, then *low_first* = 1.)

(ASSUMPTION: If HW misaligned is supported, then *hi-pri misaligned* = 0. If *HW misaligned* is not supported, then the cause = *hi-pri misaligned* ? misaligned access : (*low_first* ? errL : errH).)

Also note that hi/low first might differ if BE is enabled vs disabled in a core that supports both (and might be different between IFetch and DFetch).

| HW misalign | hipri-misalign | low-first | hi/low err? | CAUSE | TVAL | mem |
|---|---|---|---|---|---|---|
| 0 | x | neither | OK | misalign | low | unchg |
| 0 | 1 | neither | err | misalign | low | unchg |
| 0 | 0 | neither | err | err | low | unchg |
| 1 | x | neither | OK | -- | -- | hi,lo |
| 1 | x | neither | err | err | low | unchg |
| 0 | 0 | 1 | errH,errL | errL | low | unchg |
| 0 | 0 | 1 | errH, OK | errH | high | unchg |
| 0 | 0 | 1 | OK, errL | errL | low | unchg |
| 0 | 0 | 1 | OK, OK | -- | -- | hi,lo |
| 0 | 0 | 0 | errH,errL | errH | high | unchg |
| 0 | 0 | 0 | errH, OK | errH | high | unchg |
| 0 | 0 | 0 | OK, errL | errL | low | unchg |
| 0 | 0 | 0 | OK, OK | -- | -- | hi,lo |
| 1 | x | 1 | errH,errL | errL | low | unchg |
| 1 | x | 0 | errH,errL | errH | hi | unchg |


Answer 6 · 2024-04-04T23:07:13.000Z

Sigh, more cut/paste errors, and that brings the number of possible results to 7:

| HW misalign | hipri-misalign | low-first | hi/low err? | CAUSE | TVAL | mem |
|---|---|---|---|---|---|---|
| 0 | x | neither | OK | misalign | low | unchg |
| 0 | 1 | neither | err | misalign | low | unchg |
| 0 | 0 | neither | err | err | low | unchg |
| 1 | x | neither | OK | -- | -- | hi,lo |
| 1 | x | neither | err | err | low | unchg |
| 0 | 0 | 1 | errH,errL | errL | low | unchg |
| 0 | 0 | 1 | errH, OK | errH | high | unchg,lo |
| 0 | 0 | 1 | OK, errL | errL | low | unchg |
| 0 | 0 | 1 | OK, OK | -- | -- | hi,lo |
| 0 | 0 | 0 | errH,errL | errH | high | unchg |
| 0 | 0 | 0 | errH, OK | errH | high | unchg |
| 0 | 0 | 0 | OK, errL | errL | low | hi,unchg |
| 0 | 0 | 0 | OK, OK | -- | -- | hi,lo |
| 1 | x | 1 | errH,errL | errL | low | unchg |
| 1 | x | 0 | errH,errL | errH | high | unchg |

Answer 7 · 2024-04-05T11:34:44.000Z

This is still very unclear to me.

No, what Andrew was saying is that an overall access by a load/store instruction may be performed as one memory access or may be broken up (aka decomposed) into pieces and performed as multiple memory accesses.

Yes but the example I gave MUST be decomposed into pieces because it is discontiguous in physical memory.

So the point is that a single memory access that straddles two PMP regions will be checked as one access

How? PMP checks can only be done on physical addresses, and the physical addresses accessed are two discontiguous chunks.

while individual "piece" accesses may each fall in just one region or the other - and will each be individually checked.
..
So, IFF decomposed, they each individually must pass or fail both PMP checks and MMU checks (regardless if they are contiguous or not).

Yes.. they are individually checked, but are they part of the same "access" or not? That's important because of this bit from the spec:

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails

The fundamental issue is that the spec talks about "access" but we have at least two types of "access" and it isn't clear which it is talking about:

The access that the instruction requested (which may be in physical or virtual memory).
The actual accesses of physical memory, of which there can be 1 or more if it needs to be decomposed (or even if it doesn't).

Let me give two alternative specifications that would specify the desired behaviour and you can tell me which one is the intended one. :-)

Option 1

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails. If an instruction's memory access is decomposed into multiple physical memory accesses (for example because it crosses a virtual page boundary and the pages are not mapped to contiguous physical memory), then all bytes of all of the decomposed accesses must match a single PMP entry or the overall access fails.

Option 2

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails. If an instruction's memory access is decomposed into multiple physical memory accesses (for example because it crosses a virtual page boundary and the pages are not mapped to contiguous physical memory), then each physical memory access is independent. If any of them fail then the overall access fails, but they do not need to all match the same PMP entry.
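The difference between the two options can be sketched as two predicates over the list of decomposed physical pieces. This is only an illustration under simplifying assumptions: `match`, `option1` and `option2` are hypothetical names, regions are plain (base, length) pairs indexed by PMP entry number, and permission bits are left out.

```python
def match(regions, addr, size):
    """Return (index, ok) for the lowest-numbered region touching the piece.

    ok is True only if that region covers every byte of the piece.
    Returns (None, False) if no region matches any byte.
    """
    for index, (base, length) in enumerate(regions):
        lo = max(addr, base)
        hi = min(addr + size, base + length)
        if hi > lo:
            return index, (hi - lo == size)
    return None, False


def option1(regions, pieces):
    """Option 1: every piece must be covered by one and the same entry."""
    results = [match(regions, addr, size) for addr, size in pieces]
    same_entry = len({index for index, _ in results}) == 1
    return same_entry and all(ok for _, ok in results)


def option2(regions, pieces):
    """Option 2: each piece is checked independently of the others."""
    return all(match(regions, addr, size)[1] for addr, size in pieces)


# The example from the first comment: two pieces, each inside its own region.
regions = [(0x80000000, 0x1000), (0x90000000, 0x1000)]
pieces = [(0x80000FFC, 4), (0x90000000, 4)]

print(option1(regions, pieces), option2(regions, pieces))  # prints: False True
```

With this input Option 1 rejects the access because the pieces match different entries, while Option 2 accepts it.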

@allenjbaum thanks for the table but I couldn't figure out the layout. Any chance you could post it again in markdown format or CSV?

Answer 8 · 2024-04-05T11:41:19.000Z

It’s simply not true that physical discontinuity mandates that an access be broken up. Yes, it’s a practical choice to do so, but nothing in the spec says that it must be so. My earlier post lays out the valid options.

Answer 9 · 2024-04-05T11:55:55.000Z

Ok but either way it could be decomposed into multiple accesses, so the question still stands. If it does decompose it into two physical accesses and they match different PMP regions, is the access required to fail, succeed, or either?

Answer 10 · 2024-04-05T11:57:53.000Z

The second paragraph of my original post directly answers that question.

Answer 11 · 2024-04-05T12:44:18.000Z

Ah ok, I think I didn't follow because you said the blue box is a valid outcome, but that's not a physical memory access. So would you say this is correct?

If an instruction's memory access is decomposed into multiple physical memory accesses then each physical memory access is independent. If any of them fail then the overall access fails, but they do not need to all match the same PMP entry.

A single physical access may read or write more than one discontiguous region of memory. For example if a virtual memory access crosses a page boundary and the pages are not mapped to contiguous physical memory, then two discontiguous regions of physical memory will need to be accessed. This may be done as a single access or decomposed into multiple accesses. If it is performed as a single discontiguous access then all the bytes in both regions (but not the bytes in-between) must be in a single PMP region in order for the access to succeed. From a software perspective the behaviour in this case is implementation defined.

Answer 12 · 2024-04-05T23:15:51.000Z

Yeah, the blue box comment was confusing in retrospect. What I was trying to say is that not breaking up the access is valid.

Indeed I agree with that description, but I'll add two things for clarification:

Although "if any of them fail then the overall access fails" is true in the sense that an exception will be raised, it is also the case that a subset of the original access may be performed--namely, for the subset of accesses that pass the PMP check, side effects may be actioned and store data may be written to memory. (If you consider that misaligned accesses may be trapped and emulated using a sequence of byte accesses, it makes sense why this might happen.)
The "implementation-defined" characterization is true, but the set of valid behaviors is quite heavily restricted (to the set of behaviors we've been discussing in this thread).
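The trap-and-emulate case mentioned in the first point can be sketched as a byte-by-byte store loop. Everything here is a stand-in: `AccessFault`, `emulated_store`, the dictionary used as memory, and the single-address permission predicate are assumptions made for the example, and the low-address-first order is just one of the orders an implementation might use.

```python
class AccessFault(Exception):
    """Stand-in for a store access-fault trap; carries the faulting address."""


def emulated_store(memory, pmp_allows_write, addr, data):
    """Store `data` one byte at a time, lowest address first.

    Each byte is its own access with its own PMP check, so bytes written
    before the failing one remain in memory when the fault is raised.
    """
    for offset, value in enumerate(data):
        byte_addr = addr + offset
        if not pmp_allows_write(byte_addr):
            raise AccessFault(byte_addr)
        memory[byte_addr] = value


def pmp_allows_write(pa):
    """Hypothetical PMP setup: writes are permitted only below 0x2000."""
    return pa < 0x2000


memory = {}
try:
    emulated_store(memory, pmp_allows_write, 0x1FFE, b"\xAA\xBB\xCC\xDD")
    faulting_address = None
except AccessFault as fault:
    faulting_address = fault.args[0]

print(sorted(memory.items()), hex(faulting_address))
```

After the fault, the bytes at 0x1FFE and 0x1FFF have been written and the reported address is 0x2000: the overall store failed, yet part of it took effect.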

Answer 13 · 2024-04-06T07:05:26.000Z

That makes sense, thank you!

a subset of the original access may be performed

Ah is this what defines an "access" - something that will be performed in its entirety or not at all?

If so that makes it much clearer. I'll try to make a PR at some point to add clarifying text.

Answer 14 · 2024-04-06T20:37:36.000Z

I think we should follow the RVWMO spec’s terminology, which differs from what we’ve informally used in this thread. In that spec, we say that a memory-access instruction gives way to potentially multiple memory operations. Yes, those operations are indivisible in the context of precise exceptions, coherent memory regions, etc.

Answer 15 · 2024-04-08T11:18:21.000Z

Ah interesting. That seems slightly inconsistent with the PMP spec wording then. If a memory-access instruction (e.g. lw) leads to a single "access" that is performed using 1 or more atomic "operations", then this:

The lowest-numbered PMP entry that matches any byte of an access determines whether that access succeeds or fails. The matching PMP entry must match all bytes of an access, or the access fails.

suggests that all bytes must be in the same PMP entry, i.e. the example in my first comment MUST fail. But from this discussion that isn't the case so it should be worded something like this:

A memory access can be decomposed into one or more atomic memory operations (see Chapter ...). PMP checks are performed independently on each operation. The lowest-numbered PMP entry that matches any byte of a memory operation determines whether that operation succeeds or fails. The matching PMP entry must match all bytes of an operation, or the operation fails.

If any operation fails then the overall access will fail, however passing operations may cause side effects.

Answer 16 · 2024-04-08T20:43:30.000Z

Yeah, RVWMO invented new terminology after the PMP spec was written. I figured that if we're going to end up tweaking the wording for clarity, we might choose to unify the terminology, too.