riscv-non-isa/riscv-iommu

Incompatible "FAULT_TYPE = UR" reasons observed in ATS

Closed this issue · 2 comments

Another ATS compatibility issue is observed for the DTI-ATS "FAULT_TYPE" field:

DTI-ATS fault type

FAULT_TYPE:
00 InvalidTranslation
01 CompleterAbort
10 UnsupportedRequest
11 Reserved
...
When the value of this field is CompleterAbort, this field indicates that there was an error during the translation process. The DTI master returns a Translation Completion message with the status value as CompleterAbort (CA).
When the value of this field is UnsupportedRequest, this field indicates that ATS is disabled for this or all StreamIDs. The DTI master returns a Translation Completion message with a status value as UnsupportedRequest (UR).
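
The encoding above can be written down as a small lookup. This is only a minimal sketch: the names `FaultType`, `COMPLETION_STATUS`, and `decode_fault_type` are illustrative and not taken from the DTI specification, and only the two completion statuses spelled out in the quoted text are mapped.

```python
from enum import IntEnum


class FaultType(IntEnum):
    """DTI-ATS FAULT_TYPE encodings as listed above (names are illustrative)."""
    INVALID_TRANSLATION = 0b00
    COMPLETER_ABORT = 0b01
    UNSUPPORTED_REQUEST = 0b10
    # 0b11 is reserved and deliberately has no member.


# Translation Completion status for the two cases the quoted text spells out.
COMPLETION_STATUS = {
    FaultType.COMPLETER_ABORT: "CA",
    FaultType.UNSUPPORTED_REQUEST: "UR",
}


def decode_fault_type(bits: int) -> FaultType:
    """Decode a 2-bit FAULT_TYPE value, rejecting the reserved encoding."""
    if bits not in (0b00, 0b01, 0b10):
        raise ValueError(f"reserved or out-of-range FAULT_TYPE: {bits!r}")
    return FaultType(bits)
```

For example, `COMPLETION_STATUS[decode_fault_type(0b10)]` gives `"UR"`, while `decode_fault_type(0b11)` raises `ValueError`.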

SMMUv3 F_BAD_ATS_TREQ

DTI-ATS follows the ARM SMMUv3 specification. For the UR FAULT_TYPE, the specification relates it to F_BAD_ATS_TREQ, which is defined as follows:

Reported in response to an ATS Translation Request in any of the following conditions:

  • SMMU_CR0.SMMUEN==0
  • STE.V==1 and effective STE.EATS==0b00
    -- Note: See STE.EATS for details. The effective value of EATS is treated as 0b00 in some situations including if STE.Config==0b100, or STE is Secure.
  • If it is possible for an implementation to observe a Secure ATS Translation Request, this event is recorded.
    Note: This event is intended to provide visibility of situations where an ATS Translation Request is prohibited, but an ordinary transaction to the same address from the same StreamID or SubstreamID might complete successfully (where a failure of a TR might otherwise be difficult to debug by issuing an ordinary transaction). Translation Requests do not cause other events (such as C_BAD_STE) to be recorded.
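
The conditions quoted above can be read as a simple predicate. The following is a hedged sketch, not SMMU reference code: the function name and its parameters are hypothetical, and `effective_eats` is assumed to already account for the cases where EATS is treated as 0b00.

```python
def is_bad_ats_treq(smmuen: int, ste_valid: bool, effective_eats: int,
                    secure_request: bool = False) -> bool:
    """Return True if an ATS Translation Request would report F_BAD_ATS_TREQ.

    Mirrors the three quoted conditions; parameter names are illustrative.
    """
    # Condition 1: SMMU_CR0.SMMUEN == 0
    if smmuen == 0:
        return True
    # Condition 2: STE.V == 1 and effective STE.EATS == 0b00
    if ste_valid and effective_eats == 0b00:
        return True
    # Condition 3: a Secure ATS Translation Request was observed
    if secure_request:
        return True
    return False
```

Under this reading, an invalid STE (STE.V==0) with the SMMU enabled does not by itself raise F_BAD_ATS_TREQ.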

RISC-V Unsupported Request (UR) causes

The incompatibility shows up in a DTI-ATS checker developed against the ARM DTI-ATS behavior for the FAULT_TYPE field, because the RISC-V IOMMU specification lists the following causes for a UR result:

If there is a permanent error or if ATS transactions are disabled then an Unsupported Request (UR) response is generated. The following cause codes belong to this category:
• All inbound transactions disallowed (cause = 256)
• DDT entry load access fault (cause = 257)
• DDT entry not valid (cause = 258)
• DDT entry misconfigured (cause = 259)
• Transaction type disallowed (cause = 260)
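
As a minimal sketch, the UR category quoted above can be captured as a lookup table. The names `UR_CAUSES` and `ats_response_is_ur` are illustrative; cause codes outside the quoted list are not classified here.

```python
# Cause codes the quoted RISC-V IOMMU text places in the UR category.
UR_CAUSES = {
    256: "All inbound transactions disallowed",
    257: "DDT entry load access fault",
    258: "DDT entry not valid",
    259: "DDT entry misconfigured",
    260: "Transaction type disallowed",
}


def ats_response_is_ur(cause: int) -> bool:
    """Return True if the cause code belongs to the quoted UR category."""
    return cause in UR_CAUSES
```

With this table, `ats_response_is_ur(258)` is true, so a "DDT entry not valid" fault is answered with UR on a RISC-V IOMMU.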

Incompatibility observed

In the DTI-ATS checker, fault causes 257, 258, and 259 lead to CA rather than UR. The checker's rule is:

  1. IOMMU.mode=off, or ddt.V=1 with EN_ATS=0, results in FAULT_TYPE=2;
  2. an implicit S2 fault results in FAULT_TYPE=0;
  3. all other faults result in FAULT_TYPE=1.
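
The checker rule can be sketched as a small function. This is an illustration under stated assumptions, not the checker's actual code: the function and parameter names are hypothetical, rule 1 is read as "IOMMU off, or a valid DDT entry with EN_ATS=0", and the rules are assumed to be evaluated in the order listed.

```python
def checker_fault_type(iommu_off: bool, ddt_valid: bool, en_ats: bool,
                       implicit_s2_fault: bool) -> int:
    """Return the FAULT_TYPE the DTI-ATS checker expects for a faulting request."""
    # Rule 1 (assumed reading): IOMMU off, or valid DDT entry with ATS disabled.
    if iommu_off or (ddt_valid and not en_ats):
        return 2  # UnsupportedRequest
    # Rule 2: implicit stage-2 fault.
    if implicit_s2_fault:
        return 0  # InvalidTranslation
    # Rule 3: everything else.
    return 1  # CompleterAbort
```

For instance, an invalid DDT entry with the IOMMU enabled, `checker_fault_type(False, False, True, False)`, falls through to rule 3 and yields 1 (CA), whereas the RISC-V IOMMU specification places "DDT entry not valid" (cause 258) in the UR category.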

Section 2.1.3 of the RISC-V IOMMU specification also relates UR to EN_ATS=0, which is DTI-ATS compliant. The difference can therefore be interpreted as a specification ambiguity. This ambiguity leads to the following confusing hardware behaviors, which affect the software programming model:

  1. a DDT access fault or a misconfigured DDT entry results in FAULT_TYPE=2 when the protocol is ATS but FAULT_TYPE=1 when the protocol is non-ATS;
  2. a DDT access fault or a misconfigured DDT entry may result in FAULT_TYPE=2, while a PDT access fault or a misconfigured PDT entry always results in FAULT_TYPE=1;
  3. ddt.V=0 may result in FAULT_TYPE=2, while pdt.V=0 always results in FAULT_TYPE=0.

As such, IMO, we may want to resolve this specification ambiguity, for the following reasons:

  1. to be compatible with the de facto standard behavior already adopted by the PCIe IPs in the ecosystem;
  2. to have a unified programming model for ddt.V=0, DDT access, and DDT misconfigured fault handling;
  3. to enable a more interesting programming model: software can temporarily toggle V between 1 and 0 to lock a device configuration and flush the transactions related to it, while those transactions can still be retried by the PCIe master side.


A checker written to this IOMMU specification should look for a UR response. Using a checker written for a non-RISC-V IOMMU (ARM, IBM, Intel, AMD, etc.) with a RISC-V IOMMU will likely lead to unexpected results.

OK, so if you are sure the IOMMU fault type definitions are stable and won't affect the applicable ecosystem, we'll follow.
Thanks for the response.