riscv-non-isa/tg-nexus-trace

Feedback on RISC-V N-Trace Specification 1.0.0 rc2.0 - chapter 8

Closed this issue · 1 comments

  1. Section 8.1: "following Sv39/Sv48/Sv57 address generation rules"
    needs to be updated. The paged virtual-memory system does not provide rules
    for address generation but for address validity checks. Suggest update as follows:
    "Address transmission in RISC-V N-trace is in compliance with the IEEE-5001
    Nexus Standard. Per the IEEE-5001 Nexus Standard, the most significant
    address bits that are zero are not transmitted. Additionally, for RV64,
    RISC-V N-trace introduces an extension that allows for the omission of
    identical most significant bits during address transmission. This extension
    exploits the rules set forth by the RISC-V paged virtual-memory system for
    valid virtual addresses when SXLEN=64. For further details on this
    optimization, please refer to the 'Virtual Addresses Optimization' section.

  2. section 8.1 :
    "Rules when generating addresses" -> "Rules for generating addresses"
    The bullet 4 is not rules for generating addresses. The bullet 4 can be made
    a sub-bullet of bullet 3.

  3. Section 8.2 - "Trace encoder must..." the rules in this paragraph is not
    complete. Applying this rule at encoder as stated will not lead to the
    desired result or match the examples. For example, applying this rule will
    lead to add address=0xFFFF_FFFE_3FFF_FFFE being encoded as 0x1FFF_FFFF
    which when decoded will lead to a wrong address.

    A more detailed and correct algorithm should be included. For example
    something like follows.

    The process for detecting and skipping most significant identical bits in an
    address A is as follows.

    1. Let A be the XLEN (64 or 32) bit wide address.
    2. Let i be XLEN-1
    3. if bits i:i-5 are not identical to bit i-6 then goto step "vi"
    4. Let i = i - 6
    5. if i > 6 go to step "ii"
    6. Let B[i-1:0] = A[i:1]
    7. Let p = (i-1) % 6
    8. if i < XLEN-1, then set bits B[i+p-1:i] to the value of bit B[i-1]
    9. transmit B
  4. Section 8.4 - the statement "Field I-CNT..." seems to conflate the field in the
    message with the counter in the encoder.
    "The I-CNT field, present in most messages, transmits the value of the
    I-CNT counter, which counts the number of halfwords used to encode
    retired instructions."

    "The I-CNT counter in the trace encoder is reset to 0, in accordance with
    the IEEE-5001 Nexus Standard, under one of the following two conditions:

    • When tracing starts or is restarted for any reason.
    • After the I-CNT counter value has been transmitted in a message."
  5. section 8.4.1 -> remove reference to "prefetch fault" as there is no
    such exception.

  6. Table 26 should indicate which messages are used for each of the cases in
    BTM and in HTM. The examples in section 8.8 do not provide all details.
    Specifically, the details around treatment of HIST/HREPEAT/B-COUNT are not
    provided in this section or in the examples. For BTM, if the choice between
    use of a ProgramTraceSync with code=4 or ResourceFull message is a
    implementation choice then some guidance for that choice should be provided;
    and if it is not then under which circumstances the choice is made should be
    specified.

  7. Section 8.6: Timestamp recording -> Timestamp reporting

Editorial:

  1. Section 8.1 - "Normally (with above bit enabled..." ->
    "Normally (without the above bit enabled or implemented), addresses with many
    most significant bits set to 1 will be sent as long packets (as variable size
    fields skip only the most significant bit set to 0). The following address,
    0xFFFF_FFFF_8000_31F4 (a real address from the Linux kernel), will be encoded
    as F-ADDR=0x7FFF_FFFF_C000_18FA (with the least significant 0-bit skipped).
    Such a 63-bit variable field value will require 11 bytes to be sent (as we
    have 6 MDO bits in each byte)."

  2. Section 8.3 and 8.3.1

    "When operating in HTM mode, the encoder does not generate messages for
    conditional branches. Instead, it maintains a HIST register or accumulator to
    record the outcomes of these branches, whether taken or not. Each conditional
    branch contributes a single bit to the HIST register, as follows:

    • A bit with a value of 1 is appended at the least significant position for a
      taken conditional branch.
    • A bit with a value of 0 is appended at the least significant position for a
      not-taken conditional branch.

    The HIST register may be implemented as a left-shift register. Initially, when
    the HIST register is empty, bit 0 of the register is set to 1, with all other
    bits set to 0. Subsequent conditional branches cause the register to shift left,
    recording each taken or not-taken outcome in bit 0. The transition of the most
    significant bit from 0 to 1 indicates the register is full. At this point, the
    entire register, including the most significant bit — which serves as the
    stop-bit — is transmitted. After transmission, the register is reset to its
    initial, empty state. The HIST register is transmitted using a ResourceFull
    message with the RCODE field set to either 1 or 2.

    Decoders must initiate the interpretation of the HIST field starting from the
    second most significant bit. The most significant bit, designated as the
    stop-bit, is invariably set to 1. This second most significant bit—immediately
    following the stop-bit—encodes the outcome of the first conditional branch
    captured in the HIST register. Conversely, the least significant bit represents
    the outcome of the last conditional branch prior to the transmission of the HIST
    register.

    When a HIST register is full, if its value is the same as that of the HIST
    field transmitted in the last ResourceFull message, then the encoder may
    increment an internal HREPEAT counter (history repeat counter) instead of
    generating a ResourceFull message if the Repeated History Optimization is
    enabled. See Repeated History Optimization section for further details."

  3. Section 8.4 "Every retired instruction..." and its sub bullets->
    "Every retired instruction increments the I-CNT counter by 1 for 16-bit
    instructions and by 2 for 32-bit instructions. This increment applies to
    all forms of branch instructions as well. However, instructions that either
    raise exceptions or are interrupted prior to retirement do not increment
    the I-CNT counter.

  4. Section 8.4 "When I-CNT counter is full..." ->
    "When the I-CNT counter reaches its maximum value, becoming full, it can be
    reported in one of two ways:

    • By using a ResourceFull message with RCODE=0. This method is applicable
      to both BTM and HTM.
    • By using a Synchronizing message with SYNC=4 (Sequential Instruction
      Counter), which is exclusive to BTM. For HTM, since no SYNC code exists
      to report HIST register overflow, employing a ResourceFull message is
      necessary. Thus, using a ResourceFull message in HTM for I-CNT overflow
      reporting ensures consistency across reporting mechanisms."
  5. section 8.4.1 "(... does not matter)" -> "(Specific operations are abstracted
    as "..." as they do not matter for this example)"

  6. Table 26 "but HIST and I-CNT should provide the PC " ->
    "The I-CNT and HIST fields may be used determine the PC of the last
    instruction retired before reset".

  7. "If timestamp is enabled all" -> "If the timestamp is enabled, all"

  8. "The TSTAMP field is a variable-length field and most ..." ->
    "The TSTAMP field is a variable-length field, and most significant bits set
    to 0 will not be transmitted. This approach provides good compression for
    both relative and absolute timestamps."

  9. Section 8.6
    "The following rules must be observed..."->
    "The following rules must be observed:

    • If timestamps are enabled, ALL Synchronizing Messages must include
      an absolute TSTAMP value.
    • It is not required for all non-synchronizing messages to always
      report a timestamp. Doing so may be opted for saving trace bandwidth or
      in the case of sending back-to-back messages.
    • The absolute timestamp cannot exceed 64 bits (even with 1ps resolution,
      64-bit counters will overflow in about 584 years).
      • Implementations may choose a smaller counter. Trace tools may assume
        the timestamp will not overflow in a single session, although adding
        support for overflow is not significantly challenging.
    • It is suggested that in multi-hart systems, all Trace Encoders use a
      shared timestamp (for better trace correlation), but it is not mandatory.
    • In all cases, when an address is provided, the timestamp should reflect
      the time when an event leading to that particular address being sent
      occurred."
  10. "If the above is not possible..."->
    "If the above is not feasible, timestamps should at least be reported
    consistently, ensuring that the time distance between distant events (for
    example, a periodic timer interrupt) can be reliably calculated. It is necessary
    to ensure that the time reported at exceptions/interrupt handlers reflects the
    moment when the exception or interrupt was observed."

  11. Make section 8.8 a sub-section of 8.6. In section 8.8 correct spelling of
    "ProgramTraceSyn". Either explain the red color coding or remove it if it is
    not significant. The examples seem to refer to foot notes 1, 2, and 3 but
    the foot notes are missing. For case 6, what does "After P" mean?
    The case 6 seems to imply that "trTeInstTracing" has pulses periodically
    causing it to transition from 1->0 and 0->1 at P intervals. Is that
    intended? Same question about trTeInstTracing dropping to 0 when SYNC TRIGGER
    occurs.

Addressed in 1.0.0_rc25 (or even earlier ...) version.