This package implements a component which drives a JTAG port from data send over an AXI Stream. This is very useful if you want remote access to Xilinx ILAs without using the FPGA's hardware JTAG pins. Instead, the ILA is controlled from a JTAG controller in the fabric which can be remote-controlled in many different ways.
Xilinx Vivado ILAs are accessed over JTAG but in many cases using a physical JTAG connection is impractical. Recent versions of Vivado support the Xilinx "virtual cable" protocol XVC which is very simple and permits remote access to embedded ILA cores.
It is now possible to use this package with ISE/ChipScope as well if you are working with older devices.
However, the protocol and its current support in Vivado has several drawbacks
- Fully synchronous, i.e., the reply to one request must be received before the next one is sent.
- TCP based. This makes an in-firmware implementation cumbersome and more resource-intensive.
- The AXI <-> JTAG bridge which is available as an IP in Vivado is inefficient and requires AXI. In some use cases the AXI interface is already "owned" by an application and a second master to be used by XVC is not available.
If you need to use ISE on older devices then please skip this section.
-
Add the AxisJtagDebugBridge module to your design; hook up to an AXI Stream. This should probably be a UDP server. Note that there are two VHDL 'architectures' of this module ('entity') - a stub (
AxisJtagDebugBridgeStub
) and the true implementation:AxisJtagDebugBridgeImpl
. The stub will be used by default by Vivado so you have to explicitly specify theAxisJtagDebugBridgeImpl
architecture for instantiation. -
Unless you are using the full SURF library from SLAC you have to instantiate a vivado "debug bride" IP in "From JTAG to BSCAN" mode. This should be named
DebugBridgeJtag
to match the instantiation in the AxisJtagDebugBridge wrapper. -
Add ILA cores; under Vivado-2016.04 these must be added to the hdl and cannot be added to an already synthesized design!
-
Compile and install the
xvcSrv
program on a machine that is directly connected to FW. -
Start
xvcSrv
(udp_server_port
is where the AXI Stream is connected)xvcSrv -t <fw_ip_address>:<udp_server_port>
If you are not using the SLAC SURF infrastructure then you might need a different driver (-D) and target (-t) spec.
-
In Vivado connect to the target:
open_hw connect_hw_server open_hw_target -xvc_url <xvc_server_ip>:2542
A whole stack of soft- and firmware components are involved:
-
vivado hardware manager GUI interacts with the xilinx
hw_server
daemon. -
xilinx
hw_server
daemon talks to different 'cable drivers'. In case of XVC this is the 'virtual cable' driver which uses a TCP connection to communicate with a peer. -
xvcSrv
software which is provided by this package. This server handles the XVC protocol on a TCP socket and communicates with firmware entities which drive the JTAG lines.xvcSrv
usually runs on the target system, i.e., hardware which is closely coupled with the target FPGA. -
xvcSrv driver
modules handle the specific firmware interface. There are e.g., modules for- UDP communication (SLAC proprietary protocol stack)
- FIFO with AXILite interface for ZYNQ
- FIFO with TMEM interface for PSI's IFC board
- Xilinx Vivado AXI Debug Bridge IP (you can create one of these in vivado).
It is fairly straight-forward to implement your own driver module for your specific firmware-interface if necessary. Example/skeleton code is provided (
xvcDrvFoo.cc
).
The firmware stack mirrors the software stack:
-
A communication module interacts with the software upstream and features a downstream AXI-Stream interface. For architectures with a bus interface this module typically implements a FIFO which can be driven from the bus on one side and which has an AXI-Stream interface on the other side.
Bus-interface modules are available for AXILite/Zynq and TMEM.
In case of the SLAC UDP protocol stack the protocol routes a particular UDP port to an AXI-Stream interface.
-
The
AxisToJtag
entity converts messages from the AXI-Stream interface into JTAG transactions. -
A vivado
debug_bridge
IP. You must instantiate one of these in "From JTAG to BSCAN" mode and connect the JTAG pins to theAxisToJtag
entity. If you name this bridgeDebugBridgeJtag
then you can directly use theAxisJtagDebugBridge
wrapper which wires aAxisToJtag
entity to the debug bridge. When instantiating this wrapper then you must specify the desired architecture (stub vs. true debug bridge, see below). -
Vivado ILA IPs. Vivado automatically connects those to the debug bridge.
You probably should make sure that you have no "hard" debug-bridge (which is driven by the hardware JTAG pins) in your design.
The package addresses the aforementioned issues:
-
Use a simple AXI Stream for communication. Unlike a register-based approach this permits processing of larger messages which improves throughput.
In our enviroment where registers are accessed via ethernet the Vivado bundled IP is particularly inefficient as for every 32 bits of JTAG data there must be multiple register writes and reads, each of which requires a network transaction.
-
The firmware can be configured to enable a memory buffer. This buffer is used to support unreliable transport media (such as UDP). Each message is sent to the firmware tagged with a "transaction ID". If no reply arrives the same message (with the same ID) is re-sent. The firmware, upon receiving a new ID drives JTAG, remembers the transaction ID and stores the TDO vector in the buffer while sending it back. If firmware receives the same ID again (supposedly because the reply was lost) it does not execute the JTAG vectors again but "plays back" the stored TDO data from memory (JTAG transactions are not idempotent).
-
The firmware core can easily be hooked to a separate UDP server port which allows orthogonal access to JTAG with minimal FPGA resource overhead.
Note that the firmware does not directly implement the XVC protocol (since that would require in-firmware TCP support) but a slightly modified variant which is documented below. Instead, there is a software program which operates as an XVC server and acts as a bridge to the firmware. Different transport methods can be employed for communication between firm-and software:
| Xilinx hw_server | <--- XVC / TCP --> | xvcSrv SW | <- transport -> | AxisToJtag FW |
Examples for transport methods are UDP or an AXI/AXIS Fifo on a Zynq platform.
It is imperative that the xvcSrv software be as tightly coupled to the AXI Stream as possible for sake of optimal throughput.
While the most common use case may be establishing connectivity to a Xilinx debug hub the package is not restricted to this use case -- it is suitable as a general-purpose JTAG controller.
There are two top-level wrappers AxisToJtag
and AxisJtagDebugBridge
which support
the same generics and the same streaming interface. For convenience the latter
variant already instantiates a JTAG to BSCAN IP and connects to its JTAG port.
There are two VHDL architectures of the AxisJtagDebugBridge
entity:
AxisJtagDebugBridgeStub
and AxisJtagDebugBridgeImpl
. While the latter provides the
real implementation described in this document the stub only implements
the QUERY command and replies with a ERR_NOT_PRESENT_C
error, thus informing
a software client that firmware support is not implemented.
The stub appears after the full implementation in the source code so that it
is picked by default by the synthesis tool if AxisJtagDebugBridge
is
instanitated without specifying an architecture.
Therefore, the user has to explicitly request the AxisJtagDebugBridgeImpl
architecture.
The purpose of the stub is allowing a design to unconditionally provide the upstream components but optionally use either the stub or the real implementation for sake of saving resources or avoiding the limitations (see below) associated with the JTAG to BSCAN IP.
AXIS_FREQ_G : real
AXIS_WIDTH_G : positive range 4 to 16
CLK_DIV2_G : positive
MEM_DEPTH_G : natural range 0 to 65535
MEM_STYLE_G : string
-
AXIS_FREQ_G
: The value of the clock frequency (in Hz). This information is used to communicate the JTAG TCK period back to the XVC protocol. -
AXIS_WIDTH_G
: The width (in bytes) of the AXI Stream. Note that this affects the format of the data that the xvcSrv software must supply (see section about stream format). -
CLK_DIV2_G
: Clock divider for generating TCK. The value defines the length of a TCK half-period. I.e., the frequency is the AXI Stream clock divided by twiceCLK_DIV2_G
.Note: TCK is "bit-banged", i.e., not generated by a true clock source -- however, inspection of the circuit the vivado AXI<->BSCAN IP implements reveals that they use the same approach -- even though the BSCAN module does not sample TCK but drive register clocks directly from it. A question has been filed with Xilinx but no answer has been received to date.
-
MEM_DEPTH_G
: how many words (of sizeAXIS_WIDTH_G
) of TDO data to store (in order to support unreliable transport). This limits the maximum message size that can be processed at once. If the transport between the XVC server and FW is reliable thenMEM_DEPTH_G
may be set to zero (no memory). -
MEM_STYLE_G
: memory type (block- vs. distributed RAM) to use for the buffer memory. Valid choices are: "auto", "block" or "distributed" (defaults to "auto").
There are standard clock/reset as well as AXI Stream signals.
The xvcSrv
software must be installed and executed on a machine with
connectivity to the firmware. The basic options for execution are:
xvcSrv [-D transport_driver] -t <target> [ -- <driver_options> ]
where <target>
is a string that passes information to the transport driver
how to reach the firmware target. In case of the "udp" transport (which is
the default) target
is a <ip_addr>:<udp_port>
string, thus, e.g.,
xvcSrv -t 192.168.2.10:8196
The local TCP port is 2542 by default but can be changed with the -p
option.
Other Options:
-h : program prints basic usage information to the console.
-v : print protocol parameter info (retrieved from target).
Multiple 'v' can be given to increase debugging verbosity.
-D <driver> : use/load transport driver <driver>. E.g., `/path/myDriver.so`.
-p <port> : TCP port where to listen for XVC connections.
-M : Max XVC vectors size. This defines the max. block size
to be used on the TCP side (it is beneficial to let this
be big!).
The block size between the driver and the target may well
be smaller (and xvcSrv will break transactions accordingly)
due to limits in the transport (e.g., UDP/ethernet MTU) or
firmware memory. However, if xvcSrv is tightly coupled
to the target then using large blocks on TCP is desirable
in order to mitigate TCP round-trip times.
-- : Delimiter; any options (and args) after '--' are passed to
and interpreted by the transport driver.
Other transport drivers can be easily implemented and compiled into shared
objects which can then be loaded. As an example there is the xvcDrvAxisFifo.cc
file which supports a custom AXI Stream FIFO but illustrates what needs to be
done. The xvcDriver.h
header gives more information about implementing a
driver.
If you have a driver, e.g., myDriver.so
then you can start the server
xvcSrv -D ./myDriver.so -t my_driver_info
The UDP Transport driver is built-in and used by default. The target string must be of the form (udp port 2542 is used by default):
<target_ip>[:<udp_port]
The driver recognizes the following options (given to xvcSrv
after --
)
-m <mtu> : Limit UDP datagrams to less than <mtu> octets. By default
the program tries to guess the MTU size but in some cases
it might be necessary to override this default.
Note that xvcSrv sets the DF (dont-fragment) bit on the
UDP connection, so that UDP datagrams are never broken up.
(firmware does not support IP defragmentation AFAIK.)
-f : Disable DF; i.e., allow IP fragmentation.
This driver supports a Tmem2BscanWrapper
somewhere in the TOSCA2 memory map.
The target string must identify the TOSCA2 address-space and offset where
the firmware registers can be found:
<address_space>:<offset>
e.g.,
-t USER2:0x200000
The driver recognizes the following options (given to xvcSrv
after --
)
-i <irq_file> : name of a device-file that interfaces to the interrupt
generated by the firmware. If this option is not used
then the driver operates in polled mode.
e.g.,
-i /dev/toscauserevents1.13
From vivado (e.g., tcl console in the hardware manager) a connection to the
xvcSrv
can be established with
open_hw_target -xvc_url <xvcSrvHost>:2542
If the port xvcSrv binds to was changed (with -p) then the port passed to
open_hw_target
must be changed accordingly.
Note: When using an ssh tunnel (or another kind of WAN connection) then it
seems to be better to run a hw_server
close to the target, i.e., the
machine where xvcSrv
runs. This gives better response than connecting
from far away to the XVC server directly. E.g.,:
bash$ ssh -tt -L3121:localhost:3121 gateway_host hw_server
and in Vivado you proceed as usual, i.e., connect to the default server (which is now visible at tcp port 3121) and open a xvc target:
%open_hw_server -url localhost:3121
%open_hw_target -xvc_url <xvcSrvHost>:2542
When trying to add ILAs to a synthesized design (e.g., via tcl create_debug_core
)
Vivado% create_debug_core ila_test ila ERROR: [Common 17-69] Command failed: This design contains a debug_bridge IP configured in either 'From_AXI_to_BSCAN' or 'From_JTAG_to_BSCAN' mode. Debug insertion is not currently supported for such designs. Please use the debug instantiation flow.
Therefore, under 2016.04 you can only use XVC when instantiating ILAs into your hdl code.
A basic test under 2017.03 indicated that this limitation has been removed.
Unfortunately, however, even under 2017.03 and 2017.04 it is not possible
to add debug ports to ILAs added by create_debug_core
-- the old error
message appears -- and you are limited to a single port (the width of which
you can change just fine).
Since the ILA with a single port works just fine we believe the restriction to a single port to be a simple bug (failure to remove an obsolete check). Contact the authors for more information.
ISE lacks the debug bridge IP and automatic ILA connection feature of Vivado. In ISE an ILA IP has a control port which must be driven from a JTAG interface.
To that end you have to instantiate a xilinx ICON
IP with a BSCAN
interface.
When creating the IP core from the GUI select Disable Boundary Scan Component Instance
.
This will result in an ICON
with a BSCAN
as well as a control port. The
BSCAN
port is what we want to drive with our "soft JTAG" stack. If the checkbox
is un-checked then the ICON
is wired to the hardware JTAG controller and not
accessible from software (unless we'd connect a hardware cable but then you would
not be reading this).
The last missing piece is the Jtag2Bscan
entity which drives the BSCAN port
from JTAG. Thus, under ISE you have a firmware stack:
- bus/Axi-Stream interface (e.g.,
Axis2TmemFifo
) - Axi-Stream to JTAG (
AxisToJtag
) - JTAG to BSCAN (
Jtag2Bscan
) - BSCAN to ILA control port (Xilinx
ICON
core with externalBSCAN
interface) - ILA with control port
The Jtag2Bscan
module emulates a xilinx BSCANE2
component and TAP controller
(I got valuable inspiration from
bscan_equiv.v).
While the author seems to get away with emulating a Spartan device (which
boils down to supplying the correct IDCODE and using the correct IR length)
I found that ChipScope would not find the ILAs on our Virtex6 devices unless
I let the TAP controller emulate a Virtex6. I mention this here in case
anyone ever uses the code on a different device: you may need to change
the relevant generics...
Note that you have to use ChipScope -- the Vivado hardware-manager does not recognize the ICON/ILA when driven with this package (but Vivado works fine when using a vivado design that uses the vivado debug bridge as described above).
-
Start
xvcSrv
on the target system. -
In ChipScope's
JTAG Chain
menu (on the host system), selectOpen Plugin
-
Enter Plug-in Parameters:
xilinx_xvc HOST=<host_ip>:2542 disableversioncheck=true
substitute your target system's
<host_ip>
, of course. BTW: Since XVC uses TCP you can very conveniently use ssh tunnels and forwarded ports between the ChipScope host and the target system. -
Another pop-up appears and after you click 'Ok' you are in business.
I had problems generating the ICON (v1.06a) and ILA (v1.05a) cores which consistently would produce an error. Googling around reveals that this is not uncommon. The work-around is to enable the "All IP Versions" checkbox and use the previous version (ICON v1.05a and ILA v1.04a, respectively, worked for me). It seems this problem is due to java installation issues - YMMV.
For the IFC board there is a Tmem2BscanWrapper
wrapper which glues all the pieces together and which
can be hooked to any place in TUSER
memory.
E.g., if you connect Tmem2BscanWrapper
to the CS2
window of TUSER2
at offset 0x00000000
then
you can start xvcSrv
on the IFC board:
xvcSrv -D tmem -t USER2:0x200000
If the interrupt line is hooked up, e.g., to USER1
interrupt 13 then you can use irq-driven mode:
xvcSrv -D tmem -t USER2:0x200000 -- -i /dev/toscauserevents1.13
The format of the JTAG vectors in the AXI stream deviates from the XVC layout because the original layout is not well-suited for streaming as XVC transmits the TMS and TDI vectors one after the other whereas they have to be driven onto the JTAG lines in parallel. For streaming we use the following format:
header_word [ , payload ]
where the payload depends on the type of transaction that is executed. All words are in little-endian, i.e., LSbits are transmitted/received first (on JTAG).
The 32 least-significant bits of the header are defined; it is left-padded
so that the payload is always word aligned. The word size depends on the
configuration of the core (AXIS_WIDTH_G
).
The incoming Stream consists of consecutive words of AXIS_WIDTH_G
bytes,
must be framed with TLAST
and is expected to have the aforementioned format :
Header_Word [, Payload ]
The header word is defined as
[31:30] Protocol Version -- currently "00"
[29:28] Command
[27:00] Command-specific parameter(s)
Note that if the core is configured for a stream width (AXIS_WIDTH_G
) > 4
then the header is padded up to the desired width, i.e., the paylod must
be word-aligned.
Each command word is answered with a reply word on the outgoing stream (see below).
The following commands are currently defined:
"00" QUERY: request basic features such as word length, memory depth.
Payload: NONE, i.e., TLAST should be asserted with this command.
"01" JTAG: shift jtag vectors. The vectors are shipped in the payload.
The parameter bits for this command are defined as follows:
[27:20] Transaction ID; this is used when the core is configured
with MEM_DEPTH_G > 0 in order to support a non-reliable
transport.
[19:00] JTAG vector length (in bits). The payload must provide
2*ceil( length / AXIS_WIDTH_G ) words of TMS/TDI vector
data. I.e., the length refers to the length of a single
TMS or TDI vector.
!!!!!!!
NOTE: the number in [19:00] encodes the actual number
minus 1. E.g., a value of 0 transmits one TMS
and one TDI bit. Two payload words are expected
in this example.
!!!!!!!
Payload: sequence of words from the TMS and TDI bit-vectors:
TMS_WORD, TDI_WORD, TMS_WORD, TDI_WORD, ...
Note that the user must format the stream accordingly
and therefore must be aware of the stream width. This
parameter is returned by the QUERY command.
If the number of bits supplied does not fill the last
word then the relevant bits must be lsb/right-aligned
in the last word.
TLAST must be asserted during the transmission of the
last TDI/payload word.
The outgoing stream consists of consecutive words of AXIS_WIDTH_G
bytes
and is framed with TLAST
. Each reply has the following format:
Header Word [, Payload ]
The header word is defined as
[31:30] Protocol Version; if the user supplies an unsupported protocol
version in the request header then the reply contains an error
code (see below) and the protocol version in the reply is set
to the supported version.
[29:28] Command -- the request command is returned unless an error occurred;
in case of an error the command bits in the reply are:
"10" ERROR: An error was detected. The 8 least-significant bits
[7:0] contain an error code:
1: bad protocol version; the protocol version in the reply
is set to the supported version.
2: bad/unsupported command code
3: truncated input stream (TLAST detected before the
first TDI word was received). Note that a premature
TLAST which is detected after the first TDI word
does NOT flag an error but yields a truncated reply
(less TDO words than requested by the number of bits).
4: 'debug bridge not present' error. I.e., the FW only
implements a stub and no true debug bridge.
"00" QUERY: the response to a QUERY command encodes information
in the command-specific bits:
[ 3: 0] AXIS_WIDTH_G - 1. I.e., this field encodes the
word size (minus one) used by the core. This information
is important for formatting the stream.
[19: 4] MEM_DEPTH_G. Indicates how much memory (if any) was
configured in words.
[27:20] TCK period. Encoded as
200Mhz 1
round{ log10( ------- ) --- 256 }
Ttck 4
With the special value 0 representing 'unknown'.
"01" JTAG: the response to a JTAG command is a sequence of
TDO words which form the TDO bit vector. The vector
stored in little-endian format (first bit of the vector
is the LSB of the first TDO word).
If the number of JTAG bits does not fill the last TDO
word completely then the relevant bits are right-
aligned.
If the transport mechanism contains unreliable segments with a potential for
data loss then a simple retry mechanism is not suitable because JTAG operations
are not necessarily idempotent.
The core can be configured to use internal memory (MEM_DEPTH_C > 0
) in which
case it stores the last JTAG TDO response in memory.
When the next JTAG command arrives the core inspects the 'transaction ID' field
of the command and if it is identical with the ID submitted along with the previous
transaction then the core detects a retried operation and does not actually execute
it again on JTAG but plays back the stored TDO response to the requestor.