Implementation

This section describes how the system is implemented in hardware.

The DOGMA tree (dog links)

Physical layer

The dog link serial transmitters and receivers are implemented in the FPGA fabric using normal I/O pins rather than hard multi-gigabit transceivers (MGTs). This gives full control over the details of the bit transfer with fixed latency and allows direct access to the raw incoming serial signal for clock recovery without intermediate hardware (e.g. a signal splitter) in the data path. To achieve the aforementioned goal of transmitting a low-jitter clock together with trigger and control signals on one optical serial link, we developed a custom encoding scheme that draws inspiration from existing protocols which use duty-cycle/pulse-width modulation of clock signals (for reference, an older publication on clock-edge modulation can be found at https://ieeexplore.ieee.org/document/4295185).

The initial idea was to transmit an actual clock signal and only vary the duty cycle to encode data (as has been done before). However, because optical links are used, the transmitted data must be encoded to ensure that it is DC-balanced. Tests with 8b10b-encoded data revealed that the DC-level variations allowed by the 8b10b encoding still had a considerable impact on the jitter of the recovered clock, at least with the PLLs tested for clock recovery (a small, inexpensive and low-power PLL is needed for use on front-end modules).

To improve on this, we came up with our own encoding scheme to meet our design goals (easy clock recovery, low rxclk jitter, ease-of-use, bandwidth).

Dog link protocol

The current dog link data encoding scheme defines a set of 9 different bit patterns (symbols) that can be transmitted in arbitrary succession. Since we want to be able, in principle, to transmit arbitrary information (trigger numbers, 32-bit/64-bit timestamps etc.), we need a communication protocol based on this set of symbols (let’s call them S0-S8).

The current dog link protocol uses the symbols S0-S7 to encode 3 bits of data each, while the symbol S8 is used as a control symbol (EOT/IDLE). The link decoding state machines of the receivers are reset to the idle state whenever an EOT/IDLE symbol is detected. This ensures that the receiver cannot be permanently affected by incoming corrupt or bogus transmissions and will always attempt to decode the succession of symbols coming in after the last received EOT/IDLE according to the protocol (stateless, self-recovering receiver).

Downlink (DCM -> Dogs)

The downlink protocol treats the first two incoming symbols as the trigger identifier and the symbols thereafter as the trigger payload. For technical reasons, symbols S8 and S7 are sent alternately as the IDLE sequence. Therefore, symbol S7 cannot be the first symbol of a trigger transmission, which leaves only 7 options for the first symbol. With 8 options for the second symbol, this yields a total of 7 x 8 = 56 possible trigger types when two symbols are used as identifier.
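
The counting argument above can be checked with a short Python sketch. The mapping between trigger numbers and symbol pairs used here (first symbol = number divided by 8, second symbol = remainder) is only an assumption for illustration; the text fixes the number of identifiers, not their numbering.

```python
# Hypothetical numbering: the text only states that 56 identifiers exist,
# not how trigger numbers are mapped onto symbol pairs.
DATA_SYMBOLS = range(8)  # S0..S7 carry 3 bits of data each
IDLE_PARTNER = 7         # S7 alternates with S8 in the IDLE sequence


def trigger_id_to_symbols(trigger_id: int) -> tuple[int, int]:
    """Map a trigger number 0..55 onto (first, second) symbol indices."""
    if not 0 <= trigger_id < 56:
        raise ValueError("trigger id must be in 0..55")
    first, second = divmod(trigger_id, 8)  # first is 0..6, so never S7
    return first, second


def symbols_to_trigger_id(first: int, second: int) -> int:
    """Inverse of trigger_id_to_symbols."""
    if first not in range(7) or second not in range(8):
        raise ValueError("invalid identifier symbols")
    return first * 8 + second


# Enumerate every identifier that does not start with S7.
valid_pairs = [
    (first, second)
    for first in DATA_SYMBOLS
    if first != IDLE_PARTNER
    for second in DATA_SYMBOLS
]
print(len(valid_pairs))
```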

Uplink (Dogs -> DCM)

The uplinks use the same protocol, except that only the first symbol is currently used as the trigger identifier. Therefore, only 7 different trigger types are currently defined on the uplinks.

GbE interface

All modules in the dogma gbe network (= “dogs” ;-)) listen for UDP frames on port 60678 (0xED06). Since the internal registers and the control bus of the dogma modules are 32 bits wide, the control protocol is also built on tuples of 32-bit (4-byte) words. Incoming UDP frames are checked for the minimum length (in bytes) demanded by the protocol (exact number still tbd, because the protocol is not fully defined yet) and for a length that is a multiple of 4 bytes. If either condition is violated, the frame is discarded unprocessed.

Address Scheme

Each FEE (and each intermediate HUB-stage) in the system has its own Network-Address. This is a number which can be set via the control commands.

For context: This address is independent of the IP address. Each board has a unique serial ID (provided by a device on the board, normally an I2C device). This unique ID is mapped to a MAC address, which is then used to assign an IP address via DHCP. Therefore, if a FEE is accessed via IP, the Dogma network address is only needed when the access uses IP multicast/broadcast.

The network address space contains one broadcast address, used to send a command to every entity, and many multicast addresses. Each board can be given one or many multicast addresses via the gateware. If the multicast bit in the address is set, all boards for which the rest of the address field matches will react to the command.

Additionally, there are registers in the address space of the Dogma entity (endpoint) which are also used as multicast addresses. This makes it possible to define many subsets of the system dynamically (at runtime), which eases work with the whole system, e.g. when reprogramming the flash of a subset of the boards.

Example:

Address Scheme
Address (0x)	purpose
ffXXXXXX	broadcast to all DOGMA-entities (X = arbitrary)
fe000000-fefffffe	multicast to all DOGMA-entities with at least one of the lower 24-bits set
fd000000-fdffffff	multicast to only one type of board, so they react if lower 24-bits match
fc000000-fcffffff	multicast to all boards which match the lower 24-bits or match one of the 24 bits of the 0xfe multicast type
f0000000-fbffffff	reserved
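
As a minimal sketch, the table can be turned into a range lookup in Python. The function name and the class labels are hypothetical, and addresses that the table does not list are reported as "unlisted" rather than guessed.

```python
# Address ranges copied from the table above; labels are illustrative only.
ADDRESS_RANGES = [
    (0xFF000000, 0xFFFFFFFF, "broadcast"),
    (0xFE000000, 0xFEFFFFFE, "multicast (bit mask)"),
    (0xFD000000, 0xFDFFFFFF, "multicast (board type)"),
    (0xFC000000, 0xFCFFFFFF, "multicast (combined)"),
    (0xF0000000, 0xFBFFFFFF, "reserved"),
]


def classify_address(addr: int) -> str:
    """Return the class of a 32-bit dog address according to the table."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit into 32 bits")
    for low, high, label in ADDRESS_RANGES:
        if low <= addr <= high:
            return label
    return "unlisted"  # not covered by the table, e.g. below 0xf0000000


print(classify_address(0xFF123456))
print(classify_address(0xFD000002))
```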

Request protocol:

Each dogma control frame starts with a 12-byte header which contains 4 magic bytes (0xECC1701D) to prevent unintended access, the 32-bit dog address and a reserved word to avoid padding in the UDP packet. The dog address is checked on reception to allow different addressing schemes.

Bytes within a frame that follow the header are interpreted as commands. The first 2 bytes are the command ID, which is set by the sender and needs to be included in the response. The 2-byte access type follows, which can in principle (re)define the protocol for the rest of the command. For the standard access types (r/w, r/w multiple), another word of 4 bytes follows, defining the 16-bit internal (start) address and the number of reads/writes (10 bits should be enough). After that, n 32-bit data words need to follow for write commands (where n is defined in the 10-bit length field).

Control access frame header:

Control access header
Bytes	Function
0-3	Magic bytes: 0xECC1701D (can also be used to encode protocol version)
4-7	32 bit dog address
8-11	reserved
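
The header layout and the basic frame checks can be sketched in Python as follows. Big-endian (network) byte order is an assumption, since the text does not state the byte order, and `min_len` is a placeholder because the minimum frame length is still to be defined. The function names are hypothetical.

```python
import struct

MAGIC = 0xECC1701D
# Three 32-bit words: magic, dog address, reserved.
# Big-endian byte order is an assumption, not confirmed by the text.
HEADER = struct.Struct(">III")


def build_header(dog_address: int) -> bytes:
    """Pack the 12-byte control access header (reserved word set to 0)."""
    return HEADER.pack(MAGIC, dog_address, 0)


def check_frame(payload: bytes, min_len: int = HEADER.size):
    """Return the dog address, or None if the frame has to be discarded.

    min_len stands in for the minimum length, which is still tbd.
    """
    if len(payload) < max(min_len, HEADER.size):
        return None  # too short
    if len(payload) % 4:
        return None  # not a multiple of 4 bytes
    magic, dog_address, _reserved = HEADER.unpack_from(payload)
    if magic != MAGIC:
        return None  # wrong magic bytes
    return dog_address


print(hex(check_frame(build_header(0xFF000000))))
```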

Control access frame command format (multiple commands per frame possible):

Control access command format
Bytes	Function
0-1	16 bit command ID set by sender
2-3	16 bit access type (see below)
4-5	16 bit internal (start) address
6-7	6 bits unused + 10 bit length field for multi r/w operations
8-7+n*4	n 4 byte data words for write operations (n defined in length field)
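
A command following this layout could be assembled as in the sketch below. It assumes big-endian byte order and that the 10-bit length occupies the low bits of bytes 6-7; neither is stated explicitly. The helper name `build_command` is hypothetical.

```python
import struct


def build_command(cmd_id, access_type, address, data=(), length=None):
    """Pack one control access command.

    For writes, pass the data words; the length field is derived from them.
    For reads, pass length explicitly and no data.
    Byte order and position of the length bits are assumptions.
    """
    n = len(data) if length is None else length
    if not 0 <= n < (1 << 10):
        raise ValueError("length must fit into 10 bits")
    head = struct.pack(">HHHH", cmd_id, access_type, address, n)
    body = b"".join(struct.pack(">I", word) for word in data)
    return head + body


# 0x0011: address modifier 1 (standard access) with the write bit (bit 4) set.
single_write = build_command(1, 0x0011, 0x0100, data=[0xDEADBEEF])
# 0x0001: standard access, read.
block_read = build_command(2, 0x0001, 0x0200, length=16)
print(len(single_write), len(block_read))
```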

Access type encoding:

Access type encoding
Bits	Function
3:0	address modifier: 0 == illegal; 1 == standard access (as described in the remaining rows of this table); 2 to 6 == advanced access (the meaning of the following bits changes according to the number; to be defined); 0xf == delay access (allows timed accesses within a multicommand frame)
4	read/write bit: 0==read 1==write
5	fifo access mode: 0 == internally increment the address for each r/w access to r/w a block of addresses; 1 == read/write n times on the same address
11:6	Don’t care bits (room for future features)
15:12	DOGMA version (0 = current, all other = reserved)
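
The bit fields of the access type can be packed and unpacked as in this sketch. It assumes that bit 0 is the least significant bit of the 16-bit word; the function names are hypothetical.

```python
def encode_access_type(modifier: int, write: bool, fifo: bool = False,
                       version: int = 0) -> int:
    """Build the 16-bit access type; bit 0 is assumed to be the LSB."""
    if not 0 <= modifier <= 0xF or not 0 <= version <= 0xF:
        raise ValueError("modifier and version are 4-bit fields")
    return modifier | (int(write) << 4) | (int(fifo) << 5) | (version << 12)


def decode_access_type(value: int) -> dict:
    """Split a 16-bit access type into its documented fields."""
    return {
        "modifier": value & 0xF,          # bits 3:0
        "write": bool((value >> 4) & 1),  # bit 4
        "fifo": bool((value >> 5) & 1),   # bit 5
        "version": (value >> 12) & 0xF,   # bits 15:12
    }


print(hex(encode_access_type(1, write=True)))
print(decode_access_type(0x0021))
```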

As a new command, address modifier 7 allows arbitrary delays to be added to a multicommand frame, for example to start an I2C access, wait for the result, and read it back immediately. This effectively implements a lock on a hardware resource, as no other command can interfere during the sequence.

So a single write command consists of 12 bytes (8 bytes command, 4 bytes data) and a single read command consists of 8 bytes. The dogma header adds another 12 bytes. Taking into account that a UDP frame has an overhead of 46 bytes and that an Ethernet frame needs to be at least 64 bytes long, it is clear that transmitting single commands over Ethernet is very inefficient. Therefore, it needs to be possible to pack multiple control commands into one dogma frame.
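
The inefficiency can be put into numbers with a few lines of Python. The byte counts are taken from the paragraph above; preamble and inter-frame gap are ignored, and the function names are hypothetical.

```python
UDP_OVERHEAD = 46    # bytes of Ethernet/IP/UDP overhead, as stated above
MIN_ETH_FRAME = 64   # minimum Ethernet frame length in bytes
DOGMA_HEADER = 12    # magic bytes, dog address, reserved word
COMMAND = 8          # command ID, access type, address, length
WORD = 4             # one 32-bit data word


def wire_bytes(payload: int) -> int:
    """Frame length for a given UDP payload (preamble and gap ignored)."""
    return max(MIN_ETH_FRAME, UDP_OVERHEAD + payload)


def write_efficiency(n_writes: int) -> float:
    """Fraction of the frame that is register data for n single writes."""
    payload = DOGMA_HEADER + n_writes * (COMMAND + WORD)
    return n_writes * WORD / wire_bytes(payload)


print(wire_bytes(DOGMA_HEADER + COMMAND + WORD))  # one single write
print(round(write_efficiency(1), 3), round(write_efficiency(100), 3))
```

With these numbers a frame carrying one write is 70 bytes long for 4 bytes of register data, while packing many writes approaches the limit of 4 data bytes per 12 command bytes.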

To achieve this, the interpreter processes the first two words after the header as a command. If it is a read command (no following data words necessary), it either ends the processing of the frame if the last byte/end of frame bit is set, or processes the next 2 words as a command again. If a command turns out to be a write command, the n following words (n from the length field in the command) are interpreted as data to be written, and once n data words have been processed, the next two words are processed as a command again. At any time within the processing of the frame, a set last byte/end of frame bit ends the processing. Likewise, any violation of the protocol (command word format) ends the processing, and the fifo is purged until a last byte/end of frame bit is detected or the fifo is empty.

Execution of commands stops in case of protocol errors (such as an unknown address modifier or missing write data at the end of the multicommand structure).
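
The processing rules can be illustrated with a small software model. This is only a sketch and not the gateware interpreter: the end of the byte buffer stands in for the last byte/end of frame bit, big-endian byte order is assumed, and only standard accesses (address modifier 1) are handled. The name `split_commands` is hypothetical.

```python
import struct

HEADER_LEN = 12      # magic bytes, dog address, reserved word
WRITE_BIT = 1 << 4   # bit 4 of the access type


def split_commands(frame: bytes) -> list:
    """Return (cmd_id, access_type, address, length, data) per valid command.

    Processing stops at the first protocol violation; commands parsed
    before the violation are kept.
    """
    commands = []
    pos = HEADER_LEN
    while pos + 8 <= len(frame):
        cmd_id, access, address, length = struct.unpack_from(">HHHH", frame, pos)
        length &= 0x3FF  # 10-bit length field (low bits assumed)
        pos += 8
        if access & 0xF != 1:
            break  # modifier not handled in this sketch
        data = ()
        if access & WRITE_BIT:
            if pos + 4 * length > len(frame):
                break  # missing write data at the end of the frame
            data = struct.unpack_from(f">{length}I", frame, pos)
            pos += 4 * length
        commands.append((cmd_id, access, address, length, data))
    return commands


# Example: write of 1 word, read of 4 words, write of 2 words.
example = (
    struct.pack(">III", 0xECC1701D, 0xFF000000, 0)
    + struct.pack(">HHHHI", 1, 0x0011, 0x0100, 1, 0xCAFE0001)
    + struct.pack(">HHHH", 2, 0x0001, 0x0200, 4)
    + struct.pack(">HHHHII", 3, 0x0011, 0x0300, 2, 0xCAFE0002, 0xCAFE0003)
)
for command in split_commands(example):
    print(command)
```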

A valid multi-command control access frame looks like this:

Multi-command frame structure
Words (32bit)	Content
HEADER	magic bytes
HEADER	dog address
HEADER	reserved
CMD1	write with length 1
CMD1
DATA1	32 bit data
CMD2	read with length n
CMD2
CMD3	write with length 2
CMD3
DATA1	32 bit data
DATA2	32 bit data (end of frame bit set)

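This way, roughly 750 single register writes with arbitrary addresses (12 bytes each) or about 1100 read commands (8 bytes each) can be packed into one GbE frame while staying within the common jumbo-frame MTU of 9000 bytes. Block accesses additionally move up to 1023 words per command.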

Answer protocol

Each command in the request (read/write of n words) results in one UDP packet sent back to the requester (the dog knows the MAC/IP from the arriving UDP packet, which simplifies the setup procedure). Answering each command in its own UDP packet makes the protocol simpler, as no length information needs to be put in the protocol and, more importantly, no mismatches of length and data need to be checked and discarded in a longer sequence of answers in one UDP packet.

The disadvantage is the higher overhead. However, experience from large real-world systems has shown that the total bandwidth generated by controls information is quite low (~1 MByte/s). As each dog uses GbE by default, the bandwidth used per dog is in nearly all cases completely negligible. The following information is needed/wanted in the answer:

Information	Length	Comment
access type	16 bit	each bit for different properties of the access, e.g. read/write, fifo-bit, protocol extensions (e.g. certain bits can redefine the whole following data structure)
command sequence	16 bit	each command/request is sent with a number generated by the controls master, which has to be returned here
dog address of sender	32 bit	to be able to know where the data came from
access time	16 bit	absolute wall clock time (no high granularity is needed), e.g. to calculate accurate rates
result bits	16 bit	result: e.g. successful write/read, status flags, error bits
n data words	32 bit * n	n=0 for write accesses

One response frame per request frame with the following format:

1 x Frame header (fixed length):
    4 magic bytes = 0xECC1701D
    4 bytes dog address
    4 bytes time stamp (counting up with 125 MHz)

1 x Command response block per valid command in the frame (frame discarded after protocol violation):
    1 x Command response block header:
        2 bytes command ID
        2 bytes access type
        2 bytes (start) address
        2 bytes length -> defines n
    n x return data blocks (fixed length: 4 bytes for write, 8 bytes for read, 4 bytes for delay):
        2 bytes time stamp (always) (lowest 12 bits of ctime discarded -> T=32.768us)
        2 bytes return value (to be defined) (always)
        4 bytes data (only for read commands)
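
A response frame with this layout could be decoded as in the following sketch. It assumes big-endian byte order, a length in the low 10 bits of the length field, and that a read is identified by address modifier 1 with a cleared write bit. The function names are hypothetical, and the return value encoding is still to be defined.

```python
import struct

MAGIC = 0xECC1701D
TICK_NS = 8                                   # one period of the 125 MHz counter
COARSE_TICK_US = (1 << 12) * TICK_NS / 1000   # 32.768 us per 16-bit time step


def coarse_time(ctime: int) -> int:
    """Drop the lowest 12 bits of ctime and keep 16 bits."""
    return (ctime >> 12) & 0xFFFF


def parse_response(frame: bytes):
    """Return (dog_address, ctime, blocks) for one response frame.

    Each block is (cmd_id, access_type, address, results) and each result
    is (time_stamp, return_value, data), with data None for non-reads.
    """
    magic, dog_address, ctime = struct.unpack_from(">III", frame, 0)
    if magic != MAGIC:
        raise ValueError("wrong magic bytes")
    pos, blocks = 12, []
    while pos + 8 <= len(frame):
        cmd_id, access, address, n = struct.unpack_from(">HHHH", frame, pos)
        pos += 8
        is_read = (access & 0xF) == 1 and not access & 0x10
        results = []
        for _ in range(n & 0x3FF):
            stamp, ret = struct.unpack_from(">HH", frame, pos)
            pos += 4
            data = None
            if is_read:  # only read blocks carry 4 bytes of data
                (data,) = struct.unpack_from(">I", frame, pos)
                pos += 4
            results.append((stamp, ret, data))
        blocks.append((cmd_id, access, address, results))
    return dog_address, ctime, blocks


print(COARSE_TICK_US)
```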

Return value encoding:

2 byte return value encoding
Value	Meaning
0x0001	Standard access r or w successful
0x00E0	Standard access r or w ERROR (address out of range)

Ping of Death (POD)

The GbE core has a low-level command interface that allows the device to be reset via simple ping commands. The POD resides on the RX data path of the uplink MAC and can’t be blocked by internal issues.

Several actions are supported by the POD:

Reboot FPGA: done by command byte ‘0xa5’
Cold Reset: done by command byte ‘0xb4’
Warm Reset: done by command byte ‘0xc3’
Clear TX locks: done by command byte ‘0xd2’, needs one argument byte to specify channel to unlock.

The POD can be sent with the normal ping command (sudo rights may be needed for broadcast pings) and a special payload. The payload argument needs to be 32-bit aligned, otherwise some implementations of ping will drop the last bytes.

The magic word ‘0xabad1dea’ is the first part of the payload. The second word carries the command byte and (if applicable) the argument byte.

Example:

ping 10.1.1.110 -p abad1deac3000000 -c 1 -W 1

This POD (sent to 10.1.1.110) will execute a warm reset (command ‘0xc3’). The option ‘-c 1’ sends only one ping packet, and ‘-W 1’ limits the waiting time for the (never arriving) answer to 1 second.
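
The payload pattern for the ping command can be generated with a short helper. This is a sketch: the position of the argument byte directly after the command byte is an assumption, and the names `POD_COMMANDS` and `pod_pattern` are hypothetical.

```python
POD_MAGIC = 0xABAD1DEA
# Command bytes as listed above; the dictionary keys are illustrative names.
POD_COMMANDS = {
    "reboot_fpga": 0xA5,
    "cold_reset": 0xB4,
    "warm_reset": 0xC3,
    "clear_tx_locks": 0xD2,
}


def pod_pattern(action: str, argument: int = 0) -> str:
    """Return the hex pattern for the ping payload (two 32-bit words).

    The argument byte is assumed to follow the command byte directly;
    the remaining two bytes pad the payload to 32-bit alignment.
    """
    if not 0 <= argument <= 0xFF:
        raise ValueError("argument must be a single byte")
    second_word = bytes([POD_COMMANDS[action], argument, 0, 0])
    return (POD_MAGIC.to_bytes(4, "big") + second_word).hex()


print(pod_pattern("warm_reset"))  # abad1deac3000000
```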

Caveat: a broadcast POD may or may not work at once. If a module in the root area of the tree structure is affected, the first ping will reset that module, but not necessarily the modules behind it. If you need a reliable POD in the tree structure, you should check BlackCats DMS (DLM messaging service) implementation.

DAQ System

This section (hopefully) covers the necessary details of the implemented triggered data acquisition system.

Implemented trigger functionality

With the current dog link protocol, 56 different triggers (numbered 0-55) can be transmitted downstream (DCM -> Dogs). So far, every trigger transports 64 bits of payload.