Things you ought to know!
The PLOMB bus works well, but there are sometimes long combinatorial paths between initiators, targets, multiplexers and selectors. To cut these paths, one can place DFFs across the busses. This obviously adds latency but, with the help of pipelining, burst transactions and interleaved accesses between several initiators, throughput can still be increased.
We place a lock on the canal:
The initial goal was not to store many pending accesses in a FIFO (although being able to store a burst can sometimes be useful) but to break combinatorial paths. The smallest possible FIFO has one cell, with FIFO.full = NOT FIFO.empty. We get something like this:
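As a rough illustration of the one-cell case, here is a minimal VHDL'93 sketch. It is not taken from the PLOMB sources: the entity name `fifo1`, the port names and the handshake convention (a transfer happens in a cycle where REQ and ACK are both '1') are assumptions made for this example.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical one-cell FIFO, for illustration only.
ENTITY fifo1 IS
  GENERIC (WIDTH : positive := 32);
  PORT (
    clk    : IN  std_logic;
    reset  : IN  std_logic;  -- synchronous, active high (assumption)
    -- Initiator side
    i_req  : IN  std_logic;
    i_data : IN  std_logic_vector(WIDTH-1 DOWNTO 0);
    i_ack  : OUT std_logic;
    -- Target side
    o_req  : OUT std_logic;
    o_data : OUT std_logic_vector(WIDTH-1 DOWNTO 0);
    o_ack  : IN  std_logic);
END ENTITY fifo1;

ARCHITECTURE rtl OF fifo1 IS
  SIGNAL full : std_logic := '0';  -- single status bit: full = NOT empty
BEGIN
  -- Both handshake outputs come straight from a register.
  i_ack <= NOT full;
  o_req <= full;

  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF full='0' AND i_req='1' THEN
        o_data <= i_data;  -- capture the access
        full   <= '1';
      ELSIF full='1' AND o_ack='1' THEN
        full   <= '0';     -- access consumed by the target
      END IF;
      IF reset='1' THEN
        full <= '0';
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```

In this sketch every output is driven by a register, so no combinatorial path crosses the stage. The price is that the cell accepts at most one access every other cycle.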
Using a two-cell FIFO, the bus can be used efficiently:
Because sequential elements are used, the feedback path (ACK) lags the forward path (REQ and data) by one cycle. That delay is mitigated by being able to store up to two accesses. In the figure above, the stall of “D4” on the MEM side causes a stall of “D6” on the CPU side.
With a three-cell FIFO, it is possible to hide one stall cycle:
With a combinatorial path on ACK, you save one FIFO level:
- COMB: FIFO with combinatorial path. Configurable depth.
- SYNC: FIFO without combinatorial path. Configurable depth.
- DIRECT: No FIFO, just wiring.
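The effect of a combinatorial ACK path can be sketched with a one-cell stage. This is only an illustration of the idea, not the PLOMB_FIFO code: the entity name `fifo1_comb`, the port names and the handshake convention (a transfer happens when REQ and ACK are both '1' in the same cycle) are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical one-cell stage with a combinatorial ACK path.
ENTITY fifo1_comb IS
  GENERIC (WIDTH : positive := 32);
  PORT (
    clk    : IN  std_logic;
    reset  : IN  std_logic;  -- synchronous, active high (assumption)
    -- Initiator side
    i_req  : IN  std_logic;
    i_data : IN  std_logic_vector(WIDTH-1 DOWNTO 0);
    i_ack  : OUT std_logic;
    -- Target side
    o_req  : OUT std_logic;
    o_data : OUT std_logic_vector(WIDTH-1 DOWNTO 0);
    o_ack  : IN  std_logic);
END ENTITY fifo1_comb;

ARCHITECTURE rtl OF fifo1_comb IS
  SIGNAL full  : std_logic := '0';
  SIGNAL ready : std_logic;
BEGIN
  -- The cell can take a new access when it is empty OR when it is being
  -- emptied in this very cycle: o_ack feeds i_ack combinatorially.
  ready <= (NOT full) OR o_ack;
  i_ack <= ready;
  o_req <= full;

  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF ready='1' THEN
        o_data <= i_data;  -- only meaningful when i_req='1'
        full   <= i_req;   -- refilled, or left empty
      END IF;
      IF reset='1' THEN
        full <= '0';
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```

Because `i_ack` depends on `o_ack` within the same cycle, the single cell can be drained and refilled on the same clock edge and can sustain one access per cycle. The forward path (REQ and data) is still registered, but the ACK path is no longer cut.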
There are many small FIFOs in the design.
Apart from the Ethernet MAC, which is able to store a 64-byte Ethernet frame and uses a dual-port memory with read and write pointers, most FIFOs are much smaller: sometimes only 1 or 2 cells, sometimes deep enough to store a whole burst transfer.
These FIFOs are usually implemented like this:
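    -- Assumptions: FIFO is declared as an array (0 TO DEPTH-1),
    -- push and pop are BOOLEAN, lev is the index of the oldest entry.
    IF push THEN
      FIFO <= datain & FIFO(0 TO DEPTH-2);  -- new entry enters at index 0
    END IF;
    IF push AND NOT pop THEN
      IF lv='1' THEN lev <= lev+1; END IF;  -- first entry stays at index 0
      lv <= '1';
    ELSIF pop AND NOT push THEN
      IF lev/=0 THEN lev <= lev-1; ELSE lv <= '0'; END IF;
    END IF;
    dataout <= FIFO(lev);
    fifo_is_not_empty <= lv;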
This is sometimes called a variable-length shift register.
There are many other ways to implement a synchronous FIFO; this version is specifically tailored for Xilinx FPGAs, using SRL16 primitives:
These primitives use LUTs in a special way to assemble a 16-bit shift register with a multiplexer. As a result, a 2-deep FIFO has the same area as a 16-deep FIFO.
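To make the shift-register idea concrete, here is a self-contained VHDL'93 sketch of such a FIFO. The entity name `srl_fifo`, the generics and the reset scheme are assumptions for this example, not the project's actual code. The data array is deliberately left without a reset, since SRL16 primitives have none and a reset on the data would usually prevent the tools from mapping it onto shift-register LUTs.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical variable-length shift register FIFO.
-- No protection against push when full or pop when empty.
ENTITY srl_fifo IS
  GENERIC (
    WIDTH : positive := 8;
    DEPTH : positive := 16);
  PORT (
    clk     : IN  std_logic;
    reset   : IN  std_logic;  -- synchronous, active high (assumption)
    push    : IN  boolean;
    pop     : IN  boolean;
    datain  : IN  std_logic_vector(WIDTH-1 DOWNTO 0);
    dataout : OUT std_logic_vector(WIDTH-1 DOWNTO 0);
    fifo_is_not_empty : OUT std_logic);
END ENTITY srl_fifo;

ARCHITECTURE rtl OF srl_fifo IS
  TYPE arr_t IS ARRAY (natural RANGE <>) OF
    std_logic_vector(WIDTH-1 DOWNTO 0);
  SIGNAL fifo : arr_t(0 TO DEPTH-1);
  SIGNAL lev  : natural RANGE 0 TO DEPTH-1 := 0;  -- index of oldest entry
  SIGNAL lv   : std_logic := '0';                 -- '1' when not empty
BEGIN
  -- Data path: plain shift with enable, no reset.
  shift : PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF push THEN
        fifo <= datain & fifo(0 TO DEPTH-2);
      END IF;
    END IF;
  END PROCESS shift;

  -- Control path: ordinary flip-flops, these can be reset.
  ctrl : PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF push AND NOT pop THEN
        IF lv='1' THEN lev <= lev+1; END IF;
        lv <= '1';
      ELSIF pop AND NOT push THEN
        IF lev/=0 THEN lev <= lev-1; ELSE lv <= '0'; END IF;
      END IF;
      IF reset='1' THEN
        lev <= 0;
        lv  <= '0';
      END IF;
    END IF;
  END PROCESS ctrl;

  -- Read side: addressable tap of the shift register.
  dataout <= fifo(lev);
  fifo_is_not_empty <= lv;
END ARCHITECTURE rtl;
```

A simultaneous push and pop on a non-empty FIFO shifts the data while leaving `lev` unchanged, so `dataout` moves on to the next oldest entry.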
I do not know how it is synthesised on Altera, Lattice and other devices, or whether alternate implementations would be preferable. In the extreme case, these FIFOs can be implemented as discrete FFs, which is usually better than true dual-port memory for very small FIFOs.
VHDL-2008 introduced generic types, allowing the creation of generic FIFOs. For now, the language used is still VHDL’93 and each FIFO is described separately…
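For comparison, a FIFO parameterised by a VHDL-2008 generic type might look like the sketch below. The entity name `gen_fifo` and its ports are hypothetical, and support for generic types varies between simulators and synthesis tools, so this should be checked against the toolchain in use.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical VHDL-2008 FIFO with a generic element type.
-- No protection against push when full or pop when empty.
ENTITY gen_fifo IS
  GENERIC (
    TYPE data_t;              -- VHDL-2008 generic type
    DEPTH : positive := 4);
  PORT (
    clk     : IN  std_logic;
    reset   : IN  std_logic;  -- synchronous, active high (assumption)
    push    : IN  boolean;
    pop     : IN  boolean;
    datain  : IN  data_t;
    dataout : OUT data_t;
    fifo_is_not_empty : OUT std_logic);
END ENTITY gen_fifo;

ARCHITECTURE rtl OF gen_fifo IS
  TYPE arr_t IS ARRAY (0 TO DEPTH-1) OF data_t;
  SIGNAL fifo : arr_t;
  SIGNAL lev  : natural RANGE 0 TO DEPTH-1 := 0;  -- index of oldest entry
  SIGNAL lv   : std_logic := '0';                 -- '1' when not empty
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF push THEN
        -- Shift by one position; only assignment is needed on data_t.
        fifo(0) <= datain;
        FOR i IN 1 TO DEPTH-1 LOOP
          fifo(i) <= fifo(i-1);
        END LOOP;
      END IF;
      IF push AND NOT pop THEN
        IF lv='1' THEN lev <= lev+1; END IF;
        lv <= '1';
      ELSIF pop AND NOT push THEN
        IF lev/=0 THEN lev <= lev-1; ELSE lv <= '0'; END IF;
      END IF;
      IF reset='1' THEN
        lev <= 0;
        lv  <= '0';
      END IF;
    END IF;
  END PROCESS;

  dataout <= fifo(lev);
  fifo_is_not_empty <= lv;
END ARCHITECTURE rtl;

-- Example instantiation storing 8-bit vectors (signal names are made up):
--   u_fifo : ENTITY work.gen_fifo
--     GENERIC MAP (data_t => std_logic_vector(7 DOWNTO 0), DEPTH => 4)
--     PORT MAP (clk => clk, reset => reset, push => push, pop => pop,
--               datain => din, dataout => dout, fifo_is_not_empty => ne);
```

The same entity could then be instantiated with a record type (for instance one grouping address, data and control fields) instead of describing each FIFO separately.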
Other small FIFOs and buffers:
- IU: Stores fetch accesses, the PC+nPC pipe.
- IU: Stores data read back on the instruction and data busses.
- MCU: Buffers external accesses from the instruction and data busses. Write back buffer.
- PLOMB_MUX: Stores port numbers.
- PLOMB_SEL: Stores port numbers.
- LANCE: Stores/prepares burst transfers.
- LANCE: Data read/write storage.
- ESP: Memory transfers.
- VID: Burst transfers and cross clock domain buffers.
Beyond the PLOMB_FIFO case, understanding flow regulation and pipelining is essential. I first encountered this problem a long time ago, when trying to implement a PCI [parallel] interface: you need a FIFO and a way to hold data when the current access may be delayed or cancelled.
(A lecture somewhat related to this subject: http://www-inst.eecs.berkeley.edu/~cs250/fa10/lectures/lec07.pdf)