There are many circumstances where a CPU may halt and jump to an exception vector:

External interrupts, triggered by a peripheral (e.g. IRQ signals).

MMU traps: a data or instruction access requires intervention from the Operating System.

Software-triggered traps: Ticc instructions, mainly used by user-level code to call the Operating System (“SYSCALL”).

All sorts of errors, like “divide by zero”, “illegal instruction” or “FPU disabled”.

Window Underflow and Window Overflow traps. Unlike the other types, which are found on many CPU families, these traps are quite SPARC-specific.

All these ‘events’ are handled by the same hardware, which stops the CPU, saves and updates some registers, then resumes execution. (There is yet another feature that reuses this hardware; it will come later…)
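As a quick reference for the categories above, the following Python sketch lists a few SPARC V8 trap type (tt) codes, as I read them from the trap table of the SPARCv8 standard. The class and helper names are hypothetical and the list is deliberately partial. Check the standard before relying on any value.

```python
from enum import IntEnum


class TrapType(IntEnum):
    """A few SPARC V8 trap type (tt) values; not the full table."""
    INSTRUCTION_ACCESS_EXCEPTION = 0x01  # MMU trap on an instruction fetch
    ILLEGAL_INSTRUCTION = 0x02
    PRIVILEGED_INSTRUCTION = 0x03
    FP_DISABLED = 0x04
    WINDOW_OVERFLOW = 0x05
    WINDOW_UNDERFLOW = 0x06
    MEM_ADDRESS_NOT_ALIGNED = 0x07
    DATA_ACCESS_EXCEPTION = 0x09         # MMU trap on a data access
    DIVISION_BY_ZERO = 0x2A


def interrupt_tt(level: int) -> int:
    """External interrupt levels 1..15 map to tt = 0x11..0x1F."""
    if not 1 <= level <= 15:
        raise ValueError("interrupt level must be in 1..15")
    return 0x10 + level


def ticc_tt(software_trap_number: int) -> int:
    """Ticc: tt = 128 + low 7 bits of (rs1 + rs2) or (rs1 + imm)."""
    return 0x80 + (software_trap_number & 0x7F)
```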

Please read chapter 7 of the SPARCv8 standard. Come back when you have done it.
Quick! I am waiting.

Detection

Trap conditions are detected at different levels of the pipeline and are eventually handled at the WRITE level.

FETCH

DECODE

– The MMU responds to the instruction fetch with a failure: MMU instruction fault.

EXECUTE

– The instruction is invalid: Unknown opcode…
– Division by zero
– Misaligned accesses
– Privilege violation errors: A supervisor instruction is executed by user code.
– Software traps: Ticc instructions.
– External interrupts.
…others

MEMORY

WRITE

– The MMU responds to the data memory access with a failure: MMU data fault.

The order of the pipeline stages matters. For example, a misaligned access detected in the EXECUTE stage will not be attempted in the MEMORY stage.
External interrupts can be handled at different levels of the pipeline, and the choice slightly changes interrupt latency. They must be handled before the MEMORY level, so that no instruction with side effects is executed and then cancelled. Alternatively, one could check that no data access is active before acknowledging an external interrupt.
Software-triggered traps (Ticc instructions) could be implemented as special branches, but the machinery for saving and modifying both PC and nPC is a bit tricky to fit into this scalar pipeline.
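The following Python sketch models these rules under simplifying assumptions. Each pipeline slot carries a valid bit and a trap flag, named after pipe_xxx.v and pipe_xxx.trap.t but otherwise hypothetical. EXECUTE flags misaligned accesses. MEMORY refuses to start an access for a flagged instruction. WRITE flushes the whole pipe when the instruction reaching it carries a trap. This is an illustration only, not PIPE5's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

MEM_ADDRESS_NOT_ALIGNED = 0x07  # SPARC V8 trap type for misaligned accesses


@dataclass
class Slot:
    """One pipeline slot (hypothetical model)."""
    op: str
    v: bool = True               # valid instruction, like pipe_xxx.v
    trap_t: bool = False         # trap detected upstream, like pipe_xxx.trap.t
    tt: Optional[int] = None     # trap type recorded at detection time
    addr: Optional[int] = None   # data address for loads/stores
    size: int = 4                # access size in bytes


def execute_stage(slot: Slot) -> None:
    """EXECUTE detects misaligned accesses and marks the slot as trapping."""
    if slot.v and not slot.trap_t and slot.addr is not None and slot.addr % slot.size:
        slot.trap_t = True
        slot.tt = MEM_ADDRESS_NOT_ALIGNED


def memory_stage(slot: Slot, bus: List[int]) -> None:
    """MEMORY only starts an access for valid, non-trapping instructions."""
    if slot.v and not slot.trap_t and slot.addr is not None:
        bus.append(slot.addr)


def write_stage(pipe: List[Slot]) -> Optional[int]:
    """pipe is ordered youngest first, so pipe[-1] is the slot at WRITE.

    If that slot carries a trap, every instruction in flight is flushed
    and the trap type is returned; otherwise None.
    """
    at_write = pipe[-1]
    if at_write.v and at_write.trap_t:
        for s in pipe:
            s.v = False
        return at_write.tt
    return None
```

A word load at 0x1002 is flagged in EXECUTE, never reaches the bus, and makes WRITE flush the younger instructions behind it.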

Processing

When a trap condition eventually reaches the final WRITE stage (pipe_wri.trap.t=1), the trap machinery is activated (trap_stop). All instructions in flight are flushed, and pseudo-instructions that save PC and nPC are placed into the pipe. PC and nPC are then redirected to the trap vector address.
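In SPARC V8, the trap vector address is formed in the TBR register. The upper 20 bits come from the trap base address (TBA) and bits 11:4 hold the trap type (tt), so each vector has room for 4 instructions. PC is set to TBR and nPC to TBR + 4. Here is a minimal sketch with a hypothetical function name:

```python
from typing import Tuple


def trap_vector(tba: int, tt: int) -> Tuple[int, int]:
    """Return (PC, nPC) on trap entry: TBR = TBA[31:12] | tt << 4."""
    if not 0 <= tt <= 0xFF:
        raise ValueError("tt is an 8-bit field")
    tbr = (tba & 0xFFFFF000) | (tt << 4)  # low 4 bits of TBR are zero
    return tbr, (tbr + 4) & 0xFFFFFFFF
```

For example, an illegal_instruction trap (tt = 0x02) with TBA = 0x40000000 starts executing at 0x40000020.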
PC is the Program Counter and nPC is the Program Counter for the next instruction. They are not contiguous when the CPU is about to branch (in the delay slot), which is why both must be saved and restored. (CPUs without a branch delay slot, such as x86, PowerPC and ARM, only have a PC register.)
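Here is a small sketch of how (PC, nPC) advance around a delayed branch. It assumes the branch is taken, ignores annulment, and uses a hypothetical `step` function. After a branch at 0x100 to 0x200, the delay slot at 0x104 runs with nPC = 0x200. A trap on that delay slot instruction must therefore save both values to resume correctly.

```python
from typing import Optional, Tuple


def step(pc: int, npc: int, taken_target: Optional[int] = None) -> Tuple[int, int]:
    """Return the next (PC, nPC).

    taken_target is the target of a taken delayed branch located at pc,
    or None for any other instruction (annulment is ignored here).
    """
    next_npc = taken_target if taken_target is not None else (npc + 4) & 0xFFFFFFFF
    return npc, next_npc
```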

Register windows

When a trap occurs, the current window pointer (CWP) is decremented, as with the SAVE instruction. There is therefore no need to immediately save integer registers to memory: the 8 LOCAL and 8 OUT registers of the new window are readily available.
The operating system must always keep a register window free (using an appropriate WIM mask) for that purpose.
The WINDOW_UNDERFLOW and WINDOW_OVERFLOW traps are a bit special. They are triggered when only one register window is available, so its OUT registers, shared with the next window, must not be trashed.
As traps cannot be nested, the exception routine must not use SAVE and RESTORE instructions that would trigger WINDOW_UNDERFLOW or WINDOW_OVERFLOW traps.
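To illustrate the window mechanics, here is a Python sketch of CWP handling. It assumes NWINDOWS = 8 (SPARC V8 allows between 2 and 32 windows). SAVE and RESTORE check the WIM mask and raise window_overflow (tt 0x05) or window_underflow (tt 0x06). Trap entry decrements CWP without checking WIM, which is why a free window must always be kept. The function names are hypothetical.

```python
from typing import Optional, Tuple

NWINDOWS = 8  # assumption: a common choice; V8 allows 2 to 32 windows
WINDOW_OVERFLOW, WINDOW_UNDERFLOW = 0x05, 0x06


def save(cwp: int, wim: int) -> Tuple[int, Optional[int]]:
    """SAVE decrements CWP; traps if the new window is marked invalid in WIM."""
    new = (cwp - 1) % NWINDOWS
    if wim & (1 << new):
        return cwp, WINDOW_OVERFLOW   # CWP unchanged, trap taken instead
    return new, None


def restore(cwp: int, wim: int) -> Tuple[int, Optional[int]]:
    """RESTORE increments CWP; traps if the new window is marked invalid."""
    new = (cwp + 1) % NWINDOWS
    if wim & (1 << new):
        return cwp, WINDOW_UNDERFLOW
    return new, None


def trap_entry_cwp(cwp: int) -> int:
    """Trap entry decrements CWP unconditionally (no WIM check)."""
    return (cwp - 1) % NWINDOWS


def outs_shared_with(cwp: int) -> int:
    """The OUT registers of window w are the IN registers of window w - 1."""
    return (cwp - 1) % NWINDOWS
```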

Interrupt sequence

The CPU does the following operations when processing an interrupt:

The interrupt or trap condition is detected at some level of the pipeline.

The trap flag (pipe_xxx.trap.t) is set. It inhibits the current instruction and all following ones.

Fetches are halted and the CPU waits for in-flight instruction fetches to complete, emptying the PLOMB pipes.

The pipeline is emptied (pipe_xxx.v=0)

The register window pointer (CWP) is decremented.

The floating point unit is asked to cancel instructions still present in its pipeline.

In order to save the PC and the nPC registers, two register move instructions are inserted in the pipeline at the MEMORY and WRITE levels. As soon as the CPU resumes execution, it will execute these instructions and save the registers.

The PSR flags are restored to the values they had at the end of the last completed instruction, then the PSR.PS, PSR.ET and PSR.S flags are updated. RY (the Y register) is recovered as well.

The fetch queue is filled with two accesses to the trap vector address. PC and nPC are updated.

Execution resumes.
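The PSR part of this sequence can be sketched as follows, using the SPARC V8 bit positions (CWP in bits 4:0, ET bit 5, PS bit 6, S bit 7). In SPARC V8, trap entry sets PS to the old S, sets S to 1 and ET to 0, decrements CWP, and stores PC and nPC into %l1 and %l2 of the new window. RETT reverses the PSR part. This is a hypothetical model. It does not model the restoration of flags from the last completed instruction, and it omits the checks that RETT performs.

```python
PSR_CWP_MASK = 0x1F  # current window pointer, bits 4:0
PSR_ET = 1 << 5      # enable traps
PSR_PS = 1 << 6      # previous supervisor mode
PSR_S = 1 << 7       # supervisor mode


def trap_entry(psr: int, pc: int, npc: int, nwindows: int = 8):
    """Return (new PSR, (l1, l2)) for a trap taken at the given PC/nPC."""
    was_supervisor = bool(psr & PSR_S)
    cwp = ((psr & PSR_CWP_MASK) - 1) % nwindows   # no WIM check on trap entry
    psr &= ~(PSR_ET | PSR_PS | PSR_S | PSR_CWP_MASK)
    psr |= (PSR_PS if was_supervisor else 0) | PSR_S | cwp  # PS<-S, S<-1, ET<-0
    return psr, (pc, npc)  # PC goes to %l1, nPC to %l2 of the new window


def rett_psr(psr: int, nwindows: int = 8) -> int:
    """PSR update performed by RETT: CWP+1, S<-PS, ET<-1 (checks omitted)."""
    was_supervisor = bool(psr & PSR_PS)
    cwp = ((psr & PSR_CWP_MASK) + 1) % nwindows
    psr &= ~(PSR_ET | PSR_S | PSR_CWP_MASK)
    return psr | PSR_ET | (PSR_S if was_supervisor else 0) | cwp
```

A user-mode trap (S=0, ET=1, CWP=2) enters supervisor mode in window 1 with ET=0, and RETT brings back the original PSR bits.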

Remarks

The PSR.ET (Enable Traps) flag enables or disables traps, including external interrupts. It is automatically cleared when the CPU enters a trap and restored at the end by the RETT instruction. Interrupts are masked while ET=0. If a “synchronous trap condition”, for example an illegal instruction, occurs while ET=0, the CPU enters error mode and waits until RESET is asserted. (This matters for kernel code: trap handlers must not, for example, trigger MMU faults.)
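The acceptance rules can be summarised in a short Python sketch. It adds one detail from the SPARCv8 standard that is not discussed above: external interrupts are also filtered by the PSR.PIL level, and level 15 is accepted regardless of PIL (but still only when ET=1). The function and its string results are hypothetical.

```python
def trap_action(et: bool, pil: int, synchronous: bool = False, irq_level: int = 0) -> str:
    """Decide what happens to a pending trap or interrupt request.

    Returns "take", "ignore" (interrupt stays pending) or "error_mode".
    """
    if synchronous:
        # A synchronous trap cannot be deferred: with ET = 0 the CPU halts.
        return "take" if et else "error_mode"
    if irq_level == 0 or not et:
        return "ignore"  # no request, or interrupts masked while ET = 0
    # Level 15 ignores PIL; other levels must be strictly above PIL.
    return "take" if irq_level == 15 or irq_level > pil else "ignore"
```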

The ET flag can be written by a WRPSR instruction at the same time as an external interrupt occurs. In that case, the pipeline must be tuned either to update the ET flag before checking external interrupts, or to ignore these interrupts while a WRPSR is still in the pipe. The PSR register is normally updated at the WRITE stage, but interrupts are acknowledged earlier. See page 134 of the SPARC standard about the ET flag. (Well, of course, I made that mistake…)
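The second option, holding interrupts while a WRPSR is in flight, might look like this in a simple model. `Stage` and `may_ack_interrupt` are hypothetical names. A real pipeline would compute this condition from the decoded opcode of each stage rather than from mnemonic strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    op: str         # mnemonic of the instruction in this stage (illustrative)
    v: bool = True  # valid bit, like pipe_xxx.v


def may_ack_interrupt(stages: List[Stage], psr_et: bool) -> bool:
    """Conservative rule: hold interrupts while any valid WRPSR is in flight,
    because the PSR (and thus ET) is only written at the WRITE stage."""
    if not psr_et:
        return False
    return not any(s.v and s.op == "WRPSR" for s in stages)
```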

As in all traditional RISCs, SPARC instructions are indivisible, even those that perform several accesses: LDD, STD, SWAP, … Since memory accesses must be aligned, the two words of an LDD or STD (8-byte aligned) never straddle a page boundary. At least as far as the MMU is concerned, if the first access succeeds, the second one should as well. PIPE5 only checks the first access but completes both.
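The alignment argument can be checked mechanically. Assume 4 KiB pages, the smallest page size of the SPARC reference MMU (any page size that is a multiple of 8 gives the same result). Any doubleword-aligned access then fits in a single page, whereas an 8-byte access that is only word-aligned can straddle a page boundary. The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def same_page(a: int, b: int) -> bool:
    """True when both byte addresses lie in the same page."""
    return a // PAGE_SIZE == b // PAGE_SIZE


def ldd_words_share_page(addr: int) -> bool:
    """LDD/STD need 8-byte alignment; check first and last byte share a page."""
    if addr % 8:
        raise ValueError("LDD/STD address must be doubleword aligned")
    return same_page(addr, addr + 7)  # addr + 7 is the last byte of word 2
```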

Annulled instructions traverse the pipeline instead of being discarded at the DECODE stage, because they still alter the PC/nPC values. (A previous version of PIPE5 discarded annulled instructions early, but it then needed to carry both PC and nPC through every level of the pipeline.) These instructions cannot trap: the annulment flag is not saved. The MMU could trigger a page fault for an annulled instruction, and the IU would ignore that fault.
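For reference, the SPARC V8 annul rules for Bicc branches can be written as a small predicate. This is a hypothetical helper. Even when the delay slot is annulled, PC and nPC still advance past it, which is why the annulled instruction travels down the pipe.

```python
def delay_slot_annulled(cond: str, taken: bool, a: bool) -> bool:
    """SPARC V8 Bicc annul rule.

    cond is "always" (BA), "never" (BN) or "conditional"; a is the annul bit.
    """
    if not a:
        return False       # a = 0: the delay slot is always executed
    if cond in ("always", "never"):
        return True        # BA,a and BN,a annul the delay slot
    return not taken       # conditional,a: annul only when not taken
```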

Conclusion

Some CISC CPUs, like the MC68000 family, have very complex interrupt save states. Instructions with indirect memory addressing, for example, can generate several accesses that may fault independently. The CPU must be able to resume partially executed instructions and must store a detailed machine state on the stack. Insane.

Saving only PC, nPC and a couple of flags without doing any memory access is really a piece of cake.
On a simple architecture like PIPE5, few instructions are in the pipeline simultaneously, and trap processing wastes little time (it could actually be optimised). On more advanced CPUs, where tens of instructions are simultaneously at different stages of execution, a trap can waste hundreds of cycles. As CPUs become more complex, traps and interrupts should be minimised, for example by using intelligent peripherals that require few CPU interventions, by using fast system call methods, and by developing kernels and schedulers that do not need periodic timer interrupts (tickless)…

For PIPE5, data accesses are all deterministic, firm and non-speculative. Before starting a new access, the CPU checks that the previous instruction has completed without triggering a trap. Both read and write accesses can have side effects, particularly for memory-mapped I/O registers.
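Here is a minimal Python sketch of this non-speculative policy. It assumes that accesses issue strictly in program order and that a faulting access stops everything behind it. `MemOp` and `run_data_accesses` are hypothetical. The point is that a store to an I/O register located after a faulting load is never sent to the bus.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MemOp:
    kind: str             # "ld" or "st"
    addr: int
    faults: bool = False  # hypothetical: the MMU rejects this access


def run_data_accesses(ops: List[MemOp], bus: List[Tuple[str, int]]) -> Optional[int]:
    """Issue accesses strictly in program order, one at a time.

    An access only starts once every older instruction has completed
    without trapping, so nothing after a fault ever reaches the bus.
    Returns the index of the faulting operation, or None.
    """
    for i, op in enumerate(ops):
        if op.faults:
            return i                    # trap: younger accesses never start
        bus.append((op.kind, op.addr))  # committed access, side effects final
    return None
```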
Instruction accesses are a bit less strict: when a trap is handled at the WRITE stage, up to 4 extra instructions may already have been fetched from memory. This is acceptable because:

Instructions are read-only.

Instructions are executed from RAM or ROM, not funky I/O ports.

Code segments are known in advance. The extra accesses could trigger page faults in the MMU, but they still fall within the code area, even for JIT-generated code.

We have not yet completed our exploration of PIPE5. More is coming.

TEMLIB

The TEM library
