The debugger controls the CPU through a serial interface.
The debugger can:
- Start and Stop the CPU
- Set Data and Instruction breakpoints
- Read and Write registers
- Read and Write memory
The debugger controller (iu_debug) is directly interfaced with the CPU IU debug ports.
On the SP605 board, the serial port is shared between the terminal console and the debugger (ts_aciamux.vhd). Input selection can be done either with the CTS signal (0=serial, 1=debug) or with the special BREAK ‘character’.
BREAK is an invalid sequence in which the serial line is held low for longer than the duration of a character. Using a special line state for switching keeps all 256 possible characters available for normal communications. The downside is that it requires some ioctl() calls, which don’t seem to work very well (this is why “34” and “33” were eventually chosen: these codes have more transitions than, for example, “00” and “01”).
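The “more transitions” argument can be checked by counting the level changes in a complete serial frame. This is a minimal sketch, assuming 8N1 framing (start bit low, eight data bits sent LSB first, stop bit high) and assuming the codes above are hexadecimal; the function name is hypothetical and not taken from the DEBUG software.

```c
#include <stdint.h>

/* Count the level changes inside one 8N1 UART frame:
 * start bit (0), 8 data bits sent LSB first, stop bit (1). */
int frame_transitions(uint8_t byte)
{
    int bits[10];
    int n = 0;

    bits[n++] = 0;                      /* start bit */
    for (int i = 0; i < 8; i++)
        bits[n++] = (byte >> i) & 1;    /* data bits, LSB first */
    bits[n++] = 1;                      /* stop bit */

    int transitions = 0;
    for (int i = 1; i < n; i++)
        if (bits[i] != bits[i - 1])
            transitions++;
    return transitions;
}
```

Under these assumptions, 0x34 and 0x33 each give 5 transitions inside the frame, against 1 for 0x00 and 3 for 0x01.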
The CTS method is more reliable, but this signal is not always connected.
Note that characters emitted by the software while the serial port is in debug mode are lost.
IU_DEBUG is connected to an ACIA (acia.vhd), which converts the serial line into bytes.
There are COMMAND sequences for sending a 32-bit value and STATUS sequences for receiving a 32-bit value:
| Sequence | Description |
|---|---|
| CMD_OP (opcode) | Sends an opcode, then executes it |
| CMD_CMD (value) | Sets various discretes |
| CMD_IB (address) | Sets the instruction breakpoint address |
| CMD_DB (address) | Sets the data breakpoint address |
| STAT_PC → address | Gets the PC register |
| STAT_nPC → address | Gets the nPC register |
| STAT_DATA → value | Gets the result data of the executed instruction |
| STAT_STATUS → value | Gets various signals |
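On the host side, each COMMAND boils down to a code followed by a 32-bit argument, and each STATUS to a 32-bit reply. The sketch below assumes a hypothetical framing of one code byte followed by the value, most significant byte first (SPARC is big-endian); the numeric codes are placeholders, as the real ones are defined in iu_debug and the DEBUG software.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical command codes: placeholders, not the real iu_debug values. */
enum dbg_cmd {
    CMD_OP  = 0x01,  /* push an opcode and execute it   */
    CMD_CMD = 0x02,  /* set the discretes               */
    CMD_IB  = 0x03,  /* instruction breakpoint address  */
    CMD_DB  = 0x04   /* data breakpoint address         */
};

/* Serialise a command and its 32-bit argument, MSB first.
 * Returns the number of bytes written to `out`. */
size_t dbg_encode(enum dbg_cmd cmd, uint32_t value, uint8_t out[5])
{
    out[0] = (uint8_t)cmd;
    out[1] = (uint8_t)(value >> 24);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 8);
    out[4] = (uint8_t)value;
    return 5;
}

/* Rebuild a 32-bit STATUS value from 4 received bytes, MSB first. */
uint32_t dbg_decode(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```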
The discretes controlled through the debugger are mainly CPU run/stop, breakpoint enable/disable and PC update/fetch.
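As an illustration, the discretes can be packed into the single 32-bit CMD_CMD value. The bit positions below are hypothetical assumptions chosen for the sketch; only the list of signals comes from the text.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical bit assignment for the CMD_CMD value
 * (the actual positions are defined by iu_debug). */
#define DBG_RUN       (1u << 0)  /* 1 = CPU runs, 0 = CPU stopped       */
#define DBG_IB_ENABLE (1u << 1)  /* instruction breakpoint enabled      */
#define DBG_DB_ENABLE (1u << 2)  /* data breakpoint enabled             */
#define DBG_PC_UPDATE (1u << 3)  /* pushed instructions update PC/nPC   */
#define DBG_FETCH     (1u << 4)  /* instruction fetches re-enabled      */

/* Build the CMD_CMD argument from individual discretes. */
uint32_t dbg_discretes(bool run, bool ib, bool db, bool pc_update, bool fetch)
{
    uint32_t v = 0;
    if (run)       v |= DBG_RUN;
    if (ib)        v |= DBG_IB_ENABLE;
    if (db)        v |= DBG_DB_ENABLE;
    if (pc_update) v |= DBG_PC_UPDATE;
    if (fetch)     v |= DBG_FETCH;
    return v;
}
```

Grouping the discretes in one word lets a single CMD_CMD frame change several of them at once, for example enabling PC update and fetch together.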
How can the debugger control the CPU with only these commands?
Hardware debuggers provide remote control of a CPU. They must comply with contradictory requirements:
– Make as much of the CPU state as possible accessible and modifiable: all the integer, floating-point and special registers (here Y, PSR, WIM, MMU registers, cache contents…).
– Be always available: the debugger has “über-privileged” access, and can take control during user code, supervisor code and even the uninterruptible trap code.
– Reach all addressable resources: main memory, peripherals.
– As small as possible.
– Minimal impact on the CPU performance and operating frequency.
One way to reach all the CPU registers from an external probe is to add R/W ports and multiplexers on every resource. This is a very intrusive, area-consuming solution, and it is not the one chosen for our cores.
The principle is instead to halt the CPU, then push debugger-provided instructions into the pipeline. These instructions are processed like any “normally ingested” instruction, except that the program counter is not updated (except when the debugger wants to update it) and instruction fetches are halted (except when…). The instruction results are sent back to the emulator interface by capturing the value written into the destination register.
With this method, all the software-visible processor state is accessible from the debug interface. In debug mode, all the privileged supervisor instructions are enabled, and it would even be possible to add special instructions available only in debug mode.
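For example, reading an integer register only requires pushing an instruction that writes that register, so that its value shows up on STAT_DATA. The sketch below assumes the debugger pushes `or %g0, %rN, %rN` (the synthetic `mov %rN, %rN`), which rewrites the register with its own value and leaves the CPU state unchanged; the text does not say which instruction the DEBUG software actually uses, and the helper names are hypothetical. The encoding follows the SPARC V8 format 3 layout.

```c
#include <stdint.h>

/* SPARC V8 format 3 (op = 2), register form (i = 0):
 * op[31:30] rd[29:25] op3[24:19] rs1[18:14] i[13] asi[12:5] rs2[4:0] */
static uint32_t sparc_fmt3_reg(unsigned op3, unsigned rd, unsigned rs1, unsigned rs2)
{
    return (2u << 30) | ((rd & 31u) << 25) | ((op3 & 63u) << 19) |
           ((rs1 & 31u) << 14) | (rs2 & 31u);
}

#define SPARC_OP3_OR 0x02u

/* Build "or %g0, %rN, %rN" for register N (0..31 in the current window:
 * %g0-%g7, %o0-%o7, %l0-%l7, %i0-%i7). The value written back is the
 * register's current content, which the debug port can capture. */
uint32_t dbg_read_reg_insn(unsigned reg)
{
    return sparc_fmt3_reg(SPARC_OP3_OR, reg, 0, reg);
}
```

Usage: `dbg_read_reg_insn(16)` gives 0xA0100010, i.e. `mov %l0, %l0`, to be sent with CMD_OP before reading STAT_DATA.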
To reach memory, the debugger just has to make the CPU execute load or store instructions.
A drawback is that accessing the CPU state this way is intrusive and prevents real-time operation.
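A word read can be sketched as the SETHI/OR/LD sequence mentioned later in the text: build the address in a temporary register, then load through it. This is a minimal sketch using the standard SPARC V8 encodings; the choice of temporary and destination registers (for instance R1 and R2, which are saved when entering debug mode) and the function name are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Build the three instructions that load the 32-bit word at `addr`
 * into register `dst`, using register `tmp` to hold the address:
 *   sethi %hi(addr), tmp
 *   or    tmp, %lo(addr), tmp
 *   ld    [tmp], dst
 * Register numbers are 0..31 in the current window. */
size_t dbg_load_word_insns(uint32_t addr, unsigned tmp, unsigned dst, uint32_t out[3])
{
    /* SETHI: op=0, op2=4, imm22 = upper 22 bits of the address */
    out[0] = ((tmp & 31u) << 25) | (4u << 22) | (addr >> 10);
    /* OR immediate: op=2, op3=0x02, i=1, simm13 = lower 10 bits */
    out[1] = (2u << 30) | ((tmp & 31u) << 25) | (0x02u << 19) |
             ((tmp & 31u) << 14) | (1u << 13) | (addr & 0x3FFu);
    /* LD: op=3, op3=0x00, i=1, simm13 = 0 */
    out[2] = (3u << 30) | ((dst & 31u) << 25) | (0x00u << 19) |
             ((tmp & 31u) << 14) | (1u << 13);
    return 3;
}
```

Each word is pushed with CMD_OP and the loaded value is read back from STAT_DATA after the LD; a write would replace the LD with an ST of a register previously set the same way.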
Some registers (PC, PSR) are nevertheless copied directly to the debug port, which permits ‘live’ monitoring without halting the CPU.
Putting the CPU safely into debug mode is a bit tricky, as one wants to halt execution and later resume it from exactly the same place. The debugger must be able to revert all the changes it makes to the CPU state.
The solution chosen here is to handle it a bit like a trap. The trap logic is reused and the pipeline is flushed, but, contrary to normal traps, the registers are not updated (e.g. PSR.S, PSR.ET, PSR.CWP, as well as the PC and nPC saved in registers R17 and R18). At the end of the pipeline flushing mechanism, the fetch logic is stopped and a multiplexer steers instructions from the debug interface. At the end of the debugging session, the debugger must restart instruction fetches by pushing a pair of JMPL instructions into the pipeline, a bit like the trap exit sequence (the debugger may want to resume execution, or to continue anywhere else), then deactivate debug mode.
Unlike a software debugger, the hardware debug port doesn’t modify supervisor registers, so it can take control of the CPU at any time, including during interrupt routines.
Instruction and data breakpoints are also trap-like, but they automatically stop the CPU and enable the debugger instead of branching to an exception vector. They are currently quite limited: only one of each type, both read and write accesses, 32-bit.
When entering debug mode, the PC, nPC and PSR registers are recovered through the debug interface and the R1, R2 and R3 registers are saved for later use as temporary variables.
When exiting debug mode, the following code sequence is pushed by the debugger:
JMPL R1,R0 [With update PC and fetch]
JMPL R1,R0 [With update PC and fetch]
(See the DEBUG software, lib.c,
Yes indeed, that’s quite long!
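The pair of JMPL instructions comes from SPARC’s delayed control transfers: a JMPL sets nPC to its target, and the instruction already queued after it still runs. Two JMPLs back to back therefore load PC with the first target and nPC with the second, much like the `jmpl`/`rett` trap exit. The toy model below assumes the first JMPL targets the saved PC and the second the saved nPC (the exact sequence is in lib.c); in debug mode the second JMPL presumably comes from the debug port rather than from memory, but the PC/nPC update is the same. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Minimal model of the SPARC PC/nPC pair. */
struct pcpair {
    uint32_t pc;   /* address of the instruction being executed */
    uint32_t npc;  /* address of the next instruction           */
};

/* Ordinary instruction: PC <- nPC, nPC <- nPC + 4. */
void step_plain(struct pcpair *s)
{
    s->pc = s->npc;
    s->npc += 4;
}

/* JMPL with target `t` (delayed control transfer): PC <- nPC, nPC <- t. */
void step_jmpl(struct pcpair *s, uint32_t t)
{
    s->pc = s->npc;
    s->npc = t;
}
```

Starting from any PC/nPC, `step_jmpl(&s, saved_pc)` followed by `step_jmpl(&s, saved_npc)` leaves `s.pc == saved_pc` and `s.npc == saved_npc`, so execution resumes exactly where it was interrupted, including in a branch delay slot.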
To handle single-stepping, the debugger detects branches and evaluates the integer or floating-point condition codes and the possible annulment of the next instruction, so that only one hardware breakpoint register is needed. By contrast, when debugging with GDB using software breakpoints (TA 0x01 instructions), breakpoints are placed on both possible successors of a conditional branch.
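The branch evaluation can be sketched for integer branches (Bicc) as follows: the only case where the next executed instruction is not at nPC is an annulled delay slot, which depends on the condition. This is a minimal sketch based on the SPARC V8 Bicc definition; FBfcc, which would test the FSR fcc bits the same way, is left out, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* PSR integer condition code bits (SPARC V8). */
#define PSR_N (1u << 23)
#define PSR_Z (1u << 22)
#define PSR_V (1u << 21)
#define PSR_C (1u << 20)

/* Evaluate a Bicc condition field (0..15) against the PSR icc bits. */
static bool icc_taken(unsigned cond, uint32_t psr)
{
    bool n = psr & PSR_N, z = psr & PSR_Z, v = psr & PSR_V, c = psr & PSR_C;
    bool r;
    switch (cond & 7u) {
    case 0:  r = false;         break; /* BN   / BA   */
    case 1:  r = z;             break; /* BE   / BNE  */
    case 2:  r = z || (n != v); break; /* BLE  / BG   */
    case 3:  r = n != v;        break; /* BL   / BGE  */
    case 4:  r = c || z;        break; /* BLEU / BGU  */
    case 5:  r = c;             break; /* BCS  / BCC  */
    case 6:  r = n;             break; /* BNEG / BPOS */
    default: r = v;             break; /* BVS  / BVC  */
    }
    return (cond & 8u) ? !r : r;       /* conditions 8..15 are negations */
}

/* Address of the next instruction that will actually execute after
 * `insn`, located at `pc`, with nPC = `npc`. Only Bicc is decoded. */
uint32_t step_next_pc(uint32_t insn, uint32_t pc, uint32_t npc, uint32_t psr)
{
    bool is_bicc = (insn >> 30) == 0 && ((insn >> 22) & 7u) == 2;
    if (!is_bicc)
        return npc;                          /* sequential or delay slot */

    unsigned cond = (insn >> 25) & 15u;
    bool annul = (insn >> 29) & 1u;
    if (!annul)
        return npc;                          /* delay slot always runs */

    int32_t disp = (int32_t)(insn & 0x3FFFFFu);
    if (disp & 0x200000)
        disp -= 0x400000;                    /* sign-extend disp22 */
    uint32_t target = pc + (uint32_t)disp * 4u;

    if (cond == 8)
        return target;                       /* BA,a: delay slot annulled */
    /* Bcc,a: delay slot runs if taken, is annulled otherwise (BN,a too). */
    return icc_taken(cond, psr) ? npc : npc + 4u;
}
```

The single breakpoint register is then loaded with the returned address before letting the CPU run.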
For each instruction executed in single-step mode, the debugger must run the 8-op prologue and the 15-op epilogue, and read the PC and nPC registers directly. As all of this is transmitted through a serial port, debugging is not fast. Another issue is USB-serial adapters, which tend to buffer accesses.
Even dumping memory requires sending many instructions (SETHI/OR/LD), so it is not fast either.
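A rough lower bound on these costs can be worked out from the serial bit rate alone. The text gives neither the baud rate nor the frame format, so this sketch assumes 115200 baud, 8N1 framing (10 bit times per byte), 5-byte frames (one code byte plus a 32-bit value) and one frame per STATUS read; adapter and host latencies are ignored.

```c
#include <stdint.h>

/* Lower bound on serial transfer time, in microseconds, for `frames`
 * frames of `bytes_per_frame` bytes each, with 8N1 framing
 * (10 bit times per byte). `baud` must be non-zero. */
uint32_t serial_time_us(uint32_t frames, uint32_t bytes_per_frame, uint32_t baud)
{
    uint64_t bits = (uint64_t)frames * bytes_per_frame * 10u;
    return (uint32_t)(bits * 1000000u / baud);
}
```

With these assumptions, one single step (8 + 15 ops plus the PC and nPC reads, 25 frames) takes at least about 10.8 ms, and each dumped word (SETHI, OR, LD plus one STAT_DATA read, 4 frames) at least about 1.7 ms, i.e. a few kilobytes per second at best.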
A few details:
- The “STOPA” signal is used for halting timers and other time-dependent parts when the CPU is halted. Many other parts are not affected though: the real-time clock still ticks, the video controller continues to read memory…
- Special registers are accessed with the usual instructions: WRWIM, RDY…
- MMU registers are accessed through the normal ASI codes used by the MMU. The MMU Fault Status Register, which is cleared when read, is also replicated at a do-not-modify-on-read address. Although this is not defined in the standard, many actual implementations provide this access; I guess they ran into the same issues…
- FPU registers are tricky: the integer unit cannot read or write them directly, so one must go through a 32-bit memory buffer. It can be in main memory or in the MMU/cache controller; the “Load/Store Floating Point into Alternate Space” instructions from the SPARC V9 set were added precisely for that purpose, so that the FP registers can be accessed without any memory access beyond the MCU, which is compatible with all OSes.
When the debugger is used on MMU-less hardware, the buffer should be found elsewhere and the debugger modified accordingly.
- MMU faults have higher priority than the debugger breakpoints, to avoid halting the CPU on unmapped memory.
- The breakpoints are managed by the IU, so they use virtual addresses. MMU breakpoints would be needed for handling physical addresses.
- Finally, the severely broken JMPL/RETT trap return sequence is just as ugly for debugging and single-stepping: as the two instructions are interdependent, a breakpoint can’t be placed on the RETT instruction.
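Reading an FP register through the memory buffer can be sketched as two pushed instructions: store the FP register into the buffer, then load the buffer into an integer register. This sketch uses the plain SPARC V8 STF/LD forms with a buffer in main memory, whose address is assumed to be already in an integer register; the alternate-space forms used with the MMU/cache controller buffer need an ASI value that the text does not give. The function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* SPARC V8 format 3 memory instruction, immediate form (i = 1):
 * op=3, rd[29:25], op3[24:19], rs1[18:14], i[13], simm13[12:0] */
static uint32_t sparc_mem_imm(unsigned op3, unsigned rd, unsigned rs1, int32_t simm13)
{
    return (3u << 30) | ((rd & 31u) << 25) | ((op3 & 63u) << 19) |
           ((rs1 & 31u) << 14) | (1u << 13) | ((uint32_t)simm13 & 0x1FFFu);
}

#define OP3_LD  0x00u  /* load word into an integer register   */
#define OP3_STF 0x24u  /* store single-precision FP register   */

/* Read %f<freg> through a 32-bit buffer whose address is in integer
 * register `base`:  st %fN, [base]  then  ld [base], dst.
 * The value of %fN then appears as the result of the LD. */
size_t dbg_read_freg_insns(unsigned freg, unsigned base, unsigned dst, uint32_t out[2])
{
    out[0] = sparc_mem_imm(OP3_STF, freg, base, 0);
    out[1] = sparc_mem_imm(OP3_LD, dst, base, 0);
    return 2;
}
```

Writing an FP register is the reverse: store an integer register into the buffer, then load the FP register from it.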
The debug software in src/soft/debug is made of several layers (from lowest to highest):
– Serial port communications: serie.c
– Management of the hardware debugger: lib.c
– Monitor commands: command.c
– Terminal mode, initialisation: main.c
(plus the disassembler in disas.c)
All the important stuff is in lib.c: procedures for reading and writing memory and registers, halting and restarting the CPU…
This way of implementing hardware debug complies with all the original requirements: Minimal size and complexity (on the hardware side), can be disabled by configuration…
AFAIK, this “deposit an instruction into the pipe” method is used in many other CPUs. It is a bit hard to get details, though, as the debug interface is often poorly documented or under NDA, or there is simply no hardware debug.
[Example of a ‘real’ debug interface: read the Freescale PowerPC e200 reference manual. The Instruction Register (IR) content can be changed, a bit like our debugger: “The instruction register (IR) provides a way to control the debug session by serving as a means for forcing in selected instructions and causing them to be executed in a controlled manner by the debug control block.”]