SPARC CPUs use condition codes, the famous NZVC bits (Negative, Zero, oVerflow and Carry).
SPARC flags are arguably based on the MC68000’s. They are nevertheless put to far better use than on the MC68000, because both updating them and using them are optional:
|SUB||Do not modify flags|
|SUBcc||Modify the NZVC flags. CMP R1,R2 == SUBcc R1,R2,R0|
|SUBX||Use the carry flag as a carry-in.|
|SUBXcc||Use the carry flag then modify the NZVC flags.|
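The four variants can be modelled in a few lines. The following is a minimal Python sketch, not SPARC code: the function names only mirror the mnemonics, registers are plain 32-bit integers and the flags live in a dictionary. All of these modelling choices are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def _subtract(a, b, carry_in):
    """Compute the 32-bit value a - b - carry_in; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a - b - carry_in                    # exact integer, may be negative
    result = full & MASK32
    flags = {
        "N": result >> 31,                     # sign bit of the result
        "Z": int(result == 0),                 # every result bit is zero
        "V": ((a ^ b) & (a ^ result)) >> 31,   # signed overflow
        "C": int(full < 0),                    # borrow out of bit 31
    }
    return result, flags

def sub(a, b, flags):
    """SUB: the flags are neither read nor written."""
    result, _ = _subtract(a, b, 0)
    return result, flags

def subcc(a, b, flags):
    """SUBcc: the old flags are ignored, new NZVC flags are produced."""
    return _subtract(a, b, 0)

def subx(a, b, flags):
    """SUBX: the carry flag is consumed, the flags are left unchanged."""
    result, _ = _subtract(a, b, flags["C"])
    return result, flags

def subxcc(a, b, flags):
    """SUBXcc: the carry flag is consumed, then NZVC are replaced."""
    return _subtract(a, b, flags["C"])
```

In this model, CMP is simply `subcc` with the result thrown away and only the flags kept.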
Condition codes are almost exclusively used in combination with conditional branches. Their other classic use is extended precision arithmetic, but on modern CPUs (not ours!) that is often best served by the multimedia SIMD instructions, and 64-bit CPUs do not need extended precision arithmetic very often anyway. Another use is in idiomatic sequences for things like shifts, calculating the absolute value, or the sign of an integer; specific instructions and conditional moves can be used instead. As condition codes are invisible to high level languages, their explicit use is mostly limited to assembly code.
Besides MC68000s and SPARCs, condition codes are used by most of the currently dominant instruction set architectures: x86, ARM, PowerPC. Some other ISAs, influenced principally by MIPS, do not use condition codes. They provide instead various alternatives, such as combined compare and branch instructions, or storing comparison results in regular registers. Modern evolutions of old ISAs, for example the 64-bit ARMv8, have also introduced some alternatives to the use of condition codes.
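To make the extended precision case concrete, here is a hedged Python sketch of a 64-bit subtraction built from two 32-bit steps, in the spirit of a SUBcc on the low words followed by a SUBX on the high words. The function name `sub64` and the (high, low) word layout are assumptions chosen for the example.

```python
MASK32 = 0xFFFFFFFF

def sub64(a_hi, a_lo, b_hi, b_lo):
    """Subtract two 64-bit values held as pairs of 32-bit words."""
    # Step 1, like SUBcc: subtract the low words and record the borrow in C.
    low_full = (a_lo & MASK32) - (b_lo & MASK32)
    carry = int(low_full < 0)
    # Step 2, like SUBX: subtract the high words and the recorded borrow.
    high_full = (a_hi & MASK32) - (b_hi & MASK32) - carry
    return high_full & MASK32, low_full & MASK32
```

The carry flag is the only piece of state passed from the first step to the second, which is why this idiom needs a flag (or an equivalent register) at all.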
Back on SPARC
The CMP “compare” instruction works both on unsigned and on signed numbers. In a sense, CMP does simultaneous signed and unsigned comparisons.
Of the 65536 possible flag combinations, 16 are available for integer conditional branches. In the table, + is OR, ^ is XOR and ! is NOT:
|BG||Greater||!(Z + (N^V))|
|BLE||Less or Equal||Z + (N^V)|
|BGE||Greater or Equal||!(N^V)|
|BLEU||Less or Equal Unsigned||C + Z|
|BCC, BGEU||Carry Clear = Greater or Equal Unsigned||!C|
|BCS, BLU||Carry Set = Less than Unsigned||C|
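The claim that one CMP serves signed and unsigned comparisons at once can be checked with a small model. This Python sketch computes the NZVC flags of a 32-bit subtraction and evaluates the six conditions listed in the table. The helper names (`cmp_flags`, `taken`, `as_signed`) are hypothetical, and the flag formulas are the usual two's complement ones, stated here as an assumption rather than quoted from the SPARC manual.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags left by CMP a,b (a 32-bit subtraction whose result is discarded)."""
    a &= MASK32
    b &= MASK32
    full = a - b
    result = full & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "V": ((a ^ b) & (a ^ result)) >> 31,
        "C": int(full < 0),
    }

# The branch conditions from the table, written as tests on the flags.
BRANCHES = {
    "BG":   lambda f: not (f["Z"] or (f["N"] ^ f["V"])),
    "BLE":  lambda f: bool(f["Z"] or (f["N"] ^ f["V"])),
    "BGE":  lambda f: not (f["N"] ^ f["V"]),
    "BLEU": lambda f: bool(f["C"] or f["Z"]),
    "BGEU": lambda f: not f["C"],
    "BLU":  lambda f: bool(f["C"]),
}

def taken(branch, a, b):
    """Would `branch` be taken right after CMP a,b?"""
    return BRANCHES[branch](cmp_flags(a, b))

def as_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x
```

With a = 0xFFFFFFFF and b = 1, the same flags say "less" to the signed branches (a is -1) and "greater" to the unsigned ones (a is 4294967295): `taken("BLE", a, b)` and `taken("BGEU", a, b)` are both true.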
The MC68000 has the same branches, copied from the 8-bit MC6800. ARM and x86 are very similar, as are many 8- and 16-bit CPUs. PowerPC is more complex (but more interesting).
(65536? We have 4 flags, so 16 states. A condition could test any subset of these states, so there are 2^16 possible conditions.)
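The counting argument can be spelled out in Python. In this sketch a condition is encoded as a 16-bit truth table with one bit per flag state; that encoding, and the name `truth_table`, are assumptions used only to illustrate the count.

```python
from itertools import product

# Every possible value of the four flags, as (N, Z, V, C) tuples.
STATES = list(product((0, 1), repeat=4))

# A condition either accepts or rejects each state independently.
N_CONDITIONS = 2 ** len(STATES)

def truth_table(cond):
    """Encode a condition as a bit mask with one bit per flag state."""
    mask = 0
    for i, (n, z, v, c) in enumerate(STATES):
        if cond(n, z, v, c):
            mask |= 1 << i
    return mask

BG = truth_table(lambda n, z, v, c: not (z or (n ^ v)))
BLE = truth_table(lambda n, z, v, c: bool(z or (n ^ v)))
```

Under this encoding, BG and BLE are two of the 65536 possible masks, and they are exact complements of each other.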
The FPU also has condition codes, set by the FCMP instruction. The comparison result is a two-bit value: Equal / Less / Greater / Unordered. Floating point numbers are always signed, so there is no need for separate flags for signed and unsigned numbers. There are instead special encodings named “NaN” (Not a Number), used to signal invalid results (like sqrt(-1) or 0/0). Comparisons involving a NaN return “Unordered”.
The two-bit FPU condition code provides 16 possible branches:
|Branch||Condition||Equal||Less||Greater||Unordered|
|FBUG||Unordered or Greater||.||.||X||X|
|FBUL||Unordered or Less||.||X||.||X|
|FBLG||Less or Greater||.||X||X||.|
|FBUE||Unordered or Equal||X||.||.||X|
|FBGE||Greater or Equal||X||.||X||.|
|FBUGE||Unordered or Greater or Equal||X||.||X||X|
|FBLE||Less or Equal||X||X||.||.|
|FBULE||Unordered or Less or Equal||X||X||.||X|
As languages like C do not handle NaN directly, handling the “unordered” state is problematic. “Branch on Less or Greater” is different from “Branch on Not Equal”: the latter is also taken when the comparison is Unordered.
This distinction stems from the IEEE 754 floating point standard.
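The difference between the two branches shows up as soon as a NaN is involved. The Python sketch below models the four-way comparison result and the two conditions. `fcmp`, `fblg` and `fbne` are hypothetical helpers named after the SPARC mnemonics, and the single-letter result codes are an assumption of the example.

```python
import math

def fcmp(a, b):
    """Return 'E', 'L', 'G' or 'U', like a two-bit FPU condition code."""
    if math.isnan(a) or math.isnan(b):
        return "U"                 # any NaN operand makes the pair unordered
    if a == b:
        return "E"
    return "L" if a < b else "G"

def fblg(cc):
    """Branch on Less or Greater: not taken when unordered."""
    return cc in ("L", "G")

def fbne(cc):
    """Branch on Not Equal: taken for Less, Greater and Unordered."""
    return cc != "E"

nan = float("nan")
cc = fcmp(nan, 1.0)
```

Here `cc` is "U", so `fbne(cc)` is true while `fblg(cc)` is false. Python's own operators agree: `nan != 1.0` is true, yet neither `nan < 1.0` nor `nan > 1.0` holds.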
Problems with flags
Most currently dominant CPU instruction set architectures use explicit condition code bits. There is no incompatibility between the use of these flags and good CPU design. Nowadays, they are nevertheless considered a hindrance, for several reasons.
Flags are generated after normal ALU operations (ADD, SUB, AND…). Some flags can be calculated quickly, for example the N bit, which copies the result’s MSB. Others depend on all bits of the result, for example the Z flag.
On a CPU with only a “compare and branch” instruction, the conditions can be calculated more efficiently. For example, the Z flag can be calculated by XORing both inputs, which needs no carry propagation, instead of checking that the result of the subtraction is equal to zero.
Calculating whether the result of a subtraction is equal to zero is useful; for an addition, not so much. Yet SPARC, like PowerPC or x86, has both ADDcc and SUBcc.
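The two ways of detecting equality give the same answer, as this small Python sketch shows. The function names are made up for the example, and Python of course says nothing about gate delays; the point is only that the XOR form is a purely bitwise operation, whereas the subtraction must first ripple a borrow through all 32 bits.

```python
MASK32 = 0xFFFFFFFF

def equal_via_sub(a, b):
    """Z flag the SUBcc way: subtract, then test the 32-bit result for zero."""
    return ((a - b) & MASK32) == 0

def equal_via_xor(a, b):
    """Z flag the compare-and-branch way: XOR, then test for zero."""
    return ((a ^ b) & MASK32) == 0
```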
The very common sequence:
CMP Ra,Rb
Bicc ...
…stresses a challenging critical path on simple CPUs like PIPE5. The CMP result is calculated in the EXECUTE stage and is needed immediately by the Bicc instruction in the DECODE stage, during the same clock cycle (unless you accept that the branch takes several cycles). More advanced CPUs have speculative prefetch units not directly dependent on the execution result, but any misprediction costs many cycles.
CPUs must be able to handle small loops quickly, as they occur very often. Because of the delay slot (ADD in the example above), there is often no available buffer instruction to place between the compare and the branch to give the flags extra time to settle.
- Superscalar and speculation
Flags create dependencies between instructions, which can make things more complex for wide superscalar and speculative CPUs.
Most instructions should neither modify flags nor depend on their value. On PowerPC, SPARC (& others), many ALU instructions have several variants, with and without flags. x86 is much less clean, but many operations are doable with and without flag updates (x86 also has crazy flags like decimal or parity). The MC68000 is awful, as even register moves update flags.
To enable more concurrent execution of flag modifying instructions, SPARCv9 offers several sets of condition codes for floating point comparisons, and PowerPC offers them for both integer and FP. This may have been reasonable for highly parallel FP bound programs before the advent of SIMD instruction sets. It proved mostly useless on PowerPCs. Advanced speculation and branch prediction can often replace static scheduling.
Alternatives to flags
The main alternatives to flags used for conditional branches are:
- Branch on a register value compared to zero (Alpha, MIPS, Microblaze, PA-RISC). Any integer register can be used. A subtraction can be used as a comparison to set the register.
- Combined compare and branch instruction (NIOS, MICO, PA-RISC). Because of limited opcode space on fixed instruction width RISCs, the comparison is usually between two registers, with no immediate value.
- Update a general purpose register by setting it to 0 or 1 after a comparison (NIOS). This comparison behaves very much like any other ALU operation.
- Use a single flag for comparisons and branches (OpenRISC). Who can like it?
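As an illustration of the register based alternatives in the list above, the following Python sketch writes a comparison result into an ordinary "register" as 0 or 1 and then branches on that register being zero or not. The function names are hypothetical, loosely modelled on MIPS-style set-on-less-than and branch-on-zero instructions, and do not reproduce any particular ISA.

```python
MASK32 = 0xFFFFFFFF

def as_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x

def set_less_than(a, b):
    """Signed comparison; the result is a plain register value, 0 or 1."""
    return int(as_signed(a) < as_signed(b))

def set_less_than_unsigned(a, b):
    """Unsigned comparison; signedness is chosen by the instruction."""
    return int((a & MASK32) < (b & MASK32))

def branch_if_nonzero(reg):
    """Branch on a register value compared to zero."""
    return reg != 0

def branch_if_zero(reg):
    return reg == 0

# "Branch if a >= b (signed)" becomes: set-less-than, then branch-if-zero.
def branch_ge_taken(a, b):
    return branch_if_zero(set_less_than(a, b))
```

Unlike CMP, the comparison instruction itself has to choose between signed and unsigned here, but its result is an ordinary register value with no hidden state, so it creates the same kind of dependency as any other ALU result.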
Many CPUs also provide conditional move instructions (which have their own problems, particularly in OoO architectures…)
Finally, an original use of flags can be seen in the Chinese Loongson 3 CPU. As a MIPS compatible CPU, it needs no flags; its designers nevertheless added a flag register and special instructions to accelerate the emulation of x86 code.