The MMU slices memory into pages.
32-bit SPARCs support four page sizes: 4kB, 256kB, 16MB and 4GB. These sizes are named page, segment, region and context respectively.
Every page is aligned on its own size.
For a 4kB page, the MMU transforms bits [31:12] of the virtual address into bits [35:12] of the physical address; bits [11:0] are unmodified.
For a 16MB page, the MMU transforms bits [31:24] of the virtual address into bits [35:24] of the physical address; bits [23:0] are unmodified.
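The bit slicing described above can be sketched in a few lines of Python. This is only a behavioural model, not the VHDL of the project: the function name `translate` and the `PAGE_OFFSET_BITS` table are made up for the illustration, and only the two sizes discussed here are modelled.

```python
# Behavioural sketch; names are illustrative and not taken from the VHDL sources.
PAGE_OFFSET_BITS = {
    "4kB": 12,   # bits [11:0] pass through unmodified
    "16MB": 24,  # bits [23:0] pass through unmodified
}

def translate(va: int, phys_base: int, size: str) -> int:
    """Combine the translated high bits with the untouched offset bits."""
    bits = PAGE_OFFSET_BITS[size]
    offset_mask = (1 << bits) - 1
    assert 0 <= va < (1 << 32), "virtual addresses are 32 bits wide"
    assert 0 <= phys_base < (1 << 36), "physical addresses are 36 bits wide"
    assert phys_base & offset_mask == 0, "pages are aligned on their own size"
    return phys_base | (va & offset_mask)

# Same virtual address, mapped once through a 4kB page and once through a 16MB page.
print(hex(translate(0x1234_5678, 0x8_0000_1000, "4kB")))   # 0x800001678
print(hex(translate(0x1234_5678, 0x8_0100_0000, "16MB")))  # 0x801345678
```

With the larger page, more low-order bits of the virtual address reach the bus unchanged, and fewer bits need to be looked up.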
Large pages are used for the framebuffer, ROM, kernel… These areas are static. Dynamically allocating large pages is rarer: it can be useful for things like databases, but it requires specific OS support.
Mapping large swaths of memory with a single page reduces the MMU’s work (fewer TLB misses, smaller page tables).
When the CPU starts, the MMU is disabled and physical addresses are equal to virtual addresses. The 4GB page mode is a bit like that, and is of little use.
The SPARC V8 reference MMU provides a 36-bit physical address space, which is larger than the 32 address bits provided by the integer unit. This MMU can map each 4kB page anywhere in a 64GB address range.
You could imagine having more than 4GB of RAM, shared between several processes, each one accessing a 2GB portion. Some large multi-CPU Sun4d servers actually featured this configuration.
In practice, for our project, the 36 address bits are not used for accessing a lot of RAM. They are available here because the SparcStation-10/20 memory map spreads peripherals over the full 64GB physical address range.
A SparcStation-5 mode was added later; it only needs 31 address bits.
As the unneeded address bits are left unconnected and are discarded during FPGA compilation, the cost of these 4 additional address bits is quite limited.
For a multitasking, multiuser OS such as UNIX, it must be possible to assign a different memory map to each process. SPARCs (like MIPS and ARM) use contexts for this.
Contexts provide an additional indirection level, above the 4GB mapping.
Contexts are not directly related to the process IDs, as the number of hardware contexts can be lower than the number of processes. Contexts are dynamically allocated to processes. Our CPU is usually configured with 256 contexts.
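To illustrate how an OS might share a limited number of hardware contexts between a larger number of processes, here is a minimal Python sketch. The class `ContextAllocator`, its method and the least-recently-used stealing policy are assumptions made for this example. Real kernels have their own schemes.

```python
from collections import OrderedDict
from typing import Tuple

class ContextAllocator:
    """Hypothetical OS-side bookkeeping: maps process IDs to hardware contexts."""

    def __init__(self, num_contexts: int = 256):
        self.free = list(range(num_contexts))
        self.owner = OrderedDict()  # pid -> context, least recently used first

    def context_for(self, pid: int) -> Tuple[int, bool]:
        """Return (context, stolen).

        'stolen' is True when the context was taken away from another
        process, in which case stale translations tagged with that
        context number must be flushed before it is reused.
        """
        if pid in self.owner:
            self.owner.move_to_end(pid)  # mark as most recently used
            return self.owner[pid], False
        if self.free:
            ctx, stolen = self.free.pop(0), False
        else:
            _, ctx = self.owner.popitem(last=False)  # evict least recently used
            stolen = True
        self.owner[pid] = ctx
        return ctx, stolen

# Two hardware contexts shared by three processes.
alloc = ContextAllocator(num_contexts=2)
print(alloc.context_for(10))  # (0, False)
print(alloc.context_for(11))  # (1, False)
print(alloc.context_for(10))  # (0, False)
print(alloc.context_for(12))  # (1, True): context 1 is taken from process 11
```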
The operating system kernel can access all tasks and all memory; it can also switch contexts. The kernel code must have the same mapping across all contexts and all tasks.
On SPARCs, there is no “kernel/supervisor” address space, context or memory mapping. Instead, being in supervisor mode changes the access rights and allows unrestricted access to regions out of reach of user-level software.
It is then the operating system’s role to manage virtual memory and divide the 32-bit virtual address space into an application/user area and a supervisor/kernel area. RAM is usually mapped in both areas simultaneously, and when it is too large for the 32-bit address space, swapping virtual pages may be necessary (see the “highmem” and “PAE” issues with Linux).
(MIPS, SuperH, AVR32… name their 8-bit context register “Address Space Identifier”, which has no relation to SPARC ASIs. ARM uses similar contexts named “ASID”. PowerPC has a “PID” (Process ID) register. This is the “simple” version: the MIPS32, PowerPC and ARM MMUs are different.)
Each page has a set of characteristics: physical address, access rights… They are stored in temporary buffers, traditionally called “Translation Look-aside Buffers” (TLBs). As the CPU cannot internally store all translations for all pages, the TLBs are managed as a cache of the most recently accessed pages.
The replacement of TLB contents as new pages are accessed is done either by software or through hardware mechanisms. We will see that later.
Each TLB stores the characteristics of a virtual memory area. Our MMU can currently manage up to 4 TLBs for data and 4 TLBs for instructions (which is few).
Here is the actual content of our MMU’s TLBs (mmu_pack.vhd).
    TYPE type_tlb IS RECORD
      v   : std_logic;               -- Valid
      va  : unsigned(31 DOWNTO 12);  -- Virtual Address
      st  : unsigned(1 DOWNTO 0);    -- Short Translation
      ctx : type_context;            -- Context
      acc : unsigned(2 DOWNTO 0);    -- Access Protection bits
      --------------------------------------------------------------
      ppn : unsigned(35 DOWNTO 12);  -- Physical Page Number
      c   : std_logic;               -- Cacheable
      m   : std_logic;               -- Modified
    END RECORD;
- V : Validity
- VA : Virtual Address
- ST : Short Translation
- CTX : Context
- ACC : Access Permissions
- PPN : Physical Page Number
- C : Cacheable
- M : Modified
V: If V=1, the TLB content is valid and contains translation information. If V=0, the TLB is empty.
VA: The most significant bits of the virtual address are compared to this field to determine whether the TLB matches the transfer.
For 4kB pages, address bits [31:12] are compared. For 256kB pages, address bits [31:18] are compared, etc.
ST: Page size: 00=4kB, 01=256kB, 10=16MB, 11=4GB.
CTX: Context number. The default width is 8 bits. The context register is compared to this field. For supervisor pages (see ACC below), it can be ignored as the kernel resides at the same address across all contexts.
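The V/VA/ST/CTX comparison can be modelled with the short Python sketch below. It is a behavioural illustration, not a transcription of the VHDL: `TlbEntry`, `tlb_hit` and `ST_OFFSET_BITS` are names invented here, and the sizes follow the SPARC V8 reference MMU (4kB, 256kB, 16MB, 4GB).

```python
from dataclasses import dataclass

# Number of untranslated low address bits for each ST code.
ST_OFFSET_BITS = {0b00: 12, 0b01: 18, 0b10: 24, 0b11: 32}

@dataclass
class TlbEntry:
    v: bool   # valid
    va: int   # virtual address (only the bits above the page offset matter)
    st: int   # page size code
    ctx: int  # context number
    acc: int  # access protection bits (6 and 7 are supervisor pages)

def tlb_hit(entry: TlbEntry, va: int, ctx: int) -> bool:
    """True when the entry is valid and covers (va, ctx)."""
    if not entry.v:
        return False
    shift = ST_OFFSET_BITS[entry.st]
    if (va >> shift) != (entry.va >> shift):
        return False  # the address lies outside the page
    supervisor_page = entry.acc in (6, 7)
    return supervisor_page or entry.ctx == ctx

segment = TlbEntry(v=True, va=0x4004_0000, st=0b01, ctx=5, acc=3)
kernel = TlbEntry(v=True, va=0xF000_0000, st=0b10, ctx=0, acc=7)
print(tlb_hit(segment, 0x4007_FFFC, ctx=5))  # True: inside the 256kB segment
print(tlb_hit(segment, 0x4008_0000, ctx=5))  # False: next segment
print(tlb_hit(segment, 0x4004_0000, ctx=6))  # False: other context
print(tlb_hit(kernel, 0xF0AB_CDEF, ctx=42))  # True: supervisor page, context ignored
```

Shifting both addresses by the offset width is the software equivalent of comparing only the upper bits. With ST=11 nothing is left to compare and the entry covers the whole 4GB space.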
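ACC: Indicates the allowed accesses for the page. See the SPARC V8 standard, page 248.
|ACC||User access||Supervisor access|
|0||Read Only||Read Only|
|1||Read/Write||Read/Write|
|2||Read/Execute||Read/Execute|
|3||Read/Write/Execute||Read/Write/Execute|
|4||Execute Only||Execute Only|
|5||Read Only||Read/Write|
|6||No Access||Read/Execute|
|7||No Access||Read/Write/Execute|
Types 6 and 7 indicate supervisor pages and are context-independent for the MMU (and cache).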
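As an illustration, the ACC decoding can be written as a small lookup table in Python. The encoding below is the one defined by the SPARC V8 reference MMU; the names `ACC_RIGHTS` and `access_allowed` are invented for this sketch and do not come from the VHDL.

```python
# (user rights, supervisor rights) for each ACC value:
# r = read, w = write, x = execute, empty string = no access.
ACC_RIGHTS = {
    0: ("r",   "r"),
    1: ("rw",  "rw"),
    2: ("rx",  "rx"),
    3: ("rwx", "rwx"),
    4: ("x",   "x"),
    5: ("r",   "rw"),
    6: ("",    "rx"),
    7: ("",    "rwx"),
}

def access_allowed(acc: int, kind: str, supervisor: bool) -> bool:
    """kind is 'r' (load), 'w' (store) or 'x' (instruction fetch)."""
    if kind not in ("r", "w", "x"):
        raise ValueError("kind must be 'r', 'w' or 'x'")
    user_rights, supervisor_rights = ACC_RIGHTS[acc]
    return kind in (supervisor_rights if supervisor else user_rights)

print(access_allowed(5, "w", supervisor=False))  # False: read only for users
print(access_allowed(5, "w", supervisor=True))   # True
print(access_allowed(7, "r", supervisor=False))  # False: supervisor page
```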
PPN: This is the physical address associated with the page.
C: Selects between cacheable and non-cacheable accesses.
M: This bit is set if the page contents have been modified. This is unrelated to the state of cache lines (which have a Modified state in write-back configurations).
TLBs are made of three parts:
– V/VA/CTX/ST are used as a content-addressable memory. The MMU matches these fields with the current memory access.
– ACC determines whether the access should be authorised or not.
– PPN/C/M generate the external access: physical address, caching.
The comparisons between the Virtual Address and each TLB can have several outcomes:
- The TLB is invalid (V=0), no information can be taken from it. ➜ Ignore the TLB.
- The TLB Virtual Address (VA) mapping does not include the pending address. ➜ Ignore the TLB.
- The TLB Context (CTX) does not match the current context. ➜ Ignore the TLB.
- The TLB matches the address. The access control information indicates that the access is forbidden (privilege violation: no execute, no write, supervisor only…). ➜ Trigger a “Protection fault” trap. There are different traps for data and instruction faults.
- The TLB matches the address. The area was never written and it is a write access (Modified bit). ➜ Update the page table in main memory before continuing. Update the TLB.
- The TLB matches the address. The access control information indicates that the access is authorised. ➜ Use the TLB content to generate the physical address, continue.
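The decision list above can be summarised by the following Python sketch. It is a behavioural model under assumptions, not the VHDL: the names `TlbEntry`, `lookup` and the outcome strings are invented here, the sizes and ACC encoding follow the SPARC V8 reference MMU, and the hardware compares all TLBs in parallel whereas this model walks them one by one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ST_OFFSET_BITS = {0b00: 12, 0b01: 18, 0b10: 24, 0b11: 32}
# (user rights, supervisor rights) per ACC value.
ACC_RIGHTS = {0: ("r", "r"), 1: ("rw", "rw"), 2: ("rx", "rx"), 3: ("rwx", "rwx"),
              4: ("x", "x"), 5: ("r", "rw"), 6: ("", "rx"), 7: ("", "rwx")}

@dataclass
class TlbEntry:
    v: bool    # valid
    va: int    # virtual address
    st: int    # page size code
    ctx: int   # context
    acc: int   # access protection bits
    ppn: int   # physical page number, i.e. physical address bits [35:12]
    c: bool    # cacheable
    m: bool    # modified

def lookup(tlbs: List[TlbEntry], va: int, ctx: int, kind: str,
           supervisor: bool) -> Tuple[str, Optional[int]]:
    """Return (outcome, physical address or None); kind is 'r', 'w' or 'x'."""
    for e in tlbs:
        shift = ST_OFFSET_BITS[e.st]
        if not e.v or (va >> shift) != (e.va >> shift):
            continue  # invalid entry or address outside the page: ignore
        if e.acc not in (6, 7) and e.ctx != ctx:
            continue  # context mismatch: ignore
        user, sup = ACC_RIGHTS[e.acc]
        if kind not in (sup if supervisor else user):
            return "protection_fault", None
        if kind == "w" and not e.m:
            return "update_modified", None  # page table and TLB need M=1 first
        offset_mask = (1 << shift) - 1
        return "hit", ((e.ppn << 12) & ~offset_mask) | (va & offset_mask)
    return "miss", None  # no TLB matched; the information must come from elsewhere

tlbs = [TlbEntry(v=True, va=0x1000_2000, st=0b00, ctx=3, acc=1,
                 ppn=0x80_0001, c=True, m=False)]
print(lookup(tlbs, 0x1000_2ABC, 3, "r", False))  # ('hit', 34359742140), i.e. 0x800001ABC
print(lookup(tlbs, 0x1000_2ABC, 3, "w", False))  # ('update_modified', None)
print(lookup(tlbs, 0x1000_2ABC, 3, "x", False))  # ('protection_fault', None)
print(lookup(tlbs, 0x1000_2ABC, 4, "r", False))  # ('miss', None)
```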
Obviously, for the CPU to go forward, the last case should occur with one of the TLBs for 99% of accesses.
If no TLB matches, the MMU cannot immediately decide what to do. It must find that information elsewhere. Selecting which TLB shall be updated is a problem as well. It will be covered in following posts.
Our TLBs are either empty or contain valid address translation information. They cannot indicate that a memory area is not mapped or disabled.
All of the CPU accesses go through the MMU, even those which correspond to I/O devices. These special resources are generally only accessible to the operating system, don’t change their address often and are marked as non-cacheable.