Now it’s time to describe the memory management unit and cache, glued together in the “MCU”.
Let’s begin with the MMU.
The memory management unit is responsible for transforming the addresses on the IU busses into addresses for the external hardware resources. The MMU shuffles addresses, not data.
A simple design can do without any MMU. The addresses generated by the IU are then used directly to select the hardware resources: RAM, FLASH, IO ports… This is sometimes extended to a Memory Protection Unit that checks which accesses are valid. For example, if an IO port is reserved for supervisor code, user accesses would trigger exceptions.
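To make the MPU idea concrete, here is a small C model of such a check. The region table and the `mpu_check` function are invented for illustration; they do not correspond to any real hardware interface:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MPU region descriptor: an address range plus a
 * supervisor-only flag, consulted on every access. */
struct mpu_region {
    uint32_t start, end;      /* [start, end) physical address range */
    bool supervisor_only;
};

static const struct mpu_region regions[] = {
    { 0x00000000, 0x04000000, false }, /* RAM: anyone may access */
    { 0x10000000, 0x10001000, true  }, /* IO port block: supervisor only */
};

/* Returns true if the access is allowed. A real MPU would raise an
 * exception instead of returning false. */
bool mpu_check(uint32_t addr, bool supervisor)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return supervisor || !regions[i].supervisor_only;
    }
    return false; /* unmapped address: fault */
}
```

Note that this only filters accesses; unlike an MMU, it never changes the addresses themselves.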
A real MMU is a peculiar beast; it addresses the needs of multitasking, multiuser operating systems like UNIX, written in unsafe programming languages like C and assembly.
The roles of the MMU are:
- Provide memory protection
Each application can have a separate chunk of memory and cannot alter the memory of any other application without authorisation (whether through bugs or malware). The application software is untrusted; only the operating system kernel is [supposed to be] reliable and may be allowed to access all the memory and I/O resources.
- Provide access control
A memory area can be write-protected, or code execution can be prevented.
- Provide memory remapping and relocation.
Instead of placing different applications at different offsets in a shared memory map, the MMU can remap any application address to any physical address. This mapping can be chosen arbitrarily, independently of the hardware memory map, which can even be non-contiguous. (Real SparcStations often have a non-contiguous RAM area, depending on the size of each DRAM module installed in the computer. At boot time, the firmware probes the installed memory and builds a memory map which is later used by the OS.)
- Provide mirroring
For example, dynamically loaded libraries can be used simultaneously by several applications while preserving segregation between them: the MMU can map the same physical memory at different places in the address spaces of several processes. It is also used to provide shared-memory communication between any given set of applications, or with the kernel.
- Provide demand-based paged memory
The total memory used by all the applications can exceed the actual physical memory available. The remaining area is “virtualised” on the hard disk. The operating system is responsible for copying data back and forth between memory and the hard disk, transparently to the application software (except for the execution time!).
- Optimise disk accesses
It is possible to lazily load files (mmap()) and read the disk only as memory is accessed, for both code and data. It is also possible to write back only the modified parts of files.
- Disk caching
Unused memory can be used as a disk cache. When the corresponding program or data is eventually loaded or executed, the memory can even be used directly, without any copy between a “disk cache” area and a “live programs” area.
- Extend the memory map
The total physical memory can exceed the memory addressable by each task. For example, with 32-bit tasks and a 36-bit MMU, each task can separately address 4GB while the total physical memory size is 64GB.
Warning: OS writers hate this “PAE” mechanism, because the kernel cannot directly access all memory using pointers. A true 64-bit CPU and OS running 32-bit code is far better. On 32-bit CPUs, problems arise when RAM exceeds around 2GB.
- Optimise memory allocation
For example, the “Copy On Write” principle makes it possible to copy a memory area by simply mapping the same data at different places (addresses, task contexts), then delaying the actual physical memory copy until the data is modified. The operating system halts the application just before the first modification takes place, allocates memory, copies the block and alters the memory map of the application. For reads, the “Zero Fill On Demand” principle can be used to lazily zero memory pages as applications access them. MMU resources are also sometimes used for the memory management of garbage-collected languages.
- Virtualise resources
Usually, all accesses to peripherals are done through simple memory accesses (x86, with its separate I/O port space, is an exception). The MMU can be used to intercept all IO accesses and help the operating system or an emulator simulate arbitrary hardware, or even virtualise the whole platform.
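The lazy file loading mentioned above can be observed from user space with a few lines of C: mmap() creates the mapping without reading the file, and the data is paged in from disk only when the memory is first touched. The file name “demo.txt” is just a scratch file for the demo:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a small file, map it, and copy out its first n bytes. */
void map_first_bytes(char *out, size_t n)
{
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
    write(fd, "hello mmap", 10);

    /* No disk read happens here: the kernel only records the mapping. */
    char *p = mmap(NULL, 10, PROT_READ, MAP_PRIVATE, fd, 0);

    /* First access: page fault, the kernel fetches the page from disk. */
    memcpy(out, p, n);

    munmap(p, 10);
    close(fd);
    unlink("demo.txt");
}
```

From the program's point of view the whole file is “in memory” from the moment mmap() returns; the MMU and the page-fault handler do the actual reads behind its back.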
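Copy On Write can likewise be demonstrated with fork(): after the fork, parent and child share all writable pages copy-on-write, and the child's write triggers a fault that makes the kernel copy just that one page. The `cow_demo` function is made up for this sketch:

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* A page-sized buffer shared copy-on-write after fork(). */
static char buf[4096] = "original";

const char *cow_demo(void)
{
    pid_t pid = fork();
    if (pid == 0) {                 /* child */
        strcpy(buf, "modified");    /* COW fault: child gets a private copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);          /* parent */
    return buf;                     /* untouched: still "original" */
}
```

Until the child's strcpy(), both processes were reading the very same physical page; only the write forced the kernel to allocate and copy.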
All these amazing features are possible with the “SPARC Reference MMU” (hooray!)
The MMU slices memory into small chunks called pages, often 4kB, and associates each page with a physical address and access attributes.
Depending on the access type, the page attributes and the current state of the CPU (read/write, user/supervisor, code/data, task index), the MMU can decide to allow the access, modify page attributes or trigger an exception.
Addresses on the IU side are called “VIRTUAL” (or “LOGICAL”), addresses on the external side are called “PHYSICAL” (or “REAL”).
On 32-bit SPARCs, each 4kB page is associated with a 32-bit record, called a PTE (Page Table Entry), which describes its characteristics. These records are stored in RAM and are automatically retrieved and updated by the MMU as needed.
(A 4-byte PTE per 4kB page: at least 1/1000 of the managed memory is directly used by the MMU to store its own structures. I know this evaluation is very naive, but it emphasises the fact that the MMU cannot store all the translation information internally…)
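A small C sketch can make the page/PTE relationship concrete. The field widths below (8/6/6-bit table indices plus a 12-bit page offset, and a 24-bit physical page number in PTE bits 31:8, giving 36-bit physical addresses) follow my reading of the SPARCv8 Reference MMU specification; the function names are made up:

```c
#include <stdint.h>

/* A 32-bit virtual address is split into three table indices (one per
 * level of the page table tree) plus an offset into the 4kB page. */
struct va_fields { unsigned idx1, idx2, idx3, offset; };

struct va_fields split_va(uint32_t va)
{
    struct va_fields f;
    f.idx1   = (va >> 24) & 0xFF;   /* 8 bits: level-1 table index */
    f.idx2   = (va >> 18) & 0x3F;   /* 6 bits: level-2 table index */
    f.idx3   = (va >> 12) & 0x3F;   /* 6 bits: level-3 table index */
    f.offset =  va        & 0xFFF;  /* 12 bits: offset within the page */
    return f;
}

/* Translate using a PTE: bits [31:8] hold the physical page number,
 * which replaces the three index fields (24 + 12 = 36 bits). */
uint64_t pte_to_phys(uint32_t pte, uint32_t va)
{
    uint64_t ppn = pte >> 8;            /* 24-bit physical page number */
    return (ppn << 12) | (va & 0xFFF);  /* page base + untranslated offset */
}
```

The low byte of the PTE (access permissions, modified/referenced bits, entry type) is ignored here; it is what the MMU consults to allow, annotate or refuse the access.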
In most CPU families, the MMU serves the same purpose of translating addresses. There are nevertheless some important differences, often related to the fact that the MMU must be able to present a different memory map to each process, and to switch quickly between them. The SPARCv8 Reference MMU is a simple, straightforward design; some CPU families have baroque MMUs (like the PowerPC) or plain crazy ones (like the x86).
The following posts will dig into the MMU, its implementation and how it lives alongside the cache.