If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.
-IBM poster explaining virtual memory, circa 1978
During sort (also see vmstat, top, free -m):
Mem: 3369572k total, 3242668k used, 126904k free, 128664k buffers
Swap: 2040244k total, 102636k used, 1937608k free, 2921304k cached
Sort cancelled
Mem: 3369572k total, 475692k used, 2893880k free, 121604k buffers
Swap: 2040244k total, 102636k used, 1937608k free, 215344k cached
The paging system can be divided into two sections. First, the policy algorithm decides which pages to write out to disk and when to write them. Second, the paging mechanism carries out the transfer and pages data back into physical memory when they are needed again.
Linux uses a Least Recently Used (LRU) page aging technique to choose fairly among pages that might be removed from the system. Under this scheme, every page in the system has an age that changes as the page is accessed. The more a page is accessed, the younger it is; the less it is accessed, the older and more stale it becomes. Old pages are good candidates for swapping.
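As a rough illustration (a toy userspace sketch, not kernel code), the program below models page aging: referencing a page makes it younger, a periodic scan makes every page older, and the stalest page is reported as the swap candidate. The constants and the age arithmetic are arbitrary, only loosely inspired by older Linux kernels.

#include <stdio.h>

#define NPAGES   8
#define MAX_AGE  20

/* Toy model: one age counter per page (0 = old/stale). */
static int age[NPAGES];

/* Called when a page is referenced: the page gets "younger". */
static void touch(int page)
{
    age[page] += 3;
    if (age[page] > MAX_AGE)
        age[page] = MAX_AGE;
}

/* Called periodically: every page gets "older", and the oldest
 * (least recently used) page is returned as the swap candidate. */
static int scan(void)
{
    int victim = 0;

    for (int i = 0; i < NPAGES; i++) {
        if (age[i] > 0)
            age[i] -= 1;
        if (age[i] < age[victim])
            victim = i;
    }
    return victim;
}

int main(void)
{
    touch(2); touch(2); touch(5);        /* pages 2 and 5 are "hot" */
    for (int round = 0; round < 5; round++)
        printf("swap candidate after scan %d: page %d\n", round, scan());
    return 0;
}

Pages that are never touched keep an age of zero and are picked first; the frequently touched pages survive several scans before becoming candidates.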
Moving unused parts out to disk used to be done manually by the programmer, back in the early days of computing. Programmers had to expend vast amounts of effort keeping track of what was in memory at a given time, and rolling segments in and out as needed. Older languages like COBOL still contain a large vocabulary of features for expressing this memory overlaying - totally obsolete and inexplicable to the current generation of programmers.
As the processor executes a program it reads an instruction from memory and decodes it. In decoding the instruction, the processor may need to fetch or store the contents of a location in memory. The processor then executes the instruction and moves on to the next instruction in the program. In this way the processor is always accessing memory either to fetch instructions or to fetch and store data.
In a virtual memory system all of these addresses are virtual addresses and not physical addresses. These virtual addresses are converted into physical addresses by the processor based on information held in a set of tables maintained by the operating system.
Address translation takes place using a page table and is fast due to dedicated hardware. Each process has its own page table that maps pages of its virtual address space to frames in physical memory. When a process references a particular virtual address, the appropriate entry in its page table is inspected to determine in which physical frame the page resides. When a process references a virtual address not yet in a frame, a page fault occurs and a frame is allocated in physical memory.
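To make the translation concrete, here is a minimal userspace sketch with an assumed, simplified page-table format (not the hardware's real multi-level tables): the virtual address splits into a page number and an offset, the page number indexes the page table, and a missing entry stands in for a page fault.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE   4096u          /* 4 KiB pages: a 12-bit offset */
#define NUM_PAGES   16             /* a tiny 16-entry page table */

/* One page-table entry in this toy model: a valid bit and a frame number. */
struct pte {
    bool     present;
    uint32_t frame;
};

static struct pte page_table[NUM_PAGES];

/* Translate a virtual address to a physical one, or report a page fault. */
static int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (page >= NUM_PAGES || !page_table[page].present)
        return -1;                              /* page fault */
    *paddr = page_table[page].frame * PAGE_SIZE + offset;
    return 0;
}

int main(void)
{
    uint32_t paddr;

    page_table[3] = (struct pte){ .present = true, .frame = 7 };

    if (translate(3 * PAGE_SIZE + 42, &paddr) == 0)
        printf("virtual 0x%x -> physical 0x%x\n", 3 * PAGE_SIZE + 42, paddr);
    if (translate(5 * PAGE_SIZE, &paddr) != 0)
        printf("page fault: page 5 has no frame, one must be allocated\n");
    return 0;
}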
A Windows user spends 1/3 of his life sleeping, 1/3 working, 1/3 waiting.
The memory manager is a part of the operating system that keeps track of associations between virtual addresses and physical addresses and handles paging. To the memory manager, the page is the basic unit of memory. The Memory Management Unit (MMU), which is a hardware agent, performs the actual translations.
Very broadly, memory management under Linux has two components. The first deals with allocating and freeing physical memory. The second handles virtual memory. This is an overview of the main subsystems.
When page frames are allocated and deallocated, the system runs into a memory fragmentation problem called external fragmentation. This occurs when the available page frames are scattered throughout memory in such a way that large runs of contiguous page frames cannot be allocated, even though the total number of available page frames is sufficient. That is, the available page frames are interrupted by one or more unavailable page frames, which breaks continuity. There are various approaches to reducing external fragmentation. Linux uses an implementation of a memory management algorithm called the buddy system.
Buddy systems maintain a list of available blocks of memory. Each list points to blocks of a different size, and all the sizes are powers of two. The number of lists depends on the implementation. Page frames are allocated from the list of free blocks of the smallest possible size; this keeps larger contiguous blocks available for the larger requests. When an allocated block is returned, the buddy system searches the free lists for an available block of the same size as the returned block. If such a block is contiguous to the returned block, the two are merged into a block twice the size of either one. These blocks (the returned block and the available block contiguous to it) are called buddies, hence the name "buddy system." This way, the kernel ensures that larger block sizes become available as soon as page frames are freed.
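The merge step relies on a simple address property: for a power-of-two-sized block, the buddy's offset differs from the block's offset in exactly one bit, so it can be computed with an XOR. A small sketch of that calculation (illustrative only, not the kernel's implementation):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* For a block of (1 << order) pages, the buddy is the block whose offset
 * differs only in the bit corresponding to that block size.  When a block
 * is freed and its buddy is also free, the two merge into one block of
 * the next higher order. */
static uint32_t buddy_of(uint32_t offset, unsigned order)
{
    return offset ^ ((1u << order) * PAGE_SIZE);
}

int main(void)
{
    /* An order-0 block (one page) at offset 0x3000: its buddy is 0x2000. */
    printf("buddy of 0x3000 (order 0): 0x%x\n", buddy_of(0x3000, 0));
    /* After the two merge into an order-1 block at 0x2000, its buddy is 0x0. */
    printf("buddy of 0x2000 (order 1): 0x%x\n", buddy_of(0x2000, 1));
    return 0;
}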
Slab allocator: manages the allocation of memory sizes smaller than a page.
Pages are the basic unit of memory for the memory manager. However, processes generally request memory on the order of bytes, not on the order of pages. To support the allocation of smaller memory requests made through calls to functions like kmalloc(), the kernel implements the slab allocator, which is a layer of the memory manager that acts on acquired pages.
The slab allocator seeks to reduce the cost incurred by allocating, initializing, destroying, and freeing memory areas by maintaining a ready cache of commonly used memory areas. This cache maintains the memory areas allocated, initialized, and ready to deploy. When the requesting process no longer needs the memory areas, they are simply returned to the cache.
If we run the command cat /proc/slabinfo, the existing slab allocator caches are listed.
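As a hedged illustration of how kernel code uses this layer, here is a minimal kernel-module sketch built on the slab API (kmem_cache_create(), kmem_cache_alloc(), kmem_cache_free()); the struct foo object type and the cache name are invented for the example, while kmalloc() would instead draw from the generic size-class caches visible in /proc/slabinfo.

#include <linux/module.h>
#include <linux/slab.h>

/* A made-up object type for this sketch. */
struct foo {
    int  id;
    char name[32];
};

static struct kmem_cache *foo_cache;
static struct foo *obj;

static int __init slab_demo_init(void)
{
    /* Create a cache of ready-to-use struct foo objects. */
    foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                  0, SLAB_HWCACHE_ALIGN, NULL);
    if (!foo_cache)
        return -ENOMEM;

    /* Allocate one object from the cache. */
    obj = kmem_cache_alloc(foo_cache, GFP_KERNEL);
    if (!obj) {
        kmem_cache_destroy(foo_cache);
        return -ENOMEM;
    }
    obj->id = 1;
    return 0;
}

static void __exit slab_demo_exit(void)
{
    /* Returning the object simply puts it back in the cache. */
    kmem_cache_free(foo_cache, obj);
    kmem_cache_destroy(foo_cache);
}

module_init(slab_demo_init);
module_exit(slab_demo_exit);
MODULE_LICENSE("GPL");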
In the logical view the address space consists of a set of nonoverlapping regions, each region representing a contiguous, page-aligned subset of the address space. Each region is described internally by a single vm_area_struct structure that defines the properties of the region, including the process's read, write, and execute permissions in the region, and information about any associated files.
The physical view is stored in the hardware page tables for the process. The page-table entries determine the exact current location of each page of virtual memory.
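One easy way to see the logical view is to read /proc/<pid>/maps, which the kernel generates from the process's region list. A small sketch that prints this process's own regions:

#include <stdio.h>

int main(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];

    if (!maps) {
        perror("fopen");
        return 1;
    }
    /* Each line is one region: start-end addresses, permissions (rwxp/s),
     * file offset, device, inode, and the backing file (if any). */
    while (fgets(line, sizeof(line), maps))
        fputs(line, stdout);
    fclose(maps);
    return 0;
}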
There are several types of virtual memory regions, defined by the backing store for the region: where did the pages for the region come from? A region backed by nothing represents demand-zero memory. When a process tries to read a page in such a region, it is simply given back a page of memory filled with zeros. Alternatively, a region can be backed by a file. Any number of processes can map the same region of the same file, and they will all end up using the same pages of physical memory for the purpose.
A region may also be defined by its reaction to writes. The mapping of a region into the process's address space can be either private or shared. If a process writes to a privately mapped region, then the pager detects that a copy-on-write is necessary to keep the changes local to the process. In contrast, writes to a shared region result in updating of the object mapped into that region, so that the change will be visible immediately to any other process that is mapping that object.
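The following userspace sketch shows both kinds of region with mmap(): a private anonymous mapping gives demand-zero memory, while a shared file-backed mapping makes writes visible to every process mapping the file. The file name data.bin is just a placeholder for this example and is assumed to be at least one page long.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    /* Demand-zero memory: a private anonymous region with no backing file.
     * The first read of each page simply returns zeros. */
    char *anon = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (anon != MAP_FAILED)
        printf("first anonymous byte: %d (demand-zero)\n", anon[0]);

    /* File-backed, shared region: writes go back to the file and are
     * immediately visible to every process mapping it. */
    int fd = open("data.bin", O_RDWR);
    if (fd >= 0) {
        char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (shared != MAP_FAILED) {
            strcpy(shared, "visible to all mappers");
            munmap(shared, 4096);
        }
        close(fd);
    }
    if (anon != MAP_FAILED)
        munmap(anon, 4096);
    return 0;
}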
The exec() system call tells the kernel to run a new program within the current process, completely overwriting the current execution context with the initial context of the new program. The process is given a new, completely empty virtual address space, and the routines that load the program populate it with virtual memory regions. Creating a new process with fork() involves copying the existing process's vm_area_struct descriptors in full and then creating a new set of page tables for the child.
Both the child and the parent continue executing with the instruction that follows the call to fork. The child is a copy of the parent. For example, the child gets a copy of the parent's data space, heap, and stack. Note that this is a copy for the child; the parent and the child do not share these portions of memory. The parent and the child share the text segment.
Current implementations don't perform a complete copy of the parent's data, stack, and heap, since a fork is often followed by an exec. Instead, a technique called copy-on-write (COW) is used. These regions are shared by the parent and the child and have their protection changed by the kernel to read-only. If either process tries to modify these regions, the kernel then makes a copy of that piece of memory only, typically a "page" in a virtual memory system.
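A small C program makes the copy-on-write behavior visible: after fork(), the child's write to a global variable modifies only the child's copy of that page, and the parent still sees the original value.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int counter = 10;   /* lives in the data segment shared (read-only) after fork */

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* Child: the write faults, the kernel copies the page, and only
         * the child's copy changes. */
        counter += 1;
        printf("child:  counter = %d\n", counter);   /* 11 */
        exit(0);
    }
    wait(NULL);
    printf("parent: counter = %d\n", counter);       /* still 10 */
    return 0;
}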
You take the blue pill, the story ends. You wake up in your bed and believe whatever you want to believe. You take the red pill... you stay in Wonderland and I show you just how deep the rabbit hole goes...
-Morpheus in The Matrix
Also see the Linux MM Wiki.