The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.
To be fully clear, this webpage is not the actual book.
12.2 Mapping Page Table Entries to Swap Entries
When a page is swapped out, Linux uses the corresponding PTE to store enough
information to locate the page on disk again. Obviously a PTE is not large
enough in itself to store precisely where on disk the page is located, but it
is more than enough to store an index into the swap_info array and an offset
within the swap_map, and this is precisely what Linux does.
Each PTE, regardless of architecture, is large enough to store a swp_entry_t,
which is declared as follows in linux/shmem_fs.h:

    typedef struct {
            unsigned long val;
    } swp_entry_t;

Two macros are provided for the translation of PTEs to swap entries
and vice versa. They are pte_to_swp_entry() and swp_entry_to_pte()
respectively.
In the swp_entry_t, two bits are always kept free which are used by Linux to
determine if a PTE is present or swapped out. Bit 0 is reserved for the
_PAGE_PRESENT flag and Bit 7 is reserved for _PAGE_PROTNONE. The requirement
for both bits is explained in Section 4.2.
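
The role of the two reserved bits can be sketched in user-space C. This is only an illustration under stated assumptions: the DEMO_ and demo_ names are hypothetical, a 32-bit PTE is modelled as a uint32_t, and the treatment of an all-zero PTE as "nothing mapped" is an assumption not spelled out in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the text; the DEMO_ names are hypothetical. */
#define DEMO_PAGE_PRESENT  (1u << 0)  /* bit 0 */
#define DEMO_PAGE_PROTNONE (1u << 7)  /* bit 7 */

/* A PTE is treated as present if either reserved bit is set. */
static inline bool demo_pte_present(uint32_t pte)
{
    return (pte & (DEMO_PAGE_PRESENT | DEMO_PAGE_PROTNONE)) != 0;
}

/* Assumption: an all-zero PTE maps nothing at all. */
static inline bool demo_pte_none(uint32_t pte)
{
    return pte == 0;
}

/* Non-zero with both reserved bits clear: the value holds a swap entry. */
static inline bool demo_pte_is_swapped(uint32_t pte)
{
    return !demo_pte_none(pte) && !demo_pte_present(pte);
}
```

Because a swap entry never sets bit 0 or bit 7, a single mask test is enough to tell a swapped-out page from a resident one.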
Bits 1-6 are for the type, which is the index within the swap_info array, and
are returned by the SWP_TYPE() macro.
Bits 8-31 are used to store the offset within the swap_map from the
swp_entry_t. On the x86, this means 24 bits are available which, with 4KiB
pages, "limits" the size of the swap area to 64GiB. The macro SWP_OFFSET()
is used to extract the offset.
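
The 64GiB figure follows from simple arithmetic, sketched here. The 4KiB page size is an assumption (the usual x86 page size), and the DEMO_ and demo_ names are hypothetical.

```c
#include <stdint.h>

#define DEMO_OFFSET_BITS 24       /* bits 8-31 of the swap entry */
#define DEMO_PAGE_SIZE   4096ULL  /* assumed 4KiB x86 page */

/* Number of page slots addressable with a 24-bit offset. */
static inline uint64_t demo_max_swap_pages(void)
{
    return 1ULL << DEMO_OFFSET_BITS;
}

/* Largest swap area in bytes: 2^24 slots of 2^12 bytes each. */
static inline uint64_t demo_max_swap_bytes(void)
{
    return demo_max_swap_pages() * DEMO_PAGE_SIZE;
}
```

Under these assumptions demo_max_swap_bytes() evaluates to 2^36 bytes, which is 64GiB.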
To encode a type and offset into a swp_entry_t, the macro SWP_ENTRY() is
available which simply performs the relevant bit shifting operations. The
relationship between all these macros is illustrated in Figure 12.1.
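
The bit layout described above can be modelled in user space as follows. This is a sketch rather than the kernel's own macros: the demo_ names are hypothetical, and uint32_t stands in for the 32-bit unsigned long of the x86.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for swp_entry_t on 32-bit x86. */
typedef struct {
    uint32_t val;
} demo_swp_entry_t;

#define DEMO_TYPE_SHIFT   1      /* type occupies bits 1-6 */
#define DEMO_TYPE_MASK    0x3fu  /* six bits of type */
#define DEMO_OFFSET_SHIFT 8      /* offset occupies bits 8-31 */

/* Pack a type and offset, leaving bits 0 and 7 clear. */
static inline demo_swp_entry_t demo_swp_entry(uint32_t type, uint32_t offset)
{
    demo_swp_entry_t e;
    e.val = ((type & DEMO_TYPE_MASK) << DEMO_TYPE_SHIFT)
          | (offset << DEMO_OFFSET_SHIFT);
    return e;
}

/* Recover the index into the swap_info array. */
static inline uint32_t demo_swp_type(demo_swp_entry_t e)
{
    return (e.val >> DEMO_TYPE_SHIFT) & DEMO_TYPE_MASK;
}

/* Recover the offset within the swap_map. */
static inline uint32_t demo_swp_offset(demo_swp_entry_t e)
{
    return e.val >> DEMO_OFFSET_SHIFT;
}
```

For example, packing type 3 and offset 0x1234 gives the value 0x123406, and the two accessors return 3 and 0x1234 again. Offsets wider than 24 bits are silently truncated by the shift in this sketch.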
It should be noted that the six bits for "type" should allow up to 64 swap
areas to exist in a 32 bit architecture instead of the MAX_SWAPFILES
restriction of 32. The restriction is probably due to the consumption of the
vmalloc address space. If a swap area is the maximum possible size then 32MiB
is required for the swap_map (2^24 * sizeof(short)); remember that each page
uses one short for the reference count. For just MAX_SWAPFILES maximum-sized
swap areas to exist, 1GiB of virtual malloc space is required, which is
simply impossible because of the user/kernel linear address space split.
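
The memory cost quoted above can be checked with a short calculation. The sketch assumes a two-byte short, as on the x86; the DEMO_ and demo_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SLOTS     (1ULL << 24)  /* slots in a maximum-sized swap area */
#define DEMO_MAX_SWAPFILES 32            /* limit named in the text */

/* Bytes of swap_map for one maximum-sized area: one short per slot. */
static inline uint64_t demo_swap_map_bytes(void)
{
    return DEMO_MAX_SLOTS * sizeof(unsigned short);
}

/* Bytes of swap_map if every permitted area were maximum-sized. */
static inline uint64_t demo_all_swap_maps_bytes(void)
{
    return demo_swap_map_bytes() * DEMO_MAX_SWAPFILES;
}
```

With a two-byte short this gives 32MiB per area and 1GiB for 32 areas, matching the figures in the text.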
This would imply supporting 64 swap areas is not worth the additional complexity, but there are cases where a large number of swap areas would be desirable even if the overall swap available does not increase. Some modern machines [12.2] have many separate disks which between them can create a large number of separate block devices. In this case, it is desirable to create a large number of small swap areas which are evenly distributed across all disks. This would allow a high degree of parallelism in the page swapping behavior, which is important for swap intensive applications.
Footnotes
[12.2] A Sun E450 could have in the region of 20 disks in it, for example.
Mel 2004-02-15