Library of the rus-linux.net site
The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available although I obviously encourage you to buy the book from your favourite book store :-) . As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.
To be fully clear, this webpage is not the actual book.
12.1 Describing the Swap Area
Each active swap area, be it a file or partition, has a struct
swap_info_struct describing the area. All the structures in the running
system are stored in a statically declared array called swap_info which
holds MAX_SWAPFILES entries, where MAX_SWAPFILES is statically defined as 32.
This means that at most 32 swap areas can exist on a running system. The
swap_info_struct is declared as follows in linux/swap.h:
    struct swap_info_struct {
            unsigned int flags;
            kdev_t swap_device;
            spinlock_t sdev_lock;
            struct dentry * swap_file;
            struct vfsmount *swap_vfsmnt;
            unsigned short * swap_map;
            unsigned int lowest_bit;
            unsigned int highest_bit;
            unsigned int cluster_next;
            unsigned int cluster_nr;
            int prio;
            int pages;
            unsigned long max;
            int next;
    };
Here is a small description of each of the fields in this quite sizable struct.
- flags This is a bit field with two possible values. SWP_USED is set if the swap area is currently active. SWP_WRITEOK is defined as 3, the two lowest significant bits, including the SWP_USED bit. The flags field is set to SWP_WRITEOK when Linux is ready to write to the area, as it must be active to be written to;
- swap_device The device corresponding to the partition used for this swap area is stored here. If the swap area is a file, this is NULL;
- sdev_lock As with many structures in Linux, this one has to be protected too. sdev_lock is a spinlock protecting the struct, principally the swap_map. It is locked and unlocked with swap_device_lock() and swap_device_unlock();
- swap_file This is the dentry for the actual special file that is mounted as a swap area. This could be the dentry for a file in the /dev/ directory, for example, in the case where a partition is mounted. This field is needed to identify the correct swap_info_struct when deactivating a swap area;
- swap_vfsmnt This is the vfsmount object corresponding to where the device or file for this swap area is stored;
- swap_map This is a large array with one entry for every swap entry, or page sized slot in the area. An entry is a reference count of the number of users of this page slot. If it is equal to SWAP_MAP_MAX, the slot is allocated permanently. If equal to SWAP_MAP_BAD, the slot will never be used;
- lowest_bit This is the lowest possible free slot available in the swap area and is used as the starting point when linearly scanning to reduce the search space. It is known that there are definitely no free slots below this mark;
- highest_bit This is the highest possible free slot available in this swap area. Similar to lowest_bit, there are definitely no free slots above this mark;
- cluster_next This is the offset of the next cluster of blocks to use. The swap area tries to have pages allocated in cluster blocks to increase the chance related pages will be stored together;
- cluster_nr This is the number of pages left to allocate in this cluster;
- prio Each swap area has a priority which is stored in this field. Areas are arranged in order of priority and the priority determines how likely the area is to be used. By default the priorities are arranged in order of activation but the system administrator may also specify it using the -p flag when using swapon;
- pages As some slots on the swap file may be unusable, this field stores the number of usable pages in the swap area. This differs from max in that slots marked SWAP_MAP_BAD are not counted;
- max This is the total number of slots in this swap area;
- next This is the index in the swap_info array of the next swap area in the system.
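To make the roles of flags, swap_map, lowest_bit and highest_bit more concrete, the following is a minimal userspace sketch of a linear scan for a free slot. It is not the kernel's allocator: the struct toy_swap_area type and the toy_find_free_slot() function are hypothetical names invented for this illustration, clustering and locking are ignored, and only the SWP_USED and SWP_WRITEOK values are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define SWP_USED     1   /* area is active */
#define SWP_WRITEOK  3   /* active and writable; includes the SWP_USED bit */

/* Hypothetical cut-down model of swap_info_struct, for illustration only. */
struct toy_swap_area {
    unsigned int flags;
    unsigned short *swap_map;   /* one reference count per slot */
    unsigned int lowest_bit;    /* no free slot lies below this index */
    unsigned int highest_bit;   /* no free slot lies above this index */
};

/*
 * Linearly scan [lowest_bit, highest_bit] for a slot with a count of 0.
 * On success the slot is claimed with a count of 1 and its index returned.
 * Returns 0 on failure; slot 0 is never handed out in this sketch.
 */
unsigned int toy_find_free_slot(struct toy_swap_area *si)
{
    unsigned int i;

    assert(si != NULL && si->swap_map != NULL);

    /* The area must be both active and ready for writing. */
    if ((si->flags & SWP_WRITEOK) != SWP_WRITEOK)
        return 0;

    for (i = si->lowest_bit; i <= si->highest_bit; i++) {
        if (si->swap_map[i] == 0) {
            si->swap_map[i] = 1;
            /* Shrink the search window when an edge slot is taken. */
            if (i == si->lowest_bit)
                si->lowest_bit++;
            if (i == si->highest_bit)
                si->highest_bit--;
            return i;
        }
    }
    return 0;
}
```

The two marks only ever narrow in this sketch. A fuller model would also widen them again when a slot is freed outside the current window.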
The areas, though stored in an array, are also kept in a pseudo list called
swap_list which is a very simple type declared as follows in linux/swap.h:
    struct swap_list_t {
            int head;       /* head of priority-ordered swapfile list */
            int next;       /* swapfile to be used next */
    };
The head field is the index in swap_info of the highest priority swap area
in use and the next field is the index of the next swap area that should be
used. This is so areas may be arranged in order of priority when searching
for a suitable area but still looked up quickly in the array when necessary.
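The idea of a list threaded through an array by index can be sketched as follows. This is an illustrative userspace model rather than kernel code: struct toy_area, struct toy_swap_list and toy_list_insert() are hypothetical names, and the convention that -1 marks the end of the list is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SWAPFILES 32

/* Hypothetical cut-down swap area: only the fields the list needs. */
struct toy_area {
    int prio;   /* a higher value means a higher priority */
    int next;   /* index of the next area in the list, -1 at the end */
};

struct toy_swap_list {
    int head;   /* index of the highest priority area, -1 if empty */
    int next;   /* index of the area to be used next */
};

/*
 * Link areas[idx] into the list so that priorities stay in descending
 * order. Each area remains at its fixed array index, so it can still
 * be looked up directly by index.
 */
void toy_list_insert(struct toy_area *areas, struct toy_swap_list *list,
                     int idx)
{
    int prev = -1;
    int cur = list->head;

    assert(areas != NULL && list != NULL);
    assert(idx >= 0 && idx < MAX_SWAPFILES);

    /* Skip over every area with a strictly higher priority. */
    while (cur >= 0 && areas[idx].prio < areas[cur].prio) {
        prev = cur;
        cur = areas[cur].next;
    }

    areas[idx].next = cur;
    if (prev < 0)
        list->head = list->next = idx;   /* new highest priority area */
    else
        areas[prev].next = idx;
}
```

Walking the list from head through each next index visits the areas from highest to lowest priority, while areas[i] still reaches any area in constant time.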
Each swap area is divided up into a number of page sized slots on disk which
means that each slot is 4096 bytes on the x86 for example. The first slot is
always reserved as it contains information about the swap area that should
not be overwritten. The first 1 KiB of the swap area is used to store a
disk label for the partition that can be picked up by userspace tools. The
remaining space is used for information about the swap area which is filled
when the swap area is created with the system program mkswap. The
information is used to fill in a union swap_header which is declared as
follows in linux/swap.h:
    union swap_header {
            struct
            {
                    char reserved[PAGE_SIZE - 10];
                    char magic[10];
            } magic;
            struct
            {
                    char bootbits[1024];
                    unsigned int version;
                    unsigned int last_page;
                    unsigned int nr_badpages;
                    unsigned int padding[125];
                    unsigned int badpages[1];
            } info;
    };
A description of each of the fields follows:
- magic The magic part of the union is used just for identifying the "magic" string. The string exists to make sure there is no chance a partition that is not a swap area will be used and to decide what version of swap area it is. If the string is "SWAP-SPACE", it is version 1 of the swap file format. If it is "SWAPSPACE2", it is version 2. The large reserved array is just so that the magic string will be read from the end of the page;
- bootbits This is the reserved area containing information about the partition such as the disk label;
- version This is the version of the swap area layout;
- last_page This is the last usable page in the area;
- nr_badpages The known number of bad pages that exist in the swap area is stored in this field;
- padding A disk sector is usually 512 bytes in size. The three fields version, last_page and nr_badpages make up 12 bytes and the padding fills up the remaining 500 bytes to cover one sector;
- badpages The remainder of the page is used to store the indices of up to MAX_SWAP_BADPAGES bad page slots. These slots are filled in by the mkswap system program if the -c switch is specified to check the area.
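As a rough illustration of how the magic string identifies the format, the sketch below inspects the last 10 bytes of a buffer holding the first page of a swap area. The function name toy_swap_version() and the TOY_PAGE_SIZE constant are hypothetical. The page size of 4096 bytes is an assumption matching the x86 case mentioned earlier.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096   /* assumed page size, as on the x86 */

/*
 * Return the swap format version (1 or 2) indicated by the magic string
 * held in the last 10 bytes of the first page, or 0 if neither string is
 * present and the page is therefore not a recognised swap header.
 */
int toy_swap_version(const unsigned char *page)
{
    const unsigned char *magic;

    assert(page != NULL);

    /* The magic string occupies the final 10 bytes of the page. */
    magic = page + TOY_PAGE_SIZE - 10;

    if (memcmp(magic, "SWAP-SPACE", 10) == 0)
        return 1;
    if (memcmp(magic, "SWAPSPACE2", 10) == 0)
        return 2;
    return 0;
}
```

The comparison uses memcmp() rather than strcmp() because the magic string fills all 10 bytes and is not NUL terminated within the page.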
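MAX_SWAP_BADPAGES is a compile time constant which varies if the struct
changes but it is 637 entries in its current form as given by the simple
equation:

    MAX_SWAP_BADPAGES = (PAGE_SIZE - 1024 - 512 - 10) / 4

Where 1024 is the size of the bootblock, 512 is the size of the sector
holding the three header fields and the padding, 10 is the size of the magic
string identifying the format of the swap file and 4 is the size in bytes of
one badpages entry. With a PAGE_SIZE of 4096 on the x86, this is 2550 / 4
which rounds down to 637.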
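The figure of 637 can be cross-checked from the layout of the union itself. The sketch below copies the declaration into a userspace type and measures the gap between the start of the badpages array and the magic string. The names union toy_swap_header, TOY_PAGE_SIZE and toy_max_swap_badpages() are hypothetical. A 4096 byte page and a 4 byte unsigned int are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096   /* assumed page size, as on the x86 */

/* Userspace copy of the header layout, for measuring offsets only. */
union toy_swap_header {
    struct {
        char reserved[TOY_PAGE_SIZE - 10];
        char magic[10];
    } magic;
    struct {
        char bootbits[1024];
        unsigned int version;
        unsigned int last_page;
        unsigned int nr_badpages;
        unsigned int padding[125];
        unsigned int badpages[1];
    } info;
};

/*
 * Number of whole badpages entries that fit between the start of the
 * badpages array and the start of the magic string.
 */
size_t toy_max_swap_badpages(void)
{
    size_t start = offsetof(union toy_swap_header, info.badpages);
    size_t end = offsetof(union toy_swap_header, magic.magic);

    assert(end > start);
    return (end - start) / sizeof(unsigned int);
}
```

Under the stated assumptions the array starts at byte 1536 and the magic string at byte 4086, leaving 2550 bytes, which is room for 637 whole entries.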
Mel 2004-02-15