Library of the rus-linux.net site
The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.
To be fully clear, this webpage is not the actual book.
Subsections
- 11.3.1 Adding Pages to the Page Cache
- 11.3.2 Refilling inactive_list
- 11.3.3 Reclaiming Pages from the Page Cache
11.3 Manipulating the Page Cache
This section begins with how pages are added to the page cache. It will then cover how pages are moved from the active_list to the inactive_list. Lastly we will cover how pages are reclaimed from the page cache.
11.3.1 Adding Pages to the Page Cache
Pages which are read from a file or block device are added to the page cache by calling __add_to_page_cache() during generic_file_read().
All filesystems use the high level function generic_file_read() so that operations will take place through the page cache. It calls do_generic_file_read() which first checks if the page exists in the page cache. If it does not, the information is read from disk and added to the cache with __add_to_page_cache().
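The lookup-then-add behaviour described above can be sketched as a small user-space toy. This is not kernel code: the names toy_cache, toy_find_page(), toy_read_from_disk() and toy_file_read() are hypothetical, the "disk" is a stand-in function, and the toy never evicts anything.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Hypothetical miniature page cache: one int of "data" per cached page. */
struct toy_cache {
    long index[TOY_CACHE_SLOTS];   /* page offset within the file */
    int data[TOY_CACHE_SLOTS];     /* contents of the cached page */
    size_t used;                   /* number of slots in use */
    unsigned long disk_reads;      /* how often the "disk" was hit */
};

/* Stand-in for a real disk read (assumption for illustration only). */
static int toy_read_from_disk(long index)
{
    return (int)(index * 10);
}

/* Return a pointer to the cached data, or NULL on a cache miss. */
static int *toy_find_page(struct toy_cache *c, long index)
{
    for (size_t i = 0; i < c->used; i++)
        if (c->index[i] == index)
            return &c->data[i];
    return NULL;
}

/* Check the cache first; only on a miss read from disk and add the page. */
int toy_file_read(struct toy_cache *c, long index)
{
    int *hit = toy_find_page(c, index);

    if (hit)
        return *hit;

    assert(c->used < TOY_CACHE_SLOTS);  /* the toy has no eviction */
    c->index[c->used] = index;
    c->data[c->used] = toy_read_from_disk(index);
    c->disk_reads++;
    return c->data[c->used++];
}
```

Reading the same index twice increments disk_reads only once, which is the point of routing file reads through the cache.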
Anonymous pages are added to the page cache the first time they are about to be swapped out and will be discussed further in Section 12.4. The only real difference between anonymous pages and file backed pages as far as the page cache is concerned is that anonymous pages will use swapper_space as the struct address_space.
Shared memory pages are added during one of two cases. The first is during shmem_getpage_locked() which is called when a page has to be either fetched from swap or allocated as it is the first reference. The second is when the swapout code calls shmem_unuse(). This occurs when a swap area is being deactivated and a page, backed by swap space, is found that does not appear to belong to any process. The inodes related to shared memory are exhaustively searched until the correct page is found. In both cases, the page is added with add_to_page_cache().
11.3.2 Refilling inactive_list
When caches are being shrunk, pages are moved from the active_list to the inactive_list by the function refill_inactive(). It takes as a parameter the number of pages to move, which is calculated in shrink_caches() as a ratio depending on nr_pages, the number of pages in active_list and the number of pages in inactive_list. The number of pages to move is calculated as
$$\text{pages} = \text{nr\_pages} \times \frac{\text{nr\_active\_pages}}{2 \times (\text{nr\_inactive\_pages} + 1)}$$
This keeps the active_list about two thirds the size of the total page cache, and the number of pages to move is determined as a ratio based on how many pages we desire to swap out (nr_pages).
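As a rough illustration of this calculation, the sketch below assumes the ratio takes the form nr_pages * nr_active_pages / (2 * (nr_inactive_pages + 1)), which is how the 2.4 shrink_caches() source computes it. The function name refill_count() is hypothetical; only the parameter names come from the text.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: how many pages to move from the active_list to
 * the inactive_list, assuming the ratio described in the lead-in.
 */
unsigned long refill_count(unsigned long nr_pages,
                           unsigned long nr_active_pages,
                           unsigned long nr_inactive_pages)
{
    /* Guard the multiplications against overflow in this toy version. */
    assert(nr_pages == 0 || nr_active_pages <= ULONG_MAX / nr_pages);
    assert(nr_inactive_pages < ULONG_MAX / 2);

    /* The +1 avoids a division by zero when the inactive_list is empty. */
    return nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
}
```

With nr_pages = 32, 200 active pages and 99 inactive pages, the result is 32 * 200 / 200 = 32, so the full request is moved. If the active_list shrinks to 100 pages the result drops to 16: the smaller the active_list is relative to the inactive_list, the fewer pages are moved.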
Pages are taken from the end of the active_list. If the PG_referenced flag is set, it is cleared and the page is put back at the top of the active_list as it has been recently used and is still "hot". If the flag is clear, the page is moved to the inactive_list and the PG_referenced flag is set so that it will be quickly promoted to the active_list if necessary.
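The second-chance behaviour described above can be modelled with a small user-space sketch. The toy_page and toy_list types and the function toy_refill_inactive() are hypothetical stand-ins for the kernel's struct page and LRU lists; the lists are plain arrays where slot 0 is the top and the last used slot is the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PAGES 16

struct toy_page {
    int id;
    bool referenced;               /* models the PG_referenced flag */
};

/* pages[0] is the top of the list, pages[len - 1] is the end. */
struct toy_list {
    struct toy_page *pages[TOY_MAX_PAGES];
    size_t len;
};

static void toy_push_top(struct toy_list *l, struct toy_page *p)
{
    assert(l->len < TOY_MAX_PAGES);
    memmove(&l->pages[1], &l->pages[0], l->len * sizeof(l->pages[0]));
    l->pages[0] = p;
    l->len++;
}

static struct toy_page *toy_pop_end(struct toy_list *l)
{
    assert(l->len > 0);
    return l->pages[--l->len];
}

/* Move up to nr_pages unreferenced pages from active to inactive. */
void toy_refill_inactive(struct toy_list *active, struct toy_list *inactive,
                         size_t nr_pages)
{
    size_t to_scan = active->len;  /* visit each page at most once */

    while (nr_pages > 0 && to_scan-- > 0) {
        struct toy_page *p = toy_pop_end(active);

        if (p->referenced) {
            /* Recently used: clear the flag and give a second chance. */
            p->referenced = false;
            toy_push_top(active, p);
            continue;
        }
        /* Not referenced: demote, but mark for quick promotion. */
        p->referenced = true;
        toy_push_top(inactive, p);
        nr_pages--;
    }
}
```

Only pages that are actually moved count against nr_pages in this sketch; a referenced page costs one scan step but stays on the active_list.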
11.3.3 Reclaiming Pages from the Page Cache
The function shrink_cache() is the part of the replacement algorithm which takes pages from the inactive_list and decides how they should be swapped out. The two starting parameters which determine how much work will be performed are nr_pages and priority. nr_pages starts out as SWAP_CLUSTER_MAX and priority starts as DEF_PRIORITY.
Two parameters, max_scan and max_mapped, determine how much work the function will do and are affected by the priority. Each time the function shrink_caches() is called without enough pages being freed, the priority will be decreased until the highest priority 1 is reached.
max_scan is the maximum number of pages that will be scanned by this function and is simply calculated as
$$\text{max\_scan} = \frac{\text{nr\_inactive\_pages}}{\text{priority}}$$
where nr_inactive_pages is the number of pages in the inactive_list. This means that at lowest priority 6, at most one sixth of the pages in the inactive_list will be scanned and at highest priority, all of them will be.
The second parameter is max_mapped which determines how many process pages are allowed to exist in the page cache before whole processes will be swapped out. This is calculated as the minimum of either one tenth of max_scan or
$$\text{nr\_pages} \times 2^{(10 - \text{priority})}$$
In other words, at lowest priority, the maximum number of mapped pages allowed is either one tenth of max_scan or 16 times the number of pages to swap out (nr_pages), whichever is the lower number. At high priority, it is either one tenth of max_scan or 512 times the number of pages to swap out.
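The two limits can be worked through with a short sketch. It assumes max_scan is nr_inactive_pages divided by priority and that the second bound on max_mapped is nr_pages shifted left by (10 - priority), which matches the one-sixth, 16 times and 512 times figures given in the text. The struct scan_limits type and the function compute_scan_limits() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical container for the two limits discussed in the text. */
struct scan_limits {
    unsigned long max_scan;
    unsigned long max_mapped;
};

struct scan_limits compute_scan_limits(unsigned long nr_inactive_pages,
                                       unsigned long nr_pages,
                                       int priority)
{
    struct scan_limits lim;
    unsigned long tenth, by_priority;

    /* Priority runs from 6 (lowest) down to 1 (highest). */
    assert(priority >= 1 && priority <= 6);

    lim.max_scan = nr_inactive_pages / (unsigned long)priority;

    /* 16 * nr_pages at priority 6, 512 * nr_pages at priority 1. */
    by_priority = nr_pages << (10 - priority);
    tenth = lim.max_scan / 10;

    lim.max_mapped = tenth < by_priority ? tenth : by_priority;
    return lim;
}
```

For example, with 6000 inactive pages, nr_pages = 32 and priority 6, max_scan is 1000 and max_mapped is 100, because one tenth of max_scan is lower than 32 * 16 = 512. With 600000 inactive pages at the same priority, the 512 bound wins instead.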
From there, the function is basically a very large for-loop which scans at most max_scan pages to free up nr_pages pages from the end of the inactive_list or until the inactive_list is empty. After each page, it checks to see whether it should reschedule itself so that the swapper does not monopolise the CPU.
For each type of page found on the list, it makes a different decision on what to do. The page types and actions are as follows:
- Page is mapped by a process. The max_mapped count is decremented. If it reaches 0, the page tables of processes will be linearly searched and swapped out by the function swap_out().
- Page is locked and the PG_launder bit is set. A reference to the page is taken with page_cache_get() so that the page will not disappear and wait_on_page() is called which sleeps until the IO is complete. Once it is completed, the reference count is decremented with page_cache_release(). When the count reaches zero, it is freed.
- Page is dirty, is unmapped by all processes, has no buffers and belongs to a device or file mapping. The PG_dirty bit is cleared and the PG_launder bit is set. A reference to the page is taken with page_cache_get() so the page will not disappear prematurely and then the writepage() function provided by the mapping is called to clean the page. The previous case (page locked with PG_launder set) will pick up this page during the next pass and wait for the IO to complete if necessary.
- Page has buffers associated with data on disk. A reference is taken to the page and an attempt is made to release the buffers with try_to_release_page(). If this succeeds and the page is anonymous, the page can be freed. If it is backed by a file or device, the reference is simply dropped and the page will be freed later. However, it is unclear how a page could have both associated buffers and a file mapping.
- Page is anonymous, belongs to a process and has no associated buffers. The LRU is unlocked and the page is unlocked. The max_mapped count is decremented. If it reaches zero, then swap_out() is called to start swapping out entire processes as there are too many process mapped pages in the page cache. An anonymous page may have associated buffers if it is backed by a swap file. In this case, the page is treated as a buffer page and normal block IO syncs the page with the backing storage.
- Page has no references to it. If the page is in the swap cache, it is deleted from it as it is now stored in the swap area. If it is part of a file, it is removed from the inode queue. The page is then deleted from the page cache and freed.
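The cases above can be summarised as a first-match decision function. This is only a reading aid under several assumptions: the toy_page_state fields and toy_action values are hypothetical, the tests are applied in the order the cases are listed rather than in the order the kernel checks them, and the anonymous-page case is folded into the first case because both lead to the same max_mapped accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the page properties the cases refer to. */
struct toy_page_state {
    bool mapped;         /* mapped by at least one process */
    bool locked;         /* page is locked */
    bool launder;        /* PG_launder is set */
    bool dirty;          /* PG_dirty is set */
    bool has_buffers;    /* buffers associated with data on disk */
    bool file_mapping;   /* belongs to a device or file mapping */
    int references;      /* remaining references to the page */
};

enum toy_action {
    TOY_COUNT_MAPPED,    /* decrement max_mapped, possibly swap_out() */
    TOY_WAIT_FOR_IO,     /* wait_on_page() until the IO completes */
    TOY_WRITE_PAGE,      /* clear PG_dirty, set PG_launder, writepage() */
    TOY_RELEASE_BUFFERS, /* try_to_release_page() */
    TOY_FREE_PAGE,       /* remove from the page cache and free */
    TOY_SKIP             /* none of the listed cases applies */
};

enum toy_action toy_classify(const struct toy_page_state *p)
{
    assert(p != NULL);

    if (p->mapped)
        return TOY_COUNT_MAPPED;
    if (p->locked && p->launder)
        return TOY_WAIT_FOR_IO;
    if (p->dirty && !p->has_buffers && p->file_mapping)
        return TOY_WRITE_PAGE;
    if (p->has_buffers)
        return TOY_RELEASE_BUFFERS;
    if (p->references == 0)
        return TOY_FREE_PAGE;
    return TOY_SKIP;
}
```

A dirty file page with no buffers is classified as TOY_WRITE_PAGE on one pass; once it is locked with PG_launder set, a later pass classifies it as TOY_WAIT_FOR_IO, mirroring the hand-off between the third and second cases.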
Mel 2004-02-15