Library of the rus-linux.net site
The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.
To be fully clear, this webpage is not the actual book.
4.3 Using Page Table Entries
Macros that are important for navigating and examining page table entries are defined in asm/pgtable.h. To navigate the page directories, three macros are provided which break up a linear address into its component parts. pgd_offset() takes an address and the mm_struct for the process and returns the PGD entry that covers the requested address. pmd_offset() takes a PGD entry and an address and returns the relevant PMD. pte_offset() takes a PMD and an address and returns the relevant PTE. The remainder of the linear address provided is the offset within the page. The relationship between these fields is illustrated in Figure 4.3.
The second set of macros determines whether the page table entries are present or may be used.
- pte_none(), pmd_none() and pgd_none() return 1 if the corresponding entry does not exist;
- pte_present(), pmd_present() and pgd_present() return 1 if the corresponding page table entries have the PRESENT bit set;
- pte_clear(), pmd_clear() and pgd_clear() will clear the corresponding page table entry;
- pmd_bad() and pgd_bad() are used to check entries when passed as input parameters to functions that may change the value of the entries. Whether they return 1 varies between the few architectures that define these macros, but for those that do define them, making sure the page entry is marked as present and accessed are the two most important checks.
There are many parts of the VM which are littered with page table walk code and it is important to recognise it. A very simple example of a page table walk is the function follow_page() in mm/memory.c. The following is an excerpt from that function, with the parts unrelated to the page table walk omitted:
    pgd_t *pgd;
    pmd_t *pmd;
    pte_t *ptep, pte;

    pgd = pgd_offset(mm, address);
    if (pgd_none(*pgd) || pgd_bad(*pgd))
        goto out;

    pmd = pmd_offset(pgd, address);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        goto out;

    ptep = pte_offset(pmd, address);
    if (!ptep)
        goto out;

    pte = *ptep;
It simply uses the three offset macros to navigate the page tables and the _none() and _bad() macros to make sure it is looking at a valid page table.
The third set of macros examine and set the permissions of an entry. The permissions determine what a userspace process can and cannot do with a particular page. For example, the kernel page table entries are never readable by a userspace process.
- The read permissions for an entry are tested with pte_read(), set with pte_mkread() and cleared with pte_rdprotect();
- The write permissions are tested with pte_write(), set with pte_mkwrite() and cleared with pte_wrprotect();
- The execute permissions are tested with pte_exec(), set with pte_mkexec() and cleared with pte_exprotect(). It is worth noting that with the x86 architecture, there is no means of setting execute permissions on pages so these three macros act the same way as the read macros;
- The permissions can be modified to a new value with pte_modify() but its use is almost non-existent. It is only used in the function change_pte_range() in mm/mprotect.c.
The fourth set of macros examine and set the state of an entry. There are only two bits that are important in Linux, the dirty bit and the accessed bit. To check these bits, the macros pte_dirty() and pte_young() are used. To set the bits, the macros pte_mkdirty() and pte_mkyoung() are used. To clear them, the macros pte_mkclean() and pte_old() are available.
Mel 2004-02-15