After this documentation was released in July 2003, I was approached by Prentice Hall and asked to write a book on the Linux VM under the Bruce Perens' Open Book Series.

The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications, and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens' Open Book Series, it will be available 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.

To be fully clear, this webpage is not the actual book.

4.3 Using Page Table Entries

Macros defined in asm/pgtable.h are important for the navigation and examination of page table entries. To navigate the page directories, three macros are provided which break up a linear address into its component parts. pgd_offset() takes an address and the mm_struct for the process and returns the PGD entry that covers the requested address. pmd_offset() takes a PGD entry and an address and returns the relevant PMD. pte_offset() takes a PMD and an address and returns the relevant PTE. The remainder of the linear address provided is the offset within the page. The relationship between these fields is illustrated in Figure 4.3.

Figure 4.3: Page Table Layout
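As a rough illustration of how a linear address breaks into those fields, the following user-space sketch splits a 32-bit address by hand. The shift and table-size values are assumptions that correspond to x86 without PAE, where the PMD level is folded away and only the PGD and PTE indices remain. The helpers split_address() and join_address() and the struct addr_parts type are hypothetical names for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: x86 without PAE (4KiB pages, two real levels). */
#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024
#define PTRS_PER_PTE  1024
#define PAGE_SIZE     (1UL << PAGE_SHIFT)

/* Hypothetical container for the decoded fields of a linear address. */
struct addr_parts {
    unsigned pgd_index;   /* which PGD entry covers the address */
    unsigned pte_index;   /* which PTE within that page table   */
    unsigned offset;      /* byte offset within the page        */
};

/* Break a linear address into its component parts. */
static inline struct addr_parts split_address(uint32_t address)
{
    struct addr_parts p;

    p.pgd_index = (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
    p.pte_index = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    p.offset    = address & (PAGE_SIZE - 1);
    return p;
}

/* Rebuild the address, to show that the three fields cover every bit. */
static inline uint32_t join_address(struct addr_parts p)
{
    assert(p.pgd_index < PTRS_PER_PGD);
    assert(p.pte_index < PTRS_PER_PTE);
    assert(p.offset < PAGE_SIZE);
    return ((uint32_t)p.pgd_index << PGDIR_SHIFT) |
           ((uint32_t)p.pte_index << PAGE_SHIFT) | p.offset;
}
```

Under these assumptions, the top 10 bits of the address select the PGD entry, the next 10 bits select the PTE and the low 12 bits are the offset within the page.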

The second round of macros determines whether the page table entries are present or may be used.

  • pte_none(), pmd_none() and pgd_none() return 1 if the corresponding entry does not exist;

  • pte_present(), pmd_present() and pgd_present() return 1 if the corresponding page table entries have the PRESENT bit set;

  • pte_clear(), pmd_clear() and pgd_clear() will clear the corresponding page table entry;

  • pmd_bad() and pgd_bad() are used to check entries when passed as input parameters to functions that may change the value of the entries. Whether they return 1 varies between the few architectures that define these macros, but for those that do define them, making sure the page entry is marked as present and accessed are the two most important checks.
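The difference between the _none() and _present() tests can be shown with a small user-space model. This is a simplified sketch, not the kernel's definitions: the toy_ names and the toy_pte_t type are hypothetical, and the present bit is assumed to be bit 0 as on x86. The real macros are architecture specific and may check more than a single bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t: one 32-bit word per entry. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PAGE_PRESENT 0x001u   /* assumed position of the PRESENT bit */

/* An entry "does not exist" when it is completely empty. */
static inline int toy_pte_none(toy_pte_t pte)
{
    return pte.val == 0;
}

/* An entry is present when the PRESENT bit is set. */
static inline int toy_pte_present(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_PRESENT) != 0;
}

/* Clearing an entry wipes every bit, so it becomes "none" again. */
static inline void toy_pte_clear(toy_pte_t *ptep)
{
    assert(ptep != NULL);
    ptep->val = 0;
}
```

Note that the two tests are not opposites. An entry can be non-empty with the PRESENT bit clear, which is how an entry for a page that has been swapped out looks: it still holds information but cannot be used for translation.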

There are many parts of the VM which are littered with page table walk code and it is important to recognise it. A very simple example of a page table walk is the function follow_page() in mm/memory.c. The following is an excerpt from that function with the parts unrelated to the page table walk omitted:

        pgd_t *pgd;
        pmd_t *pmd;
        pte_t *ptep, pte;

        /* Top level: find the PGD entry covering the address */
        pgd = pgd_offset(mm, address);
        if (pgd_none(*pgd) || pgd_bad(*pgd))
                goto out;

        /* Middle level: find the PMD entry */
        pmd = pmd_offset(pgd, address);
        if (pmd_none(*pmd) || pmd_bad(*pmd))
                goto out;

        /* Bottom level: find the PTE itself */
        ptep = pte_offset(pmd, address);
        if (!ptep)
                goto out;

        pte = *ptep;

It simply uses the three offset macros to navigate the page tables and the _none() and _bad() macros to make sure it is looking at a valid page table.
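The same pattern can be tried outside the kernel with a toy page table. The following is a minimal sketch under stated assumptions: a two-level layout with the x86 non-PAE shifts, a NULL table pointer standing in for a "none" directory entry, and bit 0 as the present bit. Every toy_ name is hypothetical and none of this is the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PGDIR_SHIFT  22
#define TOY_PTRS         1024      /* entries per table at both levels */
#define TOY_PRESENT      0x001u

typedef struct { uint32_t val; } toy_pte_t;
typedef struct { toy_pte_t *table; } toy_pgd_t;  /* NULL table means "none" */
struct toy_mm { toy_pgd_t pgd[TOY_PTRS]; };

/* Index the top-level directory with the high bits of the address. */
static inline toy_pgd_t *toy_pgd_offset(struct toy_mm *mm, uint32_t addr)
{
    return &mm->pgd[(addr >> TOY_PGDIR_SHIFT) & (TOY_PTRS - 1)];
}

/* Index the page table with the middle bits of the address. */
static inline toy_pte_t *toy_pte_offset(toy_pgd_t *pgd, uint32_t addr)
{
    return &pgd->table[(addr >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1)];
}

/* Walk the tables; return the PTE for addr or NULL if nothing is mapped. */
static inline toy_pte_t *toy_walk(struct toy_mm *mm, uint32_t addr)
{
    toy_pgd_t *pgd;
    toy_pte_t *ptep;

    assert(mm != NULL);

    pgd = toy_pgd_offset(mm, addr);
    if (pgd->table == NULL)            /* plays the role of pgd_none() */
        return NULL;

    ptep = toy_pte_offset(pgd, addr);
    if (!(ptep->val & TOY_PRESENT))    /* entry exists but is unusable */
        return NULL;

    return ptep;
}
```

Each level is checked before it is dereferenced, which is the shape to look for when reading walk code elsewhere in the VM.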

The third set of macros examines and sets the permissions of an entry. The permissions determine what a userspace process can and cannot do with a particular page. For example, the kernel page table entries are never readable by a userspace process.

  • The read permissions for an entry are tested with pte_read(), set with pte_mkread() and cleared with pte_rdprotect();

  • The write permissions are tested with pte_write(), set with pte_mkwrite() and cleared with pte_wrprotect();

  • The execute permissions are tested with pte_exec(), set with pte_mkexec() and cleared with pte_exprotect(). It is worth noting that with the x86 architecture, there is no means of setting execute permissions on pages so these three macros act the same way as the read macros;

  • The permissions can be modified to a new value with pte_modify() but its use is almost non-existent. It is only used in the function change_pte_range() in mm/mprotect.c.
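To make the permission macros more concrete, here is a user-space sketch that models them as plain bit operations. The bit positions are assumptions mirroring the x86 layout, where one bit marks an entry as user-accessible and another as writable. The toy_ names are hypothetical, and toy_pte_modify() is a deliberately simplified stand-in for pte_modify() that replaces only these two bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t: one 32-bit word per entry. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PAGE_RW    0x002u   /* assumed writable bit        */
#define TOY_PAGE_USER  0x004u   /* assumed user-accessible bit */
#define TOY_PROT_MASK  (TOY_PAGE_RW | TOY_PAGE_USER)

/* With no separate execute bit, read and exec test the same bit. */
static inline int toy_pte_read(toy_pte_t pte)  { return (pte.val & TOY_PAGE_USER) != 0; }
static inline int toy_pte_exec(toy_pte_t pte)  { return (pte.val & TOY_PAGE_USER) != 0; }
static inline int toy_pte_write(toy_pte_t pte) { return (pte.val & TOY_PAGE_RW) != 0; }

/* Each setter takes an entry by value and returns the modified copy. */
static inline toy_pte_t toy_pte_mkread(toy_pte_t pte)    { pte.val |= TOY_PAGE_USER;  return pte; }
static inline toy_pte_t toy_pte_rdprotect(toy_pte_t pte) { pte.val &= ~TOY_PAGE_USER; return pte; }
static inline toy_pte_t toy_pte_mkwrite(toy_pte_t pte)   { pte.val |= TOY_PAGE_RW;    return pte; }
static inline toy_pte_t toy_pte_wrprotect(toy_pte_t pte) { pte.val &= ~TOY_PAGE_RW;   return pte; }

/* Replace the permission bits wholesale, keeping everything else. */
static inline toy_pte_t toy_pte_modify(toy_pte_t pte, uint32_t newprot)
{
    assert((newprot & ~TOY_PROT_MASK) == 0);
    pte.val = (pte.val & ~TOY_PROT_MASK) | newprot;
    return pte;
}
```

In this model the read and execute tests are literally the same check, which matches the x86 behaviour described in the list above.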

The fourth set of macros examines and sets the state of an entry. There are only two bits that are important in Linux, the dirty bit and the accessed bit. To check these bits, the macros pte_dirty() and pte_young() are used. To set the bits, the macros pte_mkdirty() and pte_mkyoung() are used. To clear them, the macros pte_mkclean() and pte_old() are available.
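The state macros follow the same shape as the permission macros. The sketch below models them in user space, assuming the x86 positions for the accessed and dirty bits. The toy_ names are hypothetical, and toy_test_and_clear_young() is an illustrative helper showing how the accessed bit might be sampled and reset in one step, as page aging code needs to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t: one 32-bit word per entry. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PAGE_ACCESSED 0x020u   /* assumed accessed ("young") bit */
#define TOY_PAGE_DIRTY    0x040u   /* assumed dirty bit              */

static inline int toy_pte_young(toy_pte_t pte) { return (pte.val & TOY_PAGE_ACCESSED) != 0; }
static inline int toy_pte_dirty(toy_pte_t pte) { return (pte.val & TOY_PAGE_DIRTY) != 0; }

static inline toy_pte_t toy_pte_mkyoung(toy_pte_t pte) { pte.val |= TOY_PAGE_ACCESSED;  return pte; }
static inline toy_pte_t toy_pte_old(toy_pte_t pte)     { pte.val &= ~TOY_PAGE_ACCESSED; return pte; }
static inline toy_pte_t toy_pte_mkdirty(toy_pte_t pte) { pte.val |= TOY_PAGE_DIRTY;     return pte; }
static inline toy_pte_t toy_pte_mkclean(toy_pte_t pte) { pte.val &= ~TOY_PAGE_DIRTY;    return pte; }

/* Report whether the page was referenced since the last check, then reset. */
static inline int toy_test_and_clear_young(toy_pte_t *ptep)
{
    int young;

    assert(ptep != NULL);
    young = toy_pte_young(*ptep);
    *ptep = toy_pte_old(*ptep);
    return young;
}
```

The two bits are independent: marking an entry old leaves its dirty state untouched, and cleaning it leaves the accessed state untouched.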

Mel 2004-02-15