Library of the rus-linux.net site
The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available, although I obviously encourage you to buy the book from your favourite book store :-). As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves, which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.
To be fully clear, this webpage is not the actual book.
3.1 Nodes
As we have mentioned, each node in memory is described by a pg_data_t struct. When allocating a page, Linux uses a node-local allocation policy to allocate memory from the node closest to the running CPU. As processes tend to run on the same CPU, it is likely the memory from the current node will be used. The struct is declared as follows in linux/mmzone.h:
    typedef struct pglist_data {
        zone_t node_zones[MAX_NR_ZONES];
        zonelist_t node_zonelists[GFP_ZONEMASK+1];
        int nr_zones;
        struct page *node_mem_map;
        unsigned long *valid_addr_bitmap;
        struct bootmem_data *bdata;
        unsigned long node_start_paddr;
        unsigned long node_start_mapnr;
        unsigned long node_size;
        int node_id;
        struct pglist_data *node_next;
    } pg_data_t;
We now briefly describe each of these fields:
- node_zones The zones for this node: ZONE_HIGHMEM, ZONE_NORMAL and ZONE_DMA;
- node_zonelists This is the order of zones that allocations are preferred from. build_zonelists() in page_alloc.c sets up the order when called by free_area_init_core(). A failed allocation in ZONE_HIGHMEM may fall back to ZONE_NORMAL or back to ZONE_DMA;
- nr_zones Number of zones in this node, between 1 and 3. Not all nodes will have three. A CPU bank may not have ZONE_DMA, for example;
- node_mem_map This is the first page of the struct page array representing each physical frame in the node. It will be placed somewhere within the global mem_map array;
- valid_addr_bitmap A bitmap which describes "holes" in the memory node that no memory exists for;
- bdata This is only of interest to the boot memory allocator discussed in Chapter 6;
- node_start_paddr The starting physical address of the node. An unsigned long does not work optimally as it breaks for ia32 [3.1] with Physical Address Extension (PAE) [3.2], for example, where a physical address can be wider than 32 bits. A more suitable solution would be to record this as a Page Frame Number (PFN), which could be trivially defined as (page_phys_addr >> PAGE_SHIFT);
- node_start_mapnr This gives the page offset within the global mem_map. It is calculated in free_area_init_core() by calculating the number of pages between mem_map and the local mem_map for this node, called lmem_map;
- node_size The total number of pages in this node;
- node_id The ID of the node, starts at 0;
- node_next Pointer to next node in a NULL terminated list.
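The arithmetic behind node_start_paddr and node_start_mapnr can be sketched in a few lines of C. This is only an illustration under stated assumptions, not kernel code: PAGE_SHIFT is assumed to be 12 (4 KiB pages), struct page is a hypothetical one-field stand-in, and the helper names phys_to_pfn() and node_mapnr() are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits are the offset within a page. */
#define PAGE_SHIFT 12

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    unsigned long flags;
};

/*
 * Convert a physical address to a Page Frame Number (PFN).
 * The address is taken as 64 bits so that addresses above 4 GiB,
 * as are possible with PAE, are not truncated before the shift.
 */
static uint32_t phys_to_pfn(uint64_t page_phys_addr)
{
    return (uint32_t)(page_phys_addr >> PAGE_SHIFT);
}

/*
 * Page offset of a node's local map within the global map:
 * the number of struct page entries between mem_map and lmem_map.
 */
static unsigned long node_mapnr(const struct page *mem_map,
                                const struct page *lmem_map)
{
    return (unsigned long)(lmem_map - mem_map);
}
```

For a node starting at the 4 GiB boundary (0x100000000), a 32-bit unsigned long would wrap to 0, while phys_to_pfn() yields 0x100000, which still fits comfortably in 32 bits.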
All nodes in the system are maintained on a list called pgdat_list. The nodes are placed on this list as they are initialised by the init_bootmem_core() function, described later in Section 6.2.2. Up until late 2.4 kernels (> 2.4.18), blocks of code that traversed the list looked something like:
    pg_data_t *pgdat;

    pgdat = pgdat_list;
    do {
        /* do something with pg_data_t */
        ...
    } while ((pgdat = pgdat->node_next));
In more recent kernels, a macro for_each_pgdat(), which is trivially defined as a for loop, is provided to improve code readability.
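As a rough sketch of what such a macro could look like, the following defines for_each_pgdat() as a plain for loop over the node_next pointers. The pg_data_t here is a cut-down stand-in with only two fields, pgdat_list is declared locally for the sake of a self-contained example, and count_nodes() is a hypothetical helper; the exact definition in any given kernel version may differ.

```c
#include <stddef.h>

/* Simplified stand-in for pg_data_t, keeping only the fields used here. */
typedef struct pglist_data {
    int node_id;
    struct pglist_data *node_next;
} pg_data_t;

/* Head of the NULL terminated node list (local to this example). */
static pg_data_t *pgdat_list;

/* Walk every node on pgdat_list, stopping at the terminating NULL. */
#define for_each_pgdat(pgdat) \
    for ((pgdat) = pgdat_list; (pgdat) != NULL; (pgdat) = (pgdat)->node_next)

/* Hypothetical helper: count the nodes currently on the list. */
static int count_nodes(void)
{
    pg_data_t *pgdat;
    int count = 0;

    for_each_pgdat(pgdat)
        count++;

    return count;
}
```

One side effect of the for loop form is that the termination test runs before the first iteration, so an empty list is simply skipped, whereas the do/while form enters its body once with whatever pgdat_list holds.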
Footnotes
- [3.1] ia32: FYI from Jeff Haran: Some PowerPC variants appear to have this same problem (e.g. PPC440GP).
- [3.2] PAE: PAE is discussed further in Section 3.4.
Mel 2004-02-15