Библиотека сайта rus-linux.net
The book is available and called simply "Understanding The Linux Virtual Memory Manager". There is a lot of additional material in the book that is not available here, including details on later 2.4 kernels, introductions to 2.6, a whole new chapter on the shared memory filesystem, coverage of TLB management, a lot more code commentary, countless other additions and clarifications and a CD with lots of cool stuff on it. This material (although now dated and lacking in comparison to the book) will remain available although I obviously encourge you to buy the book from your favourite book store :-) . As the book is under the Bruce Perens Open Book Series, it will be available 90 days after appearing on the book shelves which means it is not available right now. When it is available, it will be downloadable from http://www.phptr.com/perens so check there for more information.
To be fully clear, this webpage is not the actual book.
Next: 9.3 Objects Up: 9. Slab Allocator Previous: 9.1 Caches   Contents   Index
Subsections
- 9.2.1 Storing the Slab Descriptor
- 9.2.2 Slab Creation
- 9.2.3 Tracking Free Objects
- 9.2.4 Initialising the
kmem_bufctl_t
Array - 9.2.5 Finding the Next Free Object
- 9.2.6 Updating kmem_bufctl_t
- 9.2.7 Calculating the Number of Objects on a Slab
- 9.2.8 Slab Destroying
9.2 Slabs
This section will describe how a slab is structured and managed. The struct which describes it is much simpler than the cache descriptor, but how the slab is arranged is considerably more complex. It is declared as follows:
typedef struct slab_s { struct list_head list; unsigned long colouroff; void *s_mem; unsigned int inuse; kmem_bufctl_t free; } slab_t;
The fields in this simple struct are as follows:
- list This is the linked list the slab belongs to. This will be one of
slab_full
,slab_partial
orslab_free
from the cache manager; - colouroff This is the colour offset from the base address of
the first object within the slab. The address of the first
object is
s_mem + colouroff
; - s_mem This gives the starting address of the first object within
the slab;
- inuse This gives the number of active objects in the slab;
- free This is an array of
bufctl
s used for storing locations of free objects. See Section for further details.
The reader will note that given the slab manager or an object
within the slab, there does not appear to be an obvious way to
determine what slab or cache they belong to. This is addressed
by using the list
field in the struct
page
that makes up the cache. SET_PAGE_CACHE()
and
SET_PAGE_SLAB()
use the next
and prev
fields on the page
list
to track what cache and slab
an object belongs to. To get the descriptors from the page, the macros
GET_PAGE_CACHE()
and GET_PAGE_SLAB()
are
available. This set of relationships is illustrated in Figure 9.7
The last issue is where the slab management struct is kept. Slab managers
are kept either on (CFLGS_OFF_SLAB
set in the static flags)
or off-slab. Where they are placed are determined by the size of the object
during cache creation.
9.2.1 Storing the Slab Descriptor
If the objects are larger than a threshold (512 bytes on x86),
CFGS_OFF_SLAB
is set in the cache flags and the slab
descriptor is kept off-slab in one of the sizes cache (see Section 9.4). The selected sizes cache is large enough to contain the
struct slab_t
and kmem_cache_slabmgmt()
allocates
from it as necessary. This limits the number of objects that can be stored
on the slab because there is limited space for the bufctl
s but
that is unimportant as the objects are large and so there should not be many
stored in a single slab.
Alternatively, the slab manger is reserved at the beginning of the slab. When
stored on-slab, enough space is kept at the beginning of the slab to
store both the slab_t
and the kmem_bufctl_t
array. The array is responsible for tracking where the next free object is
stored and is discussed later in the chapter. The objects are stored after
the kmem_bufctl_t
array.
Figure 9.8 should help clarify what a slab with the descriptor on-slab looks like and Figure 9.9 illustrates how a cache uses a sizes cache to store the slab descriptor when the descriptor is kept off-slab.
9.2.2 Slab Creation
At this point, we have seen how the cache is created, but on creation,
it is an empty cache with empty lists for its slab_full
,
slab_partial
and slabs_free
. New slabs are allocated
to a cache by calling the function kmem_cache_grow()
. This
is frequently called ``cache growing'' and occurs when no objects are
left in the slabs_partial
list and there are no slabs in
slabs_free
. The tasks it fulfills are
- Perform basic sanity checks to guard against bad usage;
- Calculate colour offset for objects in this slab;
- Allocate memory for slab and acquire a slab descriptor;
- Link the pages used for the slab to the slab and cache descriptors described in Section 9.2;
- Initialise objects in the slab;
- Add the slab to the cache.
9.2.3 Tracking Free Objects
The slab allocator has got to have a quick and simple means of tracking where
free objects are on the partially filled slabs. It achieves this by using a
kmem_bufctl_t
array that is associated with each slab manager
as obviously it is up to the slab manager to know where its free objects are.
Historically, and according to the paper describing the slab
allocator [#!bonwick94!#], kmem_bufctl_t
was a linked list of
objects. In Linux 2.2.x, this struct was a union of three items, a pointer
to the next free object, a pointer to the slab manager and a pointer to the
object. Which it was depended on the state of the object.
Today, the slab and cache an object belongs to is determined by the
struct page
and kmem_bufctl_t
is simply an integer
array of object indices. The number of elements in the array is the same as
the number of objects on the slab.
141 typedef unsigned int kmem_bufctl_t;
As the array is kept after the slab descriptor and there is no pointer to
the first element directly, a helper macro slab_bufctl()
is provided.
163 #define slab_bufctl(slabp) \ 164 ((kmem_bufctl_t *)(((slab_t*)slabp)+1))
This seemingly cryptic macro is quite simple when broken down. The
parameter slabp
is a pointer to the slab manager. The block
((slab_t*)slabp)+1
casts slabp
to a slab_t
struct and adds 1 to it. This will give a slab_t
*
pointer to the beginning of the kmem_bufctl_t
array. (kmem_bufctl_t *)
recasts that pointer back
to the required type. The results in blocks of code that contain
slab_bufctl(slabp)[i]
. Translated, that says ``take a pointer
to a slab descriptor, offset it with slab_bufctl()
to the beginning
of the kmem_bufctl_t
array and return the i element of
the array''.
The index to the next free object in the slab is stored in
slab_t
free
eliminating the need for a linked list
to track free objects. When objects are allocated or freed, this pointer
is updated based on information in the kmem_bufctl_t
array.
9.2.4 Initialising the kmem_bufctl_t
Array
When a cache is grown, all the objects and the kmem_bufctl_t
array on the slab are initialised. The array is filled with the index of each
object beginning with 1 and ending with the marker BUFCTL_END
. For a
slab with 5 objects, the elements of the array would look like Figure 9.11.
The value 0 is stored in slab_t
free
as the 0
object is the first free object to be used. The idea is that for a given
object n, the index of the next free object will be stored in
kmem_bufctl_t[n]
. Looking at the array above, the next object
free after 0 is 1. After 1, there are two and so on. As the array is used,
this arrangement will make the array act as a LIFO for free objects.
9.2.5 Finding the Next Free Object
kmem_cache_alloc()
will call
kmem_cache_alloc_one_tail()
when allocating an object to perform
the ``real'' work of updating the kmem_bufctl_t()
array.
slab_t
free
has the index of the
first free object. The index of the next free object is at
kmem_bufctl_t[slab_t
free]
.
In code terms, this looks like
1253 objp = slabp->s_mem + slabp->free*cachep->objsize; 1254 slabp->free=slab_bufctl(slabp)[slabp->free];
slabp
s_mem
is the index of the first object on the
slab. slabp
free
is the index of the object to allocate
and it has to be multiplied by the size of an object.
The index of the next free object is stored at
kmem_bufctl_t[
. There is no
pointer directly to the array hence the helper macro slabp
free
]slab_bufctl()
is used. Note that the kmem_bufctl_t
array is not changed
during allocations but that the elements that are unallocated are
unreachable. For example, after two allocations, index 0 and 1 of the
kmem_bufctl_t
array are not pointed to by any other element.
9.2.6 Updating kmem_bufctl_t
The kmem_bufctl_t
list is only updated when an object is freed
in the function kmem_cache_free_one()
. The array is updated with
this block of code:
1451 unsigned int objnr = (objp-slabp->s_mem)/cachep->objsize; 1452 1453 slab_bufctl(slabp)[objnr] = slabp->free; 1454 slabp->free = objnr;
objp
is the object about to be freed and objnr
is its
index. kmem_bufctl_t[objnr]
is updated to point to the current
value of slabp
free
, effectively placing the object pointed
to by free
on the pseudo linked list. slabp
free
is updated to the object being freed so that it will be the next one allocated.
9.2.7 Calculating the Number of Objects on a Slab
During cache creation, the function kmem_cache_estimate()
is
called to estimate how many objects may be stored on a single slab taking
into account whether the slab descriptor must be stored on-slab or off-slab
and the size of each kmem_bufctl_t
needed to track if an
object is free or not. It returns the number of objects that may be stored
and how many bytes are wasted. The number of wasted bytes is important if
cache colouring is to be used.
The calculation is quite basic and takes the following steps
- Initialise
wastage
to be the total size of the slab, ; - Subtract the amount of space required to store the slab descriptor;
- Count up the number of objects that may be stored. Include the
size of the
kmem_bufctl_t
if the slab descriptor is stored on the slab. Keep increasing the size of i until the slab is filled; - Return the number of objects and bytes wasted.
9.2.8 Slab Destroying
When a cache is being shrunk or destroyed, the slabs will be deleted. As the objects may have destructors, these must be called, so the tasks of this function are:
- If available, call the destructor for every object in the slab;
- If debugging is enabled, check the red marking and poison pattern;
- Free the pages the slab uses.
The call graph at Figure 9.12 is very simple.
Next: 9.3 Objects Up: 9. Slab Allocator Previous: 9.1 Caches   Contents   Index Mel 2004-02-15