Overview of Linux Memory Management Concepts: Slabs
Jim Blakey
Note: In the Linux 2.6 kernels, a new cache manager called the slub allocator is available and may replace the slab allocator described here. Either can be used through kernel configuration parameters. This paper was written a while ago and does not contain the most current kernel information. I will write an update soon - jimb
Introduction to slabs and kernel cache
Kernel modules and drivers often need to allocate temporary storage for non-persistent structures and objects, such as inodes, task structures, and device structures. These objects are uniform in size and are allocated and released many times during the life of the kernel. In earlier Unix and Linux implementations, the usual mechanisms for creating and releasing these objects were the kmalloc() and kfree() kernel calls.
However, these used an allocation scheme that was optimized for allocating and releasing pages in multiples of the hardware page size. For the small transient objects often required by the kernel and drivers, these page allocation routines were horribly inefficient, leaving the individual kernel modules and drivers responsible for optimizing their own memory usage.
One solution was to create a global kernel caching allocator which manages individual caches of identical objects on behalf of kernel modules and drivers. Each module or driver can ask the kernel cache allocator to create a private cache of a specific object type. The cache allocator handles growing each cache as needed on behalf of the module, and more importantly, the cache allocator can release unused pages back to the free pool in times when memory is in a crunch. The cache allocator works with the rest of the memory system to maintain a balance between the memory needs of each driver or module and the system as a whole.
The Linux 2.4 kernel implements a caching memory allocator that holds caches of identical objects, built up from groups of pages called slabs. This slab allocator is essentially an implementation of the "Slab Allocator" described in UNIX Internals: The New Frontiers by Uresh Vahalia (Prentice Hall, ISBN 0-13-101908-2), with further tweaks based on "The Slab Allocator: An Object-Caching Kernel Memory Allocator" by Jeff Bonwick (Sun Microsystems), presented at the USENIX Summer 1994 Technical Conference.
The following is a partial list of the caches maintained by the slab allocator on a typical Linux 2.4 system. This information comes from the /proc/slabinfo file. Note that most kernel modules have requested their own caches. The columns are: cache name, number of active objects, total number of objects, object size, number of active slabs (slabs with at least one object in use), total number of slabs, and pages per slab.
Figure 1: Typical list of caches maintained by kernel
Slab Overview
A slab is a set of one or more contiguous pages of memory set aside by the slab allocator for an individual cache. This memory is further divided into equal segments the size of the object type that the cache is managing.

Figure 2: Two page slab with 6 objects
As an example, assume a file-system driver wishes to create a cache of inodes that it can pull from. Through the kmem_cache_create() call, the slab allocator will calculate the optimal number of pages (in powers of 2) required for each slab given the inode size and other parameters. A kmem_cache_t pointer to this new inode cache is returned to the file-system driver.
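A rough sketch of the creation call (the structure name my_inode and the variable my_inode_cachep are made up for illustration; the 2.4 prototype takes a name, object size, offset, flags, and optional constructor/destructor):

#include <linux/slab.h>

struct my_inode {                       /* hypothetical cached object */
        unsigned long   i_ino;
        unsigned int    i_mode;
        /* ... */
};

static kmem_cache_t *my_inode_cachep;

int my_fs_init(void)
{
        /* Ask the slab allocator for a private cache of my_inode objects.
         * The allocator works out how many pages each slab should span. */
        my_inode_cachep = kmem_cache_create("my_inode_cache",
                                            sizeof(struct my_inode),
                                            0,                   /* offset */
                                            SLAB_HWCACHE_ALIGN,  /* L1-align the objects */
                                            NULL, NULL);         /* no ctor/dtor */
        if (!my_inode_cachep)
                return -ENOMEM;
        return 0;
}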
When the file-system driver needs a new inode, it calls kmem_cache_alloc() with the kmem_cache_t pointer. The slab allocator will attempt to find a free inode object within the slabs currently allocated to that cache. If there are no free objects, or no slabs, then the slab allocator will grow the cache by fetching a new slab from the free page memory and returning an inode object from that.
When the file-system driver is finished with the inode object, it calls kmem_cache_free() to release the inode. The slab allocator will then mark that object within the slab as free and available.
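Continuing the illustrative sketch above, allocation and release of an object then look roughly like this:

struct my_inode *inode;

/* Pull one object from the private cache; the slab allocator grows the
 * cache with a new slab if no free object is available. */
inode = kmem_cache_alloc(my_inode_cachep, GFP_KERNEL);
if (!inode)
        return -ENOMEM;

/* ... use the inode ... */

/* Hand the object back; it is simply marked free within its slab. */
kmem_cache_free(my_inode_cachep, inode);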
If all objects within a slab are free, the pages that make up the slab are available to be returned to the free page pool if memory becomes tight. If more inodes are required at a later time, the slab allocator will re-grow the cache by fetching more slabs from free page memory. All of this is completely transparent to the file-system driver.
Creation and management of slab pages
Each slab of pages has an associated slab management structure. The slab_t struct, defined in /usr/src/linux/mm/slab.c, has the following format.
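The listing below reproduces it roughly as it appears in the 2.4 sources; the field comments are added here, and exact details vary slightly between 2.4 releases.

typedef unsigned int kmem_bufctl_t;     /* a simple index, explained below */

typedef struct slab_s {
        struct list_head        list;           /* linkage into one of the cache's slab lists */
        unsigned long           colouroff;      /* colour offset applied to this slab */
        void                    *s_mem;         /* address of the first object (includes colouroff) */
        unsigned int            inuse;          /* number of objects currently allocated from this slab */
        kmem_bufctl_t           free;           /* index of the next free object in the slab */
} slab_t;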
The slab_t structure is immediately followed by an array of kmem_bufctl_ts, one for each object in the slab. The purpose of the kmem_bufctl_t array is to keep track of allocated and free objects within the slab. Implementations of slab allocators often have these structs contain forward/backward pointers, reference counts, pointers to the slabs they belong to, and so on. However, the Linux 2.4 implementation of virtual memory uses the page struct to keep track of which cache and slab each object belongs to, so the kmem_bufctl_t can be very simple. In fact, it is a single integer index. This array, together with the slab_t->free member, in effect implements the slab's free-object list.

Figure 3. Off slab slab_t structure with 8 kmem_bufctl_t structs
The above figure shows the bufctl array in use after several slab objects have been allocated and released. The next object to be allocated will come from slot 3. After that allocation, free takes on the value stored in slot 3's bufctl entry, which is 1, so slot 1 will supply the object after that, and so on.
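A minimal sketch of that walk (not the kernel's literal code; in the real allocator the object size comes from the owning kmem_cache_t), using the slab_t and kmem_bufctl_t shown above:

static void *alloc_object_from_slab(slab_t *slabp, size_t obj_size)
{
        /* The kmem_bufctl_t array sits immediately after the slab_t. */
        kmem_bufctl_t *bufctl = (kmem_bufctl_t *)(slabp + 1);

        /* 'free' indexes the next available object within the slab. */
        void *objp = (char *)slabp->s_mem + slabp->free * obj_size;

        slabp->inuse++;
        /* The bufctl entry for the object just handed out holds the index
         * of the free object after it, so 'free' simply follows the chain. */
        slabp->free = bufctl[slabp->free];
        return objp;
}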
A slab_t structure may reside at the start of the slab it describes (on-slab), or it may be allocated from a separate general-purpose cache (off-slab), depending on the size of the objects in the slab. Figure 2 shows an on-slab slab_t structure, while Figure 3 shows an off-slab slab_t. The assumption is that if the objects in the cache are large, it is better to place the slab_t off-slab for less fragmentation and better packing of the objects. The rule: if the object size is greater than 1/8th of the page size, the slab_t is allocated off-slab.
The number of objects per slab (and therefore the number of pages per slab) is calculated when the cache is initially created. The driver or kernel module that asks for the cache only supplies the size of the objects to be cached, and should never have to care about how many objects there are in the cache.
The basic algorithm for calculating the number of objects in a slab is: first see how many objects fit in one page. If the space left over is less than 1/8th of the total slab size, then the fragmentation (or wastage) is acceptable and we are done.

If there is too much empty space left over on the first try, we try again with a larger slab. Slabs grow in power-of-two multiples of the page size (the exponent is kept internally as gfporder). So on the first try gfporder is 0 for 1 page, on the second try gfporder is 1 for 2 pages, the next try gets 4 pages, the next 8 pages, and so on.
The actual algorithm used is obviously a little more complex than what I've just described. There are a number of other factors and limits to take into account, such as whether the slab_t is kept on the slab (so its size figures into the equation), cache coloring, and L1 cache alignment of the objects to reduce cache misses. But the general idea is to find a balance between the number of pages in the slab and the number of objects in the slab for the most efficient use of the space available.
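A minimal sketch of just that core idea (ignoring the on-slab slab_t and bufctl overhead, alignment, and the kernel's exact order limit; the function name and the cap of 5 are made up for illustration):

static void estimate_slab_order(size_t obj_size, size_t page_size,
                                unsigned int *gfporder, unsigned int *num_objs)
{
        unsigned int order;

        /* Try 1, 2, 4, 8, ... pages per slab (capped here at an arbitrary
         * order of 5), stopping as soon as the wasted space is no more
         * than 1/8th of the slab. */
        for (order = 0; order < 5; order++) {
                size_t slab_size = page_size << order;
                unsigned int objs = slab_size / obj_size;
                size_t wastage = slab_size - (size_t)objs * obj_size;

                if (objs > 0 && wastage * 8 <= slab_size)
                        break;
        }

        *gfporder = order;
        *num_objs = (page_size << order) / obj_size;
}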
The cache
A cache is a group of one or more slabs of a single object type. The structure that maintains each cache is the kmem_cache_t. The following is a partial list of the important fields within the kmem_cache_t structure. This is not the complete structure; the SMP and statistics related fields have been removed for brevity. The full structure is defined in /usr/src/linux/mm/slab.c.
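Abridged from the later 2.4 sources (field comments added here; many fields omitted, and exact details vary between 2.4 releases), it looks roughly like this:

struct kmem_cache_s {
        struct list_head        slabs_full;     /* slabs with no free objects */
        struct list_head        slabs_partial;  /* slabs that are partly allocated */
        struct list_head        slabs_free;     /* slabs with all objects free */
        unsigned int            objsize;        /* size of each object */
        unsigned int            flags;          /* constant flags (e.g. CFLGS_OFF_SLAB) */
        unsigned int            num;            /* number of objects per slab */
        unsigned int            gfporder;       /* pages per slab, as a power of two */
        unsigned int            gfpflags;       /* GFP flags forced on slab allocations (e.g. GFP_DMA) */
        size_t                  colour;         /* number of distinct colour offsets */
        unsigned int            colour_off;     /* size of one colour offset step */
        unsigned int            colour_next;    /* colour to use for the next slab */
        kmem_cache_t            *slabp_cache;   /* cache holding off-slab slab_t structures */
        void (*ctor)(void *, kmem_cache_t *, unsigned long);   /* optional object constructor */
        void (*dtor)(void *, kmem_cache_t *, unsigned long);   /* optional object destructor */
        char                    name[CACHE_NAMELEN];            /* name shown in /proc/slabinfo */
        struct list_head        next;           /* linkage into the global list of caches */
};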

Figure 4. kmem_cache_t and its associated slabs
TODO: Add a couple of paragraphs describing use here
Cache Coloring of Slabs
Cache Coloring is a method of ensuring that accesses to the slabs in kernel memory make the best use of the processor's L1 cache. It is a performance tweak that tries to ensure we incur as few cache misses as possible. Since slabs begin on page boundaries, objects at the same offset within different slabs are likely to map to the same hardware cache lines, so accesses to one slab keep evicting cached data belonging to another. This leads to less than optimal hardware cache performance. By offsetting the first object in each slab by a different amount (a multiple of the hardware cache-line size), objects in different slabs land on different cache lines and processor cache misses are reduced.

Figure 5. Cache Coloring within slabs
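A minimal sketch of the coloring decision made each time a new slab is added to a cache (the real logic lives in kmem_cache_grow(); the fields are those from the kmem_cache_t listing earlier, and the function name here is invented):

static unsigned long next_color_offset(kmem_cache_t *cachep)
{
        /* Each new slab gets the next colour in sequence; the returned
         * offset pushes the slab's first object further into the slab so
         * that objects in different slabs fall on different cache lines. */
        unsigned long offset = cachep->colour_next;

        cachep->colour_next++;
        if (cachep->colour_next >= cachep->colour)
                cachep->colour_next = 0;        /* wrap back to colour 0 */

        return offset * cachep->colour_off;     /* bytes before the first object */
}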
The kmalloc() Interface
Of course, there are times when a kernel module or driver needs to allocate memory for an object that doesn't fit one of the uniform types of the other caches, for example string buffers, one-off structures, temporary storage, etc. For those instances drivers and kernel modules use the kmalloc() and kfree() routines. The Linux kernel ties these calls into the slab allocator too.
On initialization, the kernel asks the slab allocator to create several caches of varying sizes for this purpose. Caches for generic objects of 32, 64, 128, 256 bytes, and so on up to 131072 bytes are created for both normal memory and DMA-capable (GFP_DMA) memory.

Figure 6: Use of the cache_sizes array
When a kernel module or driver needs memory, the cache_sizes array is searched to find the smallest cache whose object size fits the request. For example, if a driver requests 166 bytes of normal (non-DMA) memory through kmalloc(), an object from the 256-byte cache is returned.
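The lookup itself is short. A sketch close to the 2.4 implementation (simplified; the real cache_sizes table is declared in mm/slab.c and ends with a zero-sized sentinel entry):

typedef struct cache_sizes {
        size_t           cs_size;        /* largest object this entry can satisfy */
        kmem_cache_t    *cs_cachep;      /* cache backed by normal memory */
        kmem_cache_t    *cs_dmacachep;   /* cache backed by DMA-capable memory */
} cache_sizes_t;

extern cache_sizes_t cache_sizes[];

void *kmalloc(size_t size, int flags)
{
        cache_sizes_t *csizep = cache_sizes;

        /* Walk the table until we reach the first cache big enough. */
        for (; csizep->cs_size; csizep++) {
                if (size > csizep->cs_size)
                        continue;
                return kmem_cache_alloc(flags & GFP_DMA ?
                                        csizep->cs_dmacachep : csizep->cs_cachep,
                                        flags);
        }
        return NULL;     /* request is larger than the biggest sizes cache */
}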
When kfree() is called to release the object, the page the object resides in is calculated. The page struct for that page is then looked up in mem_map; its pointers were set to refer back to our kmem_cache_t and slab_t when the slab was allocated. Since we now have the cache and slab for the object, we can release it with __kmem_cache_free().
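A sketch of that path (interrupt disabling and debug checks omitted; in the 2.4 source the pointer juggling is hidden behind the GET_PAGE_CACHE and GET_PAGE_SLAB macros):

void kfree(const void *objp)
{
        kmem_cache_t *cachep;

        if (!objp)
                return;

        /* virt_to_page() finds the object's page struct in mem_map. Its
         * list.next pointer was set to the owning kmem_cache_t (and
         * list.prev to the owning slab_t) when the slab was allocated. */
        cachep = (kmem_cache_t *)virt_to_page(objp)->list.next;
        __kmem_cache_free(cachep, (void *)objp);
}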
TODO: Add a summary paragraph here



