Glibc Malloc
This note is based on an excellent post titled "Understanding glibc malloc" from sploitfun. It discusses the fundamental concepts of glibc malloc, including arenas, chunks, and bins.
Arenas
The dynamically allocated memory in glibc is managed using arenas. When malloc is first called in a thread, an arena is created for dynamic memory allocation. However, if there are too many threads, this behavior changes, which will be addressed later. The arena created by the main thread is known as the main arena.
Arena Creation
Main Thread: Memory allocation follows a dynamic threshold strategy. Requests smaller than the threshold are served from the heap grown with brk; larger requests are satisfied through mmap. The threshold starts at 128 KiB and adjusts dynamically: when an mmap'd block larger than the current threshold is freed, the threshold is raised to that block's size.
Other Threads: Regardless of the request size, arena memory is always obtained with mmap.
An arena starts out at 132 KiB. When an arena depletes its available space, it can grow by extending its top chunk.
For thread arenas instantiated with mmap, the mapping is far larger than the request: even if a user asks for only 1 KiB, a full 64 MiB of address space (on 64-bit systems) is mapped, and the initial 132 KiB of that mapping constitutes the arena.
Limit on the Number of Arenas: For 64-bit systems, the maximum number of arenas (excluding the main arena) is 8 × the number of cores.
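To see the brk/mmap split in practice, here is a minimal, runnable sketch. The 128 KiB threshold and the output of malloc_stats are glibc specifics; exact addresses vary per run.

```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main(void) {
    void *small = malloc(100);          /* typically served from the brk-grown heap */
    void *large = malloc(256 * 1024);   /* above the 128 KiB threshold: typically mmap'd */
    printf("small = %p\n", small);
    printf("large = %p\n", large);      /* usually far away from `small` */
    malloc_stats();                     /* prints arena vs. mmap usage to stderr */
    free(small);
    free(large);
    return 0;
}
```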
Related Data Structures
heap_info (Heap Header): In a single thread arena, there can be multiple heaps, each with its own header. Initially an arena starts with only one heap, but when this heap exhausts its space, additional non-contiguous heaps are mmap'd into the arena.
malloc_state (Arena Header): Although an arena may consist of multiple heaps, it has only one arena header. This header stores essential information such as the bins, the top chunk, the last remainder chunk, and more.
malloc_chunk (Chunk Header): A heap is subdivided into chunks according to user requests. Each chunk is prefixed by its own header.
The main arena does not have multiple heaps and hence no heap_info structure. When the main arena runs out of space, the sbrk'd heap segment is extended (a contiguous region) until it bumps into the memory-mapping segment.
Unlike a thread arena, the main arena's arena header is not part of the sbrk'd heap segment. It is a global variable and is therefore found in libc.so's data segment.
In an arena with multiple heaps, each heap begins with a heap_info header. For the arena's first heap, this heap_info is immediately followed by the malloc_state that manages the arena's information. The heap is then partitioned into chunks, each starting with a malloc_chunk header.
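The shapes of these headers, lightly simplified from glibc's malloc/arena.c and malloc/malloc.c (field names and sets vary across glibc versions):

```c
#include <stddef.h>

/* Per-heap header (thread arenas only). */
typedef struct _heap_info {
    struct malloc_state *ar_ptr;   /* arena that owns this heap */
    struct _heap_info *prev;       /* previous heap of the same arena */
    size_t size;                   /* current size of this heap */
    size_t mprotect_size;          /* portion kept readable/writable */
    /* glibc adds padding here to preserve alignment */
} heap_info;

/* Per-chunk header. */
struct malloc_chunk {
    size_t prev_size;              /* size of previous chunk, if it is free */
    size_t size;                   /* chunk size plus the N|M|P flag bits */
    struct malloc_chunk *fd;       /* used only while the chunk sits in a bin */
    struct malloc_chunk *bk;       /* used only while the chunk sits in a bin */
    /* fd_nextsize / bk_nextsize follow, used only for large free chunks */
};
```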
Chunks
Chunks are placed in the heap without padding. A chunk within a heap segment can be one of the following types: (1) Allocated chunk, (2) Free chunk, (3) Top chunk, or (4) Last Remainder chunk.
Allocated Chunks
An allocated chunk has three parts: prev_size (optional), size with three flag bits in its least significant bits, and the user data.
prev_size: If the previous chunk is free, this field contains the size of the previous chunk. If the previous chunk is allocated, this field instead holds part of the previous chunk's user data.
size: This field contains the size of this allocated chunk. The last 3 bits hold flag information:
PREV_INUSE (P) – set when the previous chunk is allocated.
IS_MMAPPED (M) – set when the chunk is mmap'd.
NON_MAIN_ARENA (N) – set when the chunk belongs to a thread arena.
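The flag masks below match glibc's malloc.c; the two helper functions are hypothetical names added for illustration:

```c
#include <stdio.h>
#include <stddef.h>

#define PREV_INUSE      0x1
#define IS_MMAPPED      0x2
#define NON_MAIN_ARENA  0x4
#define SIZE_BITS       (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical helpers: strip or test the flag bits of a size field. */
static size_t chunksize(size_t size_field)  { return size_field & ~SIZE_BITS; }
static int    prev_inuse(size_t size_field) { return size_field & PREV_INUSE; }

int main(void) {
    size_t field = 0x91;   /* a 0x90-byte chunk whose predecessor is in use */
    printf("size = %#zx, P = %d\n", chunksize(field), prev_inuse(field));
    return 0;
}
```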
Free Chunks
prev_size: No two free chunks can be adjacent; when two neighboring chunks are both free, they are combined into one single free chunk. Hence the chunk preceding a free chunk is always allocated, and this field therefore contains the previous chunk's user data.
size: This field contains the size of this free chunk.
fd: Forward pointer – points to the next chunk in the same bin (and NOT to the next chunk in physical memory).
bk: Backward pointer – points to the previous chunk in the same bin (and NOT to the previous chunk in physical memory).
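Because an allocated chunk borrows its successor's prev_size field for user data, the usable size of an allocation can exceed the request. A small demo of that overlap; the numbers reflect 64-bit glibc and are allocator specific:

```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main(void) {
    /* A 24-byte request yields a 32-byte chunk: an 8-byte size field
       plus 24 bytes of data, the last 8 of which overlap the next
       chunk's prev_size field. */
    void *p = malloc(24);
    printf("requested 24, usable %zu\n", malloc_usable_size(p));  /* 24 */
    free(p);
    return 0;
}
```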
Top Chunks
The chunk at the top border of an arena is called the top chunk. It does not belong to any bin. The top chunk is used to service a user request when no free chunk in any bin fits. If the top chunk is larger than the requested size, it is split in two:
User chunk (of the user-requested size)
Remainder chunk (of the remaining size)
The remainder chunk becomes the new top. If the top chunk is smaller than the requested size, it is extended using the sbrk syscall (main arena), or a new heap is mmap'd (thread arena).
If the arena grows by mapping a new heap, the old top chunk becomes a regular free chunk, and the arena's top chunk moves to the newly mapped heap.
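Consecutive allocations carved from the top chunk land back to back, which a quick sketch can show (the delta of 32 assumes 64-bit glibc: an 8-byte header plus a 24-byte request rounded into one 32-byte chunk):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *a = malloc(24);
    char *b = malloc(24);   /* typically split off the top chunk right after a */
    printf("a = %p\nb = %p\ndelta = %td\n", (void *)a, (void *)b, b - a);
    free(a);
    free(b);
    return 0;
}
```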
Last Remainder Chunks
The "last remainder chunk" originates from the most recent partitioning of a small memory request. This chunk enhances locality of reference, meaning consecutive small allocations may be positioned near one another.
When a request for a small chunk cannot be fulfilled by either the small bin or the unsorted bin, the binmaps are scanned to identify the next largest non-empty bin. Through iterative splitting of this bin, the chunks are allocated in close proximity, thereby improving locality.
Bins
Bins are the freelist data structures used to hold free chunks. Based on chunk sizes, different bins are available: (1) Fast bin, (2) Unsorted bin, (3) Small bin, (4) Large bin.
Fast Bin: Stores small, recently freed chunks for quick allocation, avoiding full heap checks. Handles very small chunks (up to ~64 bytes). No coalescing; designed for speed.
Unsorted Bin: A temporary holding bin for recently freed chunks of any size, before they are coalesced or moved into the small/large bins.
Small Bin: Contains small chunks of fixed sizes, grouping similar-sized chunks together. Handles small to medium chunks (up to ~512 bytes), with coalescing.
Large Bin: Holds larger chunks of varying sizes, organized into size ranges. Handles large chunks (from ~512 bytes to several kilobytes or more), with coalescing.
Related Data Structures
malloc_state.fastbinsY: This array holds the fast bins.
malloc_state.bins: This array holds the unsorted, small, and large bins. There are 126 bins in total, divided as follows: (1) Bin 1 – Unsorted bin, (2) Bin 2 to Bin 63 – Small bins, (3) Bin 64 to Bin 126 – Large bins.
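A trimmed-down view of the arena header; the field names follow glibc's malloc.c, but several fields (binmap, statistics, the next-arena pointer) are omitted:

```c
#include <stddef.h>

#define NFASTBINS 10
#define NBINS     128   /* bins[] packs 126 doubly linked lists as fd/bk pairs */

struct malloc_chunk;    /* see the chunk header sketch above */

struct malloc_state {
    int mutex;                                  /* serializes access by threads */
    int flags;
    struct malloc_chunk *fastbinsY[NFASTBINS];  /* singly linked fast bins */
    struct malloc_chunk *top;                   /* the arena's top chunk */
    struct malloc_chunk *last_remainder;        /* leftover of the last split */
    struct malloc_chunk *bins[NBINS * 2 - 2];   /* unsorted, small, large bins */
    /* binmap, next, system_mem, ... omitted */
};
```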
Fast Bins
Fast bins manage chunks ranging from 16 to 80 bytes across ten distinct bins. Each bin hosts a single linked list dedicated to free chunks of a specific size. The size increments between bins are consistent, at 8 bytes. For instance, the first fast bin (index 0) holds chunks of 16 bytes, while the second bin (index 1) manages chunks of 24 bytes, and so forth.
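glibc maps a chunk size to its fast bin index with a single shift; this is the fastbin_index macro from malloc.c. (The 16-byte/8-byte figures in this note follow the 32-bit layout; 64-bit builds space fast chunks 16 bytes apart.)

```c
#include <stdio.h>

#define SIZE_SZ (sizeof(size_t))   /* glibc-internal: the word size */

/* From glibc's malloc.c: the smallest fast chunk maps to index 0. */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)

int main(void) {
    for (unsigned int sz = 32; sz <= 128; sz += 16)   /* 64-bit chunk sizes */
        printf("chunk size %3u -> fast bin %u\n", sz, fastbin_index(sz));
    return 0;
}
```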
During the initialization of the malloc function, the maximum size for fast bins is set at 64 bytes rather than 80 bytes. Consequently, chunks sized between 16 and 64 bytes are classified as fast chunks by default.
When two free chunks are adjacent, they are not combined into a single chunk; this absence of coalescing can lead to external fragmentation but makes the free operation faster.
Initially, the fast bin maximum size and the fast bin data structures are uninitialized, so a request for a fast chunk is handled by the small bin code rather than the fast bin code. Once the fast bins are populated, they can efficiently fulfill requests for fast chunks.
Upon freeing, a fast chunk is immediately added to the front of its corresponding bin list.
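This LIFO behavior is easy to observe: freeing two chunks and reallocating the same size hands them back in reverse order. On modern glibc the tcache sits in front of the fast bins, but the reuse pattern looks the same either way:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *a = malloc(32);
    void *b = malloc(32);
    free(a);
    free(b);                 /* freelist is now: b -> a (LIFO) */
    void *c = malloc(32);    /* expected: the address of b */
    void *d = malloc(32);    /* expected: the address of a */
    printf("a=%p b=%p\nc=%p d=%p\n", a, b, c, d);
    return 0;
}
```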
Unsorted Bins
When a small or large chunk is freed, it is not placed directly into its respective bin; it goes into the unsorted bin instead. Recently freed chunks can then be reused directly, which speeds up allocation a little.
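A quick way to watch a chunk land in the unsorted bin, sketched under 64-bit glibc assumptions: the 0x500-byte request is too large for the tcache and fast bins, and reading freed memory is undefined behavior that only makes sense in an experiment like this:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t *p = malloc(0x500);
    void *guard = malloc(16);   /* keeps p from merging into the top chunk */
    free(p);                    /* p now sits alone in the unsorted bin */
    /* fd and bk both point back at the unsorted bin's list head, which
       lives inside main_arena in libc's data segment. */
    printf("fd = %p\nbk = %p\n", (void *)p[0], (void *)p[1]);
    free(guard);
    return 0;
}
```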
Small Bins
Small bins cover chunk sizes from 16 bytes (0x10) to 504 bytes (0x1F8). These chunks are allocated and deallocated faster than large chunks, though not as quickly as fast chunks.
There are 62 small bins, each linking chunks via doubly linked lists, unlike the singly linked fast bins. Chunks are added at the front and removed from the back.
As with fast bins, small chunk sizes increase in increments of 8 bytes; unlike fast bins, adjacent free small chunks are coalesced.
When malloc starts, all small bins are NULL. Therefore, if a user requests a small chunk, the unsorted bin code, rather than the small bin code, attempts to fulfill the request.
When freeing a small chunk, it is necessary to check whether the adjacent chunks are free. If either the previous or next chunk is free, these are unlinked from their respective linked lists. The newly consolidated chunk is then added to the beginning of the unsorted bin’s list.
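Coalescing can be observed directly. This sketch uses 0x500-byte requests so that the tcache and fast bins stay out of the way (the mechanism is the same one small chunks go through), and it reads freed memory, which is undefined behavior outside an experiment:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t *a = malloc(0x500);
    size_t *b = malloc(0x500);
    void *guard = malloc(16);   /* keeps b away from the top chunk */
    printf("a's chunk size before: %#zx\n", a[-1]);   /* expect 0x511 */
    free(a);
    free(b);                    /* b merges backward into a */
    printf("a's chunk size after:  %#zx\n", a[-1]);   /* expect 0xa21 */
    free(guard);
    return 0;
}
```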
Large Bins
Chunks with sizes of 512 bytes or larger are defined as large chunks. These larger bins typically handle memory allocation and deallocation slower than their smaller counterparts.
There are 63 designated large bins, categorized as follows (a sketch of this size-to-bin mapping appears after the list):
32 bins are spaced 64 bytes apart: the first large bin (Bin 64) covers chunks from 512 to 568 bytes, the second (Bin 65) covers 576 to 632 bytes, and so on.
16 bins manage chunks spaced 512 bytes apart.
8 bins oversee chunks spaced 4096 bytes apart.
4 bins handle chunks spaced 32768 bytes apart.
2 bins cover chunks spaced 262144 bytes apart.
1 bin is designated for chunks of any remaining sizes.
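The spacing above can be modeled with a small function. This is a simplified sketch: glibc's real largebin_index macros use shift-based thresholds, so exact boundaries differ slightly at the edges, and Bin 64 is taken as the first large bin per the layout above:

```c
#include <stdio.h>

/* Maps a chunk size to a large bin number following the spacing listed
   above. Illustrative only, not glibc's actual index computation. */
static int large_bin_number(unsigned long sz) {
    const struct { unsigned long spacing; int count; } groups[] = {
        { 64, 32 }, { 512, 16 }, { 4096, 8 }, { 32768, 4 }, { 262144, 2 },
    };
    unsigned long base = 512;   /* smallest large chunk */
    int bin = 64;               /* first large bin */
    if (sz < base) return -1;   /* not a large chunk */
    for (int i = 0; i < 5; i++) {
        unsigned long span = groups[i].spacing * groups[i].count;
        if (sz < base + span)
            return bin + (int)((sz - base) / groups[i].spacing);
        base += span;
        bin += groups[i].count;
    }
    return 126;                 /* the catch-all final bin */
}

int main(void) {
    const unsigned long sizes[] = { 512, 576, 2560, 11000, 300000 };
    for (int i = 0; i < 5; i++)
        printf("size %6lu -> bin %d\n", sizes[i], large_bin_number(sizes[i]));
    return 0;
}
```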
Unlike the bins for smaller chunks, the chunks within a large bin vary in size and are organized in descending order.
At initialization, all large bins start empty. If a user requests a large chunk, the system initially attempts to fulfill this using the code for the next largest bin. However, once a large bin contains chunks, it can service requests directly.
If the largest chunk in the bin is bigger than the requested size, the bin is searched from back to front to find an appropriately sized chunk. This chunk is then split into: (1) a user chunk of the requested size, which is returned to the user. (2) a remainder chunk of the remaining size, which is added to the unsorted bin.
If the largest chunk in a large bin is smaller than the requested size, the system will try to fulfill the request from the next largest non-empty bin. If a suitable chunk is found, it is split and given to the user. If no suitable bins are available, the request might be fulfilled using the top chunk.
The process for freeing large chunks is analogous to that for small chunks.
Tcache
Initial Memory Allocation: The first malloc call allocates memory for the tcache structure if it is not already initialized.
Freeing Memory: When memory is freed and its size is within the tcache range: before tcache was introduced, freed memory was placed directly into a fastbin or the unsorted bin. With tcache, the freed chunk is placed into the corresponding tcache bin until that bin is full (default maximum of 7 chunks). If the tcache bin is full, the chunk is placed in the fastbin or unsorted bin as per the old behavior.
Chunks stored in tcache are not coalesced (i.e., adjacent free chunks are not merged), and the in-use bit of the chunk is not cleared.
Allocating Memory (malloc): When requesting memory of a size that falls within the tcache range:
From tcache: The allocator first attempts to retrieve a chunk from the corresponding tcache bin.
If tcache is empty: The allocator will search the appropriate bin (fastbin, smallbin, unsorted bin) for a matching chunk. If a suitable chunk is found in one of these bins, it will be moved into the tcache until the tcache bin is full. The chunk is then taken from the tcache to fulfill the allocation request.
Tcache works because the bins (fastbins, smallbins, largebins, etc.) are part of an arena, and a single arena can be shared by multiple threads. Access to these bins requires locks to prevent race conditions, but this can become inefficient with many threads due to lock contention. Tcache, on the other hand, is stored in thread-local storage, thereby eliminating the need for locks in small allocations.
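The per-thread structure behind this looks roughly as follows, simplified from glibc's malloc.c (counts was a char array when tcache appeared in glibc 2.26 and widened to uint16_t in 2.30):

```c
#include <stdint.h>

#define TCACHE_MAX_BINS 64

/* One node per cached chunk, stored inside the chunk's user data. */
typedef struct tcache_entry {
    struct tcache_entry *next;
    /* newer glibc also keeps a `key` field here to detect double frees */
} tcache_entry;

/* One of these per thread, allocated lazily by the first malloc call. */
typedef struct tcache_perthread_struct {
    uint16_t counts[TCACHE_MAX_BINS];         /* chunks currently in each bin */
    tcache_entry *entries[TCACHE_MAX_BINS];   /* singly linked list heads */
} tcache_perthread_struct;
```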