Python Memory Management

All allocating functions belong to one of the three allocation domains:

  • Raw domain (PYMEM_DOMAIN_RAW): Intended for allocating memory for general-purpose memory buffers where the allocation must go to the system allocator or where the allocator can operate without the GIL. The memory is requested directly to the system.

  • Object domain (PYMEM_DOMAIN_OBJ): Intended for allocating memory belonging to Python objects. The memory is taken from the Python private heap.

  • "Mem" domain (PYMEM_DOMAIN_MEM): Intended for allocating memory for Python buffers and general-purpose memory buffers where the allocation must be performed with the GIL held.

Allocation interface:

All allocation domains implement _Alloc(size_t size), _Calloc(size_t nelem, size_t elsize), _Realloc(void *ptr, size_t new_size) and _Free(void *ptr).

Allocators used:

Configuration
Name
PyMem_RawMalloc
PyMem_Malloc
PyObject_Malloc

Release build

"pymalloc"

malloc

pymalloc

pymalloc

Debug build

"pymalloc_debug"

malloc + debug

pymalloc + debug

pymalloc + debug

Release build, without pymalloc

"malloc"

malloc

malloc

malloc

Debug build, without pymalloc

"malloc_debug"

malloc + debug

malloc + debug

malloc + debug

The CPython Memory Allocator (pymalloc)

The CPython allocator is similar to the system allocator but differs in the following ways:

  • It has a fixed size, handling requests up to 256 KiB on 32-bit platforms and 1MiB on 64-bit platforms. Requests exceeding 256KB are managed by the system allocator.

  • It utilizes the Global Interpreter Lock (GIL) for thread safety instead of the system's thread-safety mechanisms.

  • It allocates by mmap rather than heap allocation.

Arenas

CPython creates arenas of 256 KiB on 32-bit platforms and 1MiB on 64-bit platforms to align with the system page size. Arenas are allocated against the system heap with mmap() on systems supporting anonymous memory mappings.

Fields and Purposes of `arenaobject`

Field
Type
Purpose

address

uintptr_t

Memory address of the arena

pool_address

block *

Pointer to the next pool to be carved off for allocation

nfreepools

uint

The number of available pools in the arena (free pools plus never-allocated pools)

ntotalpools

uint

The total number of pools in the arena, whether or not available

freepools

pool_header *

Singly linked list of available pools

nextarena

arena_object *

Next arena1^1

prevarena

arena_object *

Previous arena2^2

  1. If the arena is unallocated, the nextarena field is used to link all the unallocated arenas into a singly linked list, maintained by the global variable unused_arena_objects.

  2. If the arena is allocated, both the nextarena and prevarena fields are used to link the arena into a doubly linked list called usable_arenas. This list is maintained in increasing order of nfreepools values.

Pools

Pools are created for block sizes up to 512 bytes. For a given request, the size of the block issued is rounded up to 8k8k bytes in 32-bit systems and 16k16k bytes in 64-bit systems, where kk is the size class index.

When no available pools are available for the requested size class index in a arena, a new one is provisioned. Arenas have a high-water mark to index how many pools have been provisioned. The high-water mark sits at the last allocated pool.

Pools have three possible states: Full, Used and Empty.

Fields and Purposes of pool_header

Field
Type
Purpose

ref

uint

Number of currently allocated blocks in this pool

freeblock

block *

Pointer to this pool’s free list head

nextpool

pool_header *

Pointer to the next pool of this size class

prevpool

pool_header *

Pointer to the previous pool of this size class

arenaindex

uint

Singly-linked list of available pools

szidx

uint

Size class index of this pool

nextoffset

uint

Number of bytes to unused block

maxnextoffset

uint

Maximum number that nextoffset can be until pool is full

Pools of the same szidx are linked into a doubly linked list.

Pool tables: For an index of i,usedpools[i + i] points to the header of a list of all partially used pools that have the size index for that size class.

  • When a pool becomes full, it’s unlinked from its usedpools[] list.

  • If a full pool has a block freed, then the pool back is put back to the front of usedpools[].

  • On transition to empty, a pool is unlinked from its usedpools[] list and linked to the front of its arena’s singly linked freepools list.

Blocks

  • Within a pool, blocks of fixed size class can be allocated and freed.

  • Available blocks within a pool are listed in the singly linked list freeblock. When a block is freed, it’s inserted at the front of the freeblock list.

  • When a pool is initialized, only the first two blocks are linked within the freeblock list.

  • As long a pool is in the used state, there will be a block available for allocating.

Block Allocation API

When a block of memory is requested by a memory domain that uses pymalloc, pymalloc_alloc(void *ctx, size_t nbytes) is called.

static inline void*
pymalloc_alloc(void *ctx, size_t nbytes)
{
#ifdef WITH_VALGRIND
    if (UNLIKELY(running_on_valgrind == -1)) {
        running_on_valgrind = RUNNING_ON_VALGRIND;
    }
    if (UNLIKELY(running_on_valgrind)) {
        return NULL;
    }
#endif

    if (UNLIKELY(nbytes == 0)) {
        return NULL;
    }
    if (UNLIKELY(nbytes > SMALL_REQUEST_THRESHOLD)) {
        return NULL;
    }

    uint size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
    poolp pool = usedpools[size + size];
    block *bp;

    if (LIKELY(pool != pool->nextpool)) {
        /*
         * There is a used pool for this size class.
         * Pick up the head block of its free list.
         */
        ++pool->ref.count;
        bp = pool->freeblock;
        assert(bp != NULL);

        if (UNLIKELY((pool->freeblock = *(block **)bp) == NULL)) {
            // Reached the end of the free list, try to extend it.
            pymalloc_pool_extend(pool, size);
        }
    }
    else {
        /* There isn't a pool of the right size class immediately
         * available:  use a free pool.
         */
        bp = allocate_from_new_pool(size);
    }

    return (void *)bp;
}

Using the Python Debug API

sys._debugmallocstats() to get:

  • The number of blocks in use for each of the size class pools.

  • The number of arenas allocated and reclaimed along with the total number of blocks used.

The Object and PyMem Memory Allocation Domains

Beyond Python objects, the object allocator is also used for compiler, AST, parser, and evaluation loop.

An excellent example of the object memory allocator in use is the PyLongObject (int) type constructor, PyLong_New():

  • When a new int is constructed, memory is allocated from the object allocator.

  • The size of the request is the size of the PyLongObject struct plus the amount of memory required to store the digits.

[!note]

The number 12378562834 in Python would be represented as the list of digits [1,2,3,7,8,5,6,2,8,3,4], which is different from C.

PyLongObject *
_PyLong_New(Py_ssize_t size)
{
    assert(size >= 0);
    PyLongObject *result;
    if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
        PyErr_SetString(PyExc_OverflowError,
                        "too many digits in integer");
        return NULL;
    }
    /* Fast operations for single digit integers (including zero)
     * assume that there is always at least one digit present. */
    Py_ssize_t ndigits = size ? size : 1;
    /* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +
       sizeof(digit)*size.  Previous incarnations of this code used
       sizeof() instead of the offsetof, but this risks being
       incorrect in the presence of padding between the header
       and the digits. */
    result = PyObject_Malloc(offsetof(PyLongObject, long_value.ob_digit) +
                             ndigits*sizeof(digit));
    if (!result) {
        PyErr_NoMemory();
        return NULL;
    }
    _PyLong_SetSignAndDigitCount(result, size != 0, size);
    _PyObject_Init((PyObject*)result, &PyLong_Type);
    /* The digit has to be initialized explicitly to avoid
     * use-of-uninitialized-value. */
    result->long_value.ob_digit[0] = 0;
    return result;
}

The definition of PyLongObject can be expanded as follows:

struct PyLongObject {
    PyObject ob_base;
    struct _PyLongValue {
        uintptr_t lv_tag; /* Number of digits, sign, and flags */
        digit ob_digit[1];
    } long_value;
};

Therefore, the size of the memory requested includes the overhead of ob_base and lv_tag, as well as the actual content of the digits.

Last updated