www.digitalmars.com
Last update Tue Jun 6 16:38:20 2006

Memory Management

Any non-trivial program needs to allocate and free memory. Memory management techniques become more and more important as programs increase in complexity, size, and performance. D offers many options for managing memory.

The three primary methods for allocating memory in D are:

  1. Static data, allocated in the default data segment.
  2. Stack data, allocated on the CPU program stack.
  3. Garbage collected data, allocated dynamically on the garbage collection heap.
The techniques for using them, as well as some advanced alternatives are:

Strings (and Array) Copy-on-Write

Consider the case of passing an array to a function, possibly modifying the contents of the array, and returning the modified array. Since arrays are passed by reference, not by value, a crucial issue is who owns the contents of the array? For example, a function to convert an array of characters to upper case:
char[] toupper(char[] s)
{
    int i;

    for (i = 0; i < s.length; i++)
    {
	char c = s[i];
	if ('a' <= c && c <= 'z')
	    s[i] = c - (cast(char)'a' - 'A');
    }
    return s;
}
Note that the caller's version of s[] is also modified. This may be not at all what was intended, or worse, s[] may be a slice into a read-only section of memory.

If a copy of s[] was always made by toupper(), then that will unnecessarily consume time and memory for strings that are already all upper case.

The solution is to implement copy-on-write, which means that a copy is made only if the string needs to be modified. Some string processing languages do do this as the default behavior, but there is a huge cost to it. The string "abcdeF" will wind up being copied 5 times by the function. To get the maximum efficiency using the protocol, it'll have to be done explicitly in the code. Here's toupper() rewritten to implement copy-on-write in an efficient manner:

char[] toupper(char[] s)
{
    int changed;
    int i;

    changed = 0;
    for (i = 0; i < s.length; i++)
    {
	char c = s[i];
	if ('a' <= c && c <= 'z')
	{
	    if (!changed)
	    {   char[] r = new char[s.length];
		r[] = s;
		s = r;
		changed = 1;
	    }
	    s[i] = c - (cast(char)'a' - 'A');
	}
    }
    return s;
}
Copy-on-write is the protocol implemented by array processing functions in the D Phobos runtime library.

Real Time

Real time programming means that a program must be able to guarantee a maximum latency, or time to complete an operation. With most memory allocation schemes, including malloc/free and garbage collection, the latency is theoretically not bound. The most reliable way to guarantee latency is to preallocate all data that will be needed by the time critical portion. If no calls to allocate memory are done, the GC will not run and so will not cause the maximum latency to be exceeded.

Smooth Operation

Related to real time programming is the need for a program to operate smoothly, without arbitrary pauses while the garbage collector stops everything to run a collection. An example of such a program would be an interactive shooter type game. Having the game play pause erratically, while not fatal to the program, can be annoying to the user. There are several techniques to eliminate or mitigate the effect:
  • Preallocate all data needed before the part of the code that needs to be smooth is run.
  • Manually run a GC collection cycle at points in program execution where it is already paused. An example of such a place would be where the program has just displayed a prompt for user input and the user has not responded yet. This reduces the odds that a collection cycle will be needed during the smooth code.
  • Call std.gc.disable() before the smooth code is run, and std.gc.enable() afterwards. This will cause the GC to favor allocating more memory instead of running a collection pass.

    Free Lists

    Free lists are a great way to accelerate access to a frequently allocated and discarded type. The idea is simple - instead of deallocating an object when done with it, put it on a free list. When allocating, pull one off the free list first.
    class Foo
    {
        static Foo freelist;		// start of free list
    
        static Foo allocate()
        {   Foo f;
    
    	if (freelist)
    	{   f = freelist;
    	    freelist = f.next;
    	}
    	else
    	    f = new Foo();
    	return f;
        }
    
        static void deallocate(Foo f)
        {
    	f.next = freelist;
    	freelist = f;
        }
    
        Foo next;		// for use by FooFreeList
        ...
    }
    
    void test()
    {
        Foo f = Foo.allocate();
        ...
        Foo.deallocate(f);
    }
    
    Such free list approaches can be very high performance.

    Reference Counting

    The idea behind reference counting is to include a count field in the object. Increment it for each additional reference to it, and decrement it whenever a reference to it ceases. When the count hits 0, the object can be deleted.

    D doesn't provide any automated support for reference counting, it will have to be done explicitly.

    Win32 COM programming uses the members AddRef() and Release() to maintain the reference counts.

    Explicit Class Instance Allocation

    D provides a means of creating custom allocators and deallocators for class instances. Normally, these would be allocated on the garbage collected heap, and deallocated when the collector decides to run. For specialized purposes, this can be handled by creating NewDeclarations and DeleteDeclarations. For example, to allocate using the C runtime library's malloc and free:
    import std.c.stdlib;
    import std.outofmemory;
    import std.gc;
    
    class Foo
    {
        new(size_t sz)
        {
    	void* p;
    
    	p = std.c.stdlib.malloc(sz);
    	if (!p)
    	    throw new OutOfMemoryException();
    	std.gc.addRange(p, p + sz);
    	return p;
        }
    
        delete(void* p)
        {
    	if (p)
    	{   std.gc.removeRange(p);
    	    std.c.stdlib.free(p);
    	}
        }
    }
    
    The critical features of new() are: The critical features of delete() are:

    If memory is allocated using class specific allocators and deallocators, careful coding practices must be followed to avoid memory leaks and dangling references. In the presence of exceptions, it is particularly important to practice RAII to prevent memory leaks.

    Custom allocators and deallocators can be done for structs and unions, too.

    Mark/Release

    Mark/Release is equivalent to a stack method of allocating and freeing memory. A 'stack' is created in memory. Objects are allocated by simply moving a pointer down the stack. Various points are 'marked', and then whole sections of memory are released simply by resetting the stack pointer back to a marked point.
    import std.c.stdlib;
    import std.outofmemory;
    
    class Foo
    {
        static void[] buffer;
        static int bufindex;
        static const int bufsize = 100;
    
        static this()
        {   void *p;
    
    	p = malloc(bufsize);
    	if (!p)
    	    throw new OutOfMemoryException;
    	std.gc.addRange(p, p + bufsize);
    	buffer = p[0 .. bufsize];
        }
    
        static ~this()
        {
    	if (buffer.length)
    	{
    	    std.gc.removeRange(buffer);
    	    free(buffer);
    	    buffer = null;
    	}
        }
    
        new(size_t sz)
        {   void *p;
    
    	p = &buffer[bufindex];
    	bufindex += sz;
    	if (bufindex > buffer.length)
    	    throw new OutOfMemory;
    	return p;
        }
    
        delete(void* p)
        {
    	assert(0);
        }
    
        static int mark()
        {
    	return bufindex;
        }
    
        static void release(int i)
        {
    	bufindex = i;
        }
    }
    
    void test()
    {
        int m = Foo.mark();
        Foo f1 = new Foo;		// allocate
        Foo f2 = new Foo;		// allocate
        ...
        Foo.release(m);		// deallocate f1 and f2
    }
    
  • The allocation of buffer[] itself is added as a region to the GC, so there is no need for a separate call inside Foo.new() to do it.

    RAII (Resource Acquisition Is Initialization)

    RAII techniques can be useful in avoiding memory leaks when using explicit allocators and deallocators. Adding the auto attribute to such classes can help.

    Allocating Class Instances On The Stack

    Allocating class instances on the stack using an overridden new and delete is useful for temporary objects that are to be automatically deallocated when the function is exited. No special handling is needed to account for function termination via stack unwinding from an exception. To work, they must not have destructors, since such a destructor would never get called.

    To have a class allocated on the stack that has a destructor, this is the same as a declaration with the auto attribute. Although the current implementation does not put such objects on the stack, future ones can.

    import std.c.stdlib;
    
    class Foo
    {
        new(size_t sz, void *p)
        {
    	return p;
        }
    
        delete(void* p)
        {
    	assert(0);
        }
    }
    
    void test()
    {
        Foo f = new(alloca(Foo.classinfo.init.length)) Foo;
        ...
    }
    

    Allocating Uninitialized Arrays On The Stack

    Arrays are always initialized in D. So, the following declaration:
    void foo()
    {   byte[1024] buffer;
    
        fillBuffer(buffer);
        ...
    }
    
    will not be as fast as it might be since the buffer[] contents are always initialized. If careful profiling of the program shows that this initialization is a speed problem, it can be eliminated using a VoidInitializer:
    void foo()
    {   byte[1024] buffer = void;
    
        fillBuffer(buffer);
        ...
    }
    
    Uninitialized data on the stack comes with some caveats that need to be carefully evaluated before using:

    Interrupt Service Routines

    When the garbage collector does a collection pass, it must pause all running threads in order to scan their stacks and register contents for references to GC allocated objects. If an ISR (Interrupt Service Routine) thread is paused, this can break the program.

    Therefore, the ISR thread should not be paused. Threads created with the std.thread functions will be paused. But threads created with C's _beginthread() or equivalent won't be, the GC won't know they exist.

    For this to work successfully: