In-depth: The New MorphOS Memory System

From MorphOS Library

Originally published at: http://morphos-team.net/tlsf.html

Author: Harry Sintonen



Foreword

Many parts of AmigaOS were designed 25 years ago. At the time system resources were scarce and processors were relatively slow.

For Exec a simple and in many cases fast memory management algorithm was chosen: First Fit algorithm. It gives a very fast allocation speed in non-fragmented scenarios. Memory deallocation isn't very efficient however. The major problem is memory fragmentation: First Fit routine will get exponentially slow when fragmentation grows. Over time the fragmentation can get so severe that the system is visibly laggy. Finally, First Fit routine itself causes yet more fragmentation by not returning the best but first fit.

MorphOS inherited the memory management algorithm from AmigaOS. It worked quite well, but it soon became clear that it had some serious problems. Especially longer sessions (several days) of using complex applications (for example IBrowse) would easily slow the system down a lot. So First Fit had to go and something better had to replace it.

I had three design goals in mind when developing the new memory system:

1. It must be compatible with the old one, as much as possible. Legacy applications should continue to work as before. 2. It should reduce the effects of memory fragmentation. 3. It should reduce memory fragmentation. An as large as possible contiguous memory block should be available.


Compatibility

The first requirement was clearly the most critical one as MorphOS traditionally wants to maintain as much compatibility as possible, while introducing new advanced things, too.

It also proved to be the most challenging requirement of all to fulfill. AmigaOS-compatible memory API is cluttered with obscurities such as AllocAbs(), AvailMem() and AddMemList(), freeing partial memory blocks, the MEMF_REVERSE flag and documented memory alignment characteristics.

The challenge was to introduce a faster memory system without breaking these functions. For example there is a common code sequence to obtain aligned memory blocks:

void *allocaligned(ULONG size, ULONG flags, ULONG align)
{
  void *ptr = AllocMem(size + align - 1, flags & ~MEMF_CLEAR);
  if (ptr)
  {
    ULONG alignptr = ((ULONG) ptr + align - 1) & (ULONG) -align;
    Forbid();
    FreeMem(ptr, size + align - 1);
    ptr = AllocAbs(size, (APTR) alignptr);
    Permit();
    if (ptr && (flags & MEMF_CLEAR))
      memset(ptr, 0, size);
  }
  return ptr;
}


For this to continue to work properly the new system must implement AllocAbs() properly and adhere to the original alignment specifications.

Another example is a routine which allocates the largest memory block:

void *alloclargest(ULONG flags)
{
  ULONG largest;
  void *ptr;
  Forbid();
  largest = AvailMem(flags | MEMF_LARGEST);
  ptr = AllocMem(largest, flags);
  Permit();
  return ptr;
}


For this to work the largest memory block returned by AvailMem() must always be available for allocation as well.

AddMemList() is a somewhat uncommon feature of the memory system. This function can be used to add memory to the system runtime.

All these are implemented and are functioning correctly in the new memory system.

The new system is even compatible with some rather nasty tricks, such as assuming that AllocVec() keeps the allocation size at ptr-4. This is often used to implement realloc() like functionality for AllocVec()ed memory. Supporting this was no problem since it would not affect any other things adversely.

We clearly understand that code using such tricks is seriously broken, but we also understand that often there is no way to fix these applications, either. So for the end user it is better to support such "hacks", rather than to fail miserably. Developers can use tools such as MungWall to detect these hacks, but the end users should not need to worry about such things!

In the end only two things had to go:

First, FreeMem() can no longer be used to free partial memory blocks. This feature was a side effect of FreeMem() internally calling Deallocate(), which in turn is guaranteed to support partial free. With the new memory system Deallocate() is no longer called. Luckily even the AmigaOS SDKs have since a long time discouraged partial block FreeMem, so this is no big problem I believe.

Second, MEMF_REVERSE no longer has any effect on the allocation. The flag is quietly ignored. Frankly I believe that adding such a flag to the memory system was a mistake to begin with. In MorphOS context it has no application anyway, so I don't believe the removal has any adverse effects.


Reducing Effects of Fragmentation

Fragmentation happens, there is no way around it. Over time the memory will fragment. The trick is to reduce the adverse effect it has on system performance.

The choice of algorithm was critical here. The Two Level Segregated Fit (http://rtportal.upv.es/rtmalloc/) algorithm provides a guaranteed constant O(1) allocation/deallocation cost regardless of the memory fragmentation.

This means that regardless of the memory fragmentation allocations and deallocation will run at the same speed. Even after say 5 years uptime.


Reducing Memory Fragmentation

The choice of algorithm is critical here, as well. For example the original First Fit algorithm generates a lot of fragmentation over time. To give fast allocation speeds it returns the first memory block fitting the requirements.

The new algorithm should instead return the best possible memory chunk, making sure that fragmentation only occurs when absolutely necessary.

The Two Level Segregated Fit algorithm excels here too, giving average fragmentation lower than 15%.


Implementation

MorphOS 2.0 and later has a pluggable memory interface. Basically different allocator schemes can be selected at boot time. For now the possible options are First Fit (same as in AmigaOS and MorphOS 1.x) and Two Level Segregated Fit (default). The user can configure the desired memory system by adjusting the MorphOS boot command in OpenFirmware.

The TLSF algorithm is remarkably simple:

The algorithm uses a segregated fit mechanism to implement a good fit policy. The segregated fit mechanism uses an array of free lists, with each array holding free blocks within a size class. The classes are powers of two: 16, 32, 64 and so on. For performance and fragmentation reduction reasons the array has two levels. In the MorphOS implementation each list is divided to 32 sub-divisions. This combined with other implementation specifics allows for a maximum allocation size of 2GB. Each array of lists has a bitmap to mark which lists are occupied with empty blocks and which are empty (all memory allocated). The free blocks themselves hold information about the block, similar to the original First Fit routine. For additional block coalesce performance during the memory deallocation a back pointer to previous free block is maintained as well.

In order to allocate a memory block the allocation size is split between the two indexes. These indexes are used to find a free block, which now is an O(1) operation. The found block is removed from the free list and marked as used. If the free block found is larger than the allocation size, the block is split. The size of the remaining block is again converted to two indexes. The new free block is inserted to the correct position indicated by the indexes.

To free memory the block size is obtained from the block header. The block is merged with the previous and next block, if possible. If merging is possible the other block is marked as used and removed. The size of the final resulting block is again converted to two indexes. The new free block is merged as free and inserted into the correct position indicated by the indexes.

The actual implementation adds a couple of things the original algorithm didn't have: maintaining the remaining free size (for AvailMem), possibility to allocate the largest possible free block and allocating a memory block at an absolute address.

In addition the characteristics of the TLSF allocator give a possibility to automatically detect wrong deallocation sizes passed to the memory freeing routines. This gives extra safety in form of rejecting obviously illegal calls. For application developers it gives extra debug so that they can identify and fix the bug early.


The TLSF memory system fills all the design goals: It is very compatible with even the oldest applications. It maintains good performance regardless of the memory fragmentation. It reduces memory fragmentation.



References

http://rtportal.upv.es/rtmalloc/files/ecrts04_tlsf.pdf

http://en.wikipedia.org/wiki/Dynamic_memory_allocation