Exec Lists
From MorphOS Library
Grzegorz Kraszewski
Contents
Introduction
A list is the simplest dynamic data structure. An array is even simplier, but it is not dynamic. Adding a single item to an array requires reallocation of memory and copying the whole old contents, similarly for removing. Inserting and removing elements to a list is very fast operation and its computational costs do not increase with number of items. Because of this, list is the basic structure used in exec.library, the MorphOS kernel. One could say the Exec is built on lists. Lists are also used through all the system to manage processes, windows, screens, devices, DOS volumes, fonts, libraries and more. Of course lists are also used by applications to manage dynamic data. Many more sophisticated data structures are built on lists. For all these reasons understanding lists and their MorphOS flavour is essential for every programmer.
Lists may be divided to intrusive and nonintrusive ones. An intrusive list is one requiring that every list item contains a part called a node. The node is then used for linking items into list. A nonintrusive list creates nodes itself. Both kinds have their advantages. Exec lists are intrusive. Why? For a nonintrusive list adding an item means allocating memory for its node. Exec lists are often used at really low levels of the system, like interrupt handling, process scheduler or input/output hardware devices. Calling memory allocator from there is unacceptable. Also error handling would become a nightmare (every memory allocation can fail...). In some parts of the system lists also have to be extremely fast. Memory allocation is a complex operation and can't be expected to finish in a few processor cycles. Of course on higher levels of the system, intrusiveness has more disadvantages than advantages. For example an item of intrusive list cannot be added to more than one list. Then high level components (for example MUI List class) may be nonintrusive ones.
From a Plain List to Exec List
The simplest form of list is a unidirectional one. Node of every item consists only of pointer to the next element. The list is identified by keeping a pointer to its first item. Unidirectional lists however are used only in special cases. While inserting an element at list start is easy, inserting at end requires traversing the whole list. Time of this operation increases linearly with number of items. It is also not possible to traverse the list in the backward direction. Then bidirectional lists are much more common. Exec lists are bidirectional. Every node contains two pointers: one to the next element and one to the previous element. For the first element pointer to the previous one is NULL. For the last one, pointer to the next one is NULL. If a list user keeps pointers to both the first and the last element, she has symmetrical access to the start and the end of the list.
While the idea is still simple, we have to complicate it a bit. As mentioned before, the list user keeps track of it by maintaining two pointers: to the first and the last item. These pointers will be modified every time an item is added at the start or end of the list respectively. But what if a list has multiple users not knowing about each other? Adding an item at the start or at the end would require pointer update for all the users. To solve the problem, two artificial items are introduced: a list head and a list tail. Those two elements do not carry payload, only consist of a node. Their location does not change during the list lifetime so they work as anchors. Inserting an element at list start is in fact inserting it between the list head and the former first element. Similarly element inserted at end is in fact inserted between the former last element and the list tail. As pointers to the head and the tail are constant, they may be shared between multiple users.
Exec lists combine the list head and the list tail into one structure, named list header. As the Exec was designed in the days when computer memory was counted in kilobytes rather than gigabytes, designers saw a way to save a few bytes. The "previous" field of list head node always contains NULL. Similarly the "next" field of the list tail always contains NULL. Then they can be merged into one. That is why the list header contains only three pointers instead of four. For the C language the list header is defined as struct List and lists are commonly referenced by a pointer to it.
Exec List Elements: Node and Header
Exec defines two kinds of nodes: a full one and a minimal one. The minimal node consists only of pointers to the next and the previous item and is defined in C in <exec/nodes.h> header file as follows:
struct MinNode
{
  struct MinNode *mln_Succ;     // successor, the next item
  struct MinNode *mln_Pred;     // predecessor, the previous item
};
The full node has additional fields used in many system lists. These fields are: item name, item type and item priority.
struct Node
{
  struct Node* ln_Succ;     // successor
  struct Node* ln_Pred;     // predecessor
  UBYTE        ln_Type;     // item type
  BYTE         ln_Pri;      // item priority
  char*        ln_Name;     // item name
};
Additional node fields have meaning only for system lists and may be considered as part of payload. If we ignore C types, a Node is just a MinNode with three fields added at the end. Node priority may be used to implement ordered lists or queues. Exec.library provides functions for ordered item insertion.
As there are two kinds of nodes, there are also two types of list headers. Both are defined in <exec/lists.h>:
struct MinList
{
  struct MinNode* mlh_Head;      // pointer to the first real item
  struct MinNode* mlh_Tail;      // merged head "previous" and tail "next"
  struct MinNode* mlh_TailPred;  // pointer to the last real item
};
Header for full nodes has additional type field (used for system lists) and one byte padding to make the structure size even.
struct List
{
  struct Node* lh_Head;      // pointer to the first real item
  struct Node* lh_Tail;      // merged head "previous" and tail "next"
  struct Node* lh_TailPred;  // pointer to the last real item
  UBYTE        lh_Type;
  UBYTE        lh_pad;
};
Again, when one ignores C types, the List structure is MinList with two extra fields.
As Exec lists are intrusive, a custom item structure must contain MinNode (or Node if needed) as the first field. It must be a complete structure, not a pointer. Here is an example:
struct MyNode
{
  struct MinNode   Node;        // must be the first
  struct Whatever  Foobar;      // example payload fields
  ULONG            Something;
  char             MoreData[10];
  /* ... */
};
There is no limit on custom item size. As items are not copied, big items are manipulated with exactly the same speed as small ones.
List Initialization, Empty List Check
The list header, which is either List or MinList structure must be initialized before use. Please note well:
Looking at the diagrams above, we know how an empty list should look like:
An empty list contains only its head and tail, both merged into the header. When one splits them back in mind, initialization becomes obvious:
- The "next" field of head points to tail.
- The "prev" field of head is NULL.
- The "next" field of tail is NULL.
- The "prev" field of tail points to head.
Then the second and third operation merge into one, as fields are merged. Following code performs proper Exec list initialization. Note the address operators &, forgetting them is a common mistake:
struct MinList mylist; mylist.mlh_Head = (struct MinNode*)&mylist.mlh_Tail; mylist.mlh_Tail = NULL; mylist.mlh_TailPred = (struct MinNode*)&mylist.mlh_Head;
In case of full List, initialization is the same, just typecasts have Node instead of MinNode. List initialization is a common operation, so <exec/lists.h> defines a NEWLIST macro for it. The macro takes a pointer to List or MinList, so for the above example it would be called as follows:
NEWLIST(&mylist);
Empty list check is another common operation. It may be derived from the diagram above. One can check if a list is empty using four equivalent conditions:
if (mylist.mlh_Head->mln_Succ == NULL) /* list is empty */ if (mylist.mlh_Head == (struct MinNode*)&mylist.mlh_Tail) /* list is empty */ if (mylist.mlh_TailPred->mln_Pred == NULL) /* list is empty */ if (mylist.mlh_TailPred == (struct MinNode*)&mylist.mlh_Head) /* list is empty */
A IsListEmpty() macro defined in <exec/lists.h> uses the last condition, simplified by the fact that address of mlh_Head is the same as address of the whole MinList (a good compiler will optimize field reference out anyway).




