The Standard Library


The Standard Library is an implementation of collection theory using AVL Trees. Much of the source code of this library is visible in the header files found in \IPlusPlus\hpp. The templates of the Standard Library are very clean and compact. They are easy to read and comprehend.

The theory of AVL Trees was invented by two Russian mathematicians G.M.Adel'son-Vel'skii and E.M.Landis. The manipulations to keep AVL Trees balanced were invented in 1962.

Finite, Ordered Set Theory

A base class Node can be defined such that tree balancing becomes non-generic.

struct Node
{
 Node* Left;          // A pointer to the left node.
 Node* Right;         // A pointer to the right node.
 Node* Parent;        // A pointer to the parent node.
 char Balance;        // Balance Factor

 Node();              // Creates a header node.

 Node(Node* Parent);  // Creates a normal node.

 bool IsHeader() const;

 void* operator new(size_t s);

 void operator delete(void* p);
};

With just this declaration, trees can be balanced. Note that only a single extra byte in the node is required to balance trees. The class Node is a normal structure and is not a generic (template) class. So AVL balancing is non-generic in nature. The data is assumed to be supplied by classes that derive from Node. The headers contain most of the source code of the library. Consult headers StandardKernel and StandardSet for declarations of Node and setNode.

The set node class is shown below.

template<class T>
struct setNode : public Node
{
 T Element;

 setNode(const T& ElementSet,
         Node* Parent) : Node(Parent), Element(ElementSet) {}

 operator T&() {return Element;}
};

Of course, this node is generic and it contains an instance of the class T - the element type. T must possess a copy constructor so that the element can be initialized.

The actual definition of a set contains only a small amount of data. A portion of the definition is shown below.

template <class T>
class set : public Semaphore
{
 public:

  typedef int (*keyCompare)(const T&,const T&);

 protected:

  Node Header;
  keyCompare Compare;

 public:

  // *** iterators ***

  typedef setIterator<T> iterator;

  typedef constSetIterator<T> const_iterator;

  // *** constructors, destructor, operators ***

  set(keyCompare C=compare) : Compare(C) {}

  ...
}

The only two fields are Header and Compare. Note that Header is of the base node class type - not the derived type setNode<T>. This is because a header node contains no data.

Some of the logic to add an entry to a set is contained in the function insert, shown as follows.

  iterator insert(const T& Element)
  {
   semaphore sem(*this);

   Node* RootNode = Header.Parent;

   if (RootNode == 0)
    {
     RootNode = new setNode<T>(Element,&Header);

     Header.Left = RootNode;      // Update LeftMost
     Header.Right = RootNode;     // Update RightMost
     Header.Parent = RootNode;    // Set Root Node

     return RootNode;
    }

   else
    {
     for (; ; )
      {
       int Result = Compare(Element,((setNode<T>*)RootNode)->Element);

       if (Result == 0)
        throw EntryAlreadyExistsException();
 
       else if (Result < 0)
        {
         if (RootNode->Left != 0)
          RootNode = RootNode->Left;
         else
          {
           Node* newNode = new setNode<T>(Element,RootNode);
           RootNode->Left = newNode;
           AdjustAdd(newNode);
           return newNode;
          }
        }

       else
        {
         if (RootNode->Right != 0)
          RootNode = RootNode->Right;
         else
          {
           Node* newNode = new setNode<T>(Element,RootNode);
           RootNode->Right = newNode;
           AdjustAdd(newNode);
           return newNode;
          }
        }
      }
    }
  } 

When the root node is null, a new root node is created. The leftmost and rightmost pointers in the header are updated as is Header.Parent - which is the new root node. Otherwise a binary search for a leaf node at which to insert the new entry is conducted. The function AdjustAdd is called to balance the tree after an insertion has taken place. The function detects the direction from which the insertion took place. It also ascends the set tree to obtain the header node, and when required updates the leftmost or rightmost pointers in the header.

The code to remove an element is shown as follows.

  void erase(const T& Element)
  {
   semaphore sem(*this);

   Node* RootNode = Header.Parent;

   for (; ; )
    {
     if (RootNode == 0) throw EntryNotFoundException();

     int Result = Compare(Element,((setNode<T>*)RootNode)->Element);

     if (Result < 0)
      RootNode = RootNode->Left;

     else if (Result > 0)
      RootNode = RootNode->Right;

     else // Item is found
      {
       if (RootNode->Left != 0 && RootNode->Right != 0)
        {
         Node* Replace = RootNode->Left;
         while (Replace->Right != 0) Replace = Replace->Right;
         SwapNodes(RootNode, Replace);
        }

       Node* Parent = RootNode->Parent;

       Unsigned From = (Parent->Left == RootNode) ? Direction::FromLeft : Direction::FromRight;
 
       if (RootNode->Left == 0)
        {
         if (Parent == &Header)
          Header.Parent = RootNode->Right;
         else if (From == Direction::FromLeft)
          Parent->Left = RootNode->Right;
         else
          Parent->Right = RootNode->Right;

         if (RootNode->Right != 0) RootNode->Right->Parent = Parent;
        }
       else
        {
         if (Parent == &Header)
          Header.Parent = RootNode->Left;
         else if (From == Direction::FromLeft)
          Parent->Left = RootNode->Left;
         else
          Parent->Right = RootNode->Left;

         if (RootNode->Left != 0) RootNode->Left->Parent = Parent;
        }

       AdjustRemove(Parent, From);
       delete (setNode<T>*)RootNode;
       break;
      }
    }
  }

A binary search for the node to be removed is conducted. When the node to be removed has two children, it is swapped with the preceeding leaf node. A parent of the node that was removed is passed to AdjustRemove and the tree is balanced.

In this introduction we have covered addition and removal of elements in a set. This is the foundation of set theory. The theory relies upon binary heaps for its memory mangement. The code uses non-generic tree balancing.

The class set is synchronized. The unsynchonized form of the class is Set. The uppercase version of the class has the same interface as the lowercase version of the class, so it is not separately documented. The same is true for other classes - see Optimized Classes.

The class tree

The class tree is second only to set in importance. It is one of the most significant classes in computer science. Trees derive from sets. A tree<K,T> is a set<T> with separate searches by the key class K. The class tree inherits iteration from the set class as well as the insertion algorithms. A separate erasure algorithm is present in tree.

The class map

The class map is also an AVL Tree. Perhaps the main feature of map is its subscript operator. The map template has two type parameters; a key type K and a data type T. The subscript operator provides indexed access to the object. Note that there is a trick to implementing the map subscript operator. A nested class called reference is used to delay the action to the map until all required data is present (that is, until both the K and the T are present). This precludes having to define a default constructor for T.

To assist in visualizing a map consider the class mapNode shown again below.

template<class K, class T>
struct mapNode : public Node
{
 K Key;
 T Data;

 mapNode(const K& KeySet,
         const T& DataSet,
         Node* Parent) : Node(Parent), Key(KeySet), Data(DataSet) {}

 operator const K&() const {return Key;}

 operator T&() {return Data;}
};

A map node has both a key and an instance of the data type within. Both K and T are assumed to have copy constructors. The nested reference class uses the method insert to insert a node when a T is assigned to a reference. Unordered maps use the same strategy for the subscript operator. Ultimately, operations like:

myMap[key] = dataType;

are possible both for ordered maps and for unordered maps.

The class Map is identical to map except that it is unsynchronized and therefore leaner and faster.

Unordered Set Theory

Whilst finite, ordered sets are important; let us not forget unordered sets. An ordered set is assumed to have a comparer which imposes an order on the elements. In contrast, an unordered set is assumed to have hasher and an equality comparer. No order is imposed upon an unordered set. The Standard Library unordered sets are purely an unordered concept and appear to be entirely new. Dropping the restriction of an order on the collection in favour of hashers and equality comparers is an important step.

Traditional hashing uses a linear table of buckets to perform its duties. With the Standard Library, this model has been supersceeded by binary hashing. Binary hashing uses an integer binary multitree to perform hashing. Thus, a node for unorderedSet is as follows.

template<class T>
struct unorderedSetNode : public Node
{
 Integer Index;
 T Element;

 unorderedSetNode(const T& ElementSet,
                  Integer IndexSet,
                  Node* Parent)
 : Node(Parent), Index(IndexSet), Element(ElementSet) {}

 operator T&() {return Element;}
};

In addition to the data of type T, there is a field called Index. This is the integer hash code of the data type T. This is used by the unordered set algorithms to form an integer binary multitree called the binary hash table. This approach reduces the theory of hash tables to the theory of binary trees - a major mathematical result. Binary hash tables never need to be rehashed (reallocated) and the range of the hash codes is the entire range of integer arithmetic (rather than modulo the size of a linear hash table). This means that there are far fewer collisions in a binary hash table than in a linear hash table. Linear hash tables perform at constant time whereas binary hash tables are much closer to O(1) (even though they are logarithmic in nature).

Other unordered concepts are dictionary and hashTable.

Summarizing

Perhaps the main difference between the Standard Library and STL is non-generic tree balancing. Whilst programming in C# it became clear that the amount of generic code could be reduced by defining a base node class that is non-generic and defining balancing in terms of this base node class. That concept was ported back to C++ when forming a collections library. It means that the lengthy balancing logic is generated only once - not once for each data type as for STL.

Another difference lies in the method of subscripting for the map class etc. In the absense of an indexer in C++, a nested helper class may be used to delay action to the map until both the key and data are present. STL performs technically incorrectly in this case. It requires the data type to have a default constructor and an assignment operator; whereas, the Standard Library requires the data type only to posses a copy constructor.

Yet another difference between the libraries lies in the allocation strategy. The standard C++ heap performs no garbage collection and retains all storage. The binary heap performs synchronous garbage collection; hence, releases storage when required. This implies that the Standard Library has a solid memory management strategy.

AVL Trees are superior to red/black trees in that the depth of the trees is smaller. This makes the searches faster. Perhaps red/black trees could be considered a dud invention given that they were developed 10 years after a superior solution. The balancing logic for red/black trees is more complex than AVL trees and it performs slightly slower. The searches are also slightly slower, but, in fairness, the difference is hardly noticable. One may ask, "Why increase the complexity of the code and simultaneously reduce the performance of the code"? This is what red/black trees do when compared to AVL trees. Basically, the Russians got it right 10 years earlier.

The Standard Library is multithreaded and makes use of an advanced (syncronously garbage collecting) heap. It uses a base node class to define non-generic tree balancing. This yields a perfect finite, ordered set theory, and subsequently tree and map theory.