Thursday, April 5, 2012

A Container that Uses an Allocator

In a previous post, I explained how to make your own allocators.  But one question still remains: how can I make a container that uses the amazing tools that are allocators?  Normally, a person would not worry about such things because the STL has almost every kind of container that will be needed, but it does lack a few; for example, an ordered dictionary is not part of the STL.

We won't be making a complicated ordered dictionary; instead, we will be coding a simple stack.  It will definitely need improvement before real-world use, but it will suffice for this learning experience.  And the best part is that I will be learning right along with you!  Don't worry, I've read up on the topic...  a lot...  haven't seen the light of day in a week!

I will be designing the stack and not just throwing in some allocators.  I hope that this approach will gradually introduce the concepts needed to understand how to use allocators and thus make them easier to understand.  If you just want to see how to use the things, skip to the end.


How do we start on this project?

I like to design top-down.  This means that I define the idea and what needs done, I break that down into sections to do, then I break those sections into subsections until I have a working product.

First we need to define what exactly we wish to achieve.  It's simple.  We are going to create a stack, push a lot of random values onto it, then pop them all off and print them out.  That's all.  Oh, and we don't know what type of value...  so we'll make sure it works with everything.

Let us divide the code into sections: included files, the stack code, and the main function.  Normally, the stack would go in a header file, but let's keep all this in one file instead.  What makes up the stack code?  Well, we have the main class, and we have the nodes that store each piece of stack data; so, let's make the stack class and stack node two separate sections.  So far, this is our code:
/*
 * Included files
 */

// nothing here


/*
 * Stack node
 */

// nothing here


/*
 * Stack class
 */

// nothing here


/*
 * Main function
 */

int main()
{
// nothing here
}
You may have noticed that I moved the order of the sections.  I moved stack node above stack class because the stack class uses the stack node, and in C++, something has to be declared before it can be used.


Since we already know what main does, let's code that.

I mentioned in above that the program needs to push a bunch of random values onto the stack (we don't know the type of those values), then pop the values off and print them.  That is easy to code.  Put the following into the main function:
/*
 * Main function
 */


int main()
{
typedef int value_type;

stack<value_type> test;
forint i = 0; i != 50; ++i )
test.push( rand() % 100 );
while( ! test.is_empty() )
cout << test.pop() << " ";
cout << endl;
}
From the above, we now have a better idea of what member functions the stack class needs to have: push(value), bool is_empty(), and value pop().  These can go in an "interface functions" subsection in class.  Also, we need the stack class to take a template parameter of the value it is storing.  Let's update the stack class section.
/*
 * Stack class
 */

template<typename value_type>
class stack
{
public:

/*
 * Interface functions
 */
// push a value onto the stack
void push( const value_type & value );

// Check if the stack is empty
bool is_empty() const;

// pop a value off of the stack
value_type pop();
};
Now that that's done, we can still see two things in the main function that give us a hint as to what goes in the other sections.  The rand function needs the cstdlib header, and the cout object needs the iostream header.  And since we are not prepending std:: to the standard library stuff, we need "using namespace std;" somewhere...  let's put it with the included files.
/*
 * Included files
 */

#include <cstdlib>
#include <iostream>

using namespace std;

What about the allocator?

We want our stack class to use an allocator for its memory management.  To do this, it needs to know what type of allocator it will be using.  Update the template parameters of the stack class.
template<
typename value_type,
typename allocator_type = allocator<value_type>
>
Notice that I gave the allocator type a default type.  We do not want the users of this stack to have specify allocator<value_type> (the most-used allocator) every time that they want to use it.

std::allocator (or allocator since we have "using namespace std;") is located in the memory header file.  Let's add the following line to our include section:
#include <memory>

Main is finished, so on to the stack class!

Before we begin working on the interface functions, let's make some typedefs.  This will save how much typing we have to do.  Let us add two typedef sections to the stack class: one for typedefs the public can see and one for internal use.  Since the typedefs will be used by the rest of the stack class, we need to have these typedef sections near the top.  While we're at it, let's replace some of the member function prototypes to use these typedefs; that way, if we mess up and need to change the types, it only takes one line of code being altered.

template<
typename value_type,
typename allocator_type = allocator<value_type>
>
class stack
{

/*
 * Internal typedefs
 */

// nothing here yet



public:

/*
 * Public typedefs
 */
typedef const value_type & const_reference;


/*
 * Interface functions
 */
// push a value onto the stack
void push( const_reference value );

// Check if the stack is empty
bool is_empty() const;

// pop a value off of the stack
value_type pop();
};
If you object to my typedef and believe it should have something to do with allocator_traits, read on.  If you want to take me at my word that the above is correct, just skip to the next section.

Notice that the const reference is defined in terms of value_type.  Some people may argue that it needs to be defined from the allocator's const_reference because the allocator may not use types that make sense.  If you were to define const_reference that way, then the code would be:
typedef typename allocator_traits<allocator_type>::const_reference const_reference;
If allocators truly were able to use odd pointer types or reference types (for example, a vector of available memory that returns an iterator as a pointer), then the very long typedef for const_reference would be the only correct way to do it.  However, in Working Draft, Standard for Programming Language C++, document #N3242, dated 2011-02-08, which is the latest C++ specification I can get my hands on without paying for it, table 27 does not work for anything other than normal value_types, pointers, and references:

"T,C - any non-const, non-reference object type"
"c - a dereferenceable pointer of type C*"
"X - an Allocator class for type T"
"a - values of type X&"
"n - a value of type XX::size_type"
"a.allocate(n) - [returns] X::pointer - ..."
"a.destroy(c) - ..."

Since you pass destroy the same value that you receive from allocate, then X::pointer must be the type that c is, but c is of type C*; therefore, X::pointer is C*.  Furthermore, "X::value_type - Identical to T", and since X::pointer is T* (T is the same type as C), the memory that the allocator manages must be able to handle a value (T), a pointer (T*), and a reference (T&).  Though my logic has not covered the fact that a reference is the same as T&, look at what a reference really is and then tell me it doesn't work if the value is T and the pointer is T*.

Also, it seems as though most of the containers in the STL library use the T, T&, and T* method of defining their value_type, pointer, and reference types.  The exception seems to be the unordered_* containers, but the draft I have has a possible error relating to those, so I don't want to use them as examples.

If what I just said gives you too much of a headache, don't worry about it.  The point was to prove that it is pointless to do a lot of typing and that you can just use "const value_type &" for the const_reference typedef.


Another prerequisite to the interface functions...

First, let us figure out how the stack class will work as a whole.  I think that the easiest way is to use a singly-linked list to store the data.  For this, we will need a pointer to the beginning of the list (node_ptr _start).  Let's add a subsection (right after the typedefs) for the data and put this in there.
... /*
 * Public typedefs
 */
...


private:

/*
 * Data
 */
// Pointer to the first node of the singly-linked list
node_ptr _start;


public:

/*
 * Interface functions
 */
...


There is one problem with the _start variable: node_ptr isn't defined.  Let's say that node_type is the type of the node; then, node_ptr is the same thing is node_type*.  Let's add two typedefs to the internal typedefs: node_type and node_ptr.  Why do we need a typedef for node_type?  Because the node needs to know the type of value it's storing and therefore must be a template.  I don't want to be typing some_node_name<value_type> every time I want to declare a node!

/*
 * Internal typedefs
 */

typedef stack_node<value_type> node_type;
typedef node_type * node_ptr;

For now, let's ignore the fact that node_type isn't defined.  We don't want to go off on a tangent.  We haven't finished with the stack class' interface functions yet.


Now let's focus on the interface functions.

Some people prefer to have the definitions of functions outside of the class, but because this is a templated class, each function would need to have the same template parameters at the class as well as a scope specifier to state that they are part of the class.  That is too much typing!  For templated classes, I prefer to make the definitions of the functions inside of the class.

First, let's focus on push.  It creates a node.  The node has a pointer to the next node, which will be set to the former _start).  Then, start is changed to point to the new node.
// push a value onto the stack
void push( const_reference value )
{
node_ptr newNode = _create_node( value );
newNode->next = start;
start = next;
}
_create_node creates a node and sets a value to it.  Let's not overcomplicate things yet.  Keep it simple.  Functions are your friends for simplifying code.  Also, let's not worry about the fact that node_type needs to have a next member or that _create_node is undefined as of yet.  We will get back to those issues in a bit.

Now, let us move on to the is_empty function.  It checks to see if _start points to anything.  If it doesn't, then it returns true.
// Check if the stack is empty
bool is_empty() const;
{
return ! _start; // shortcut
}
We encounter a problem now.  When the class is first created, what is _start set to?  It could be 0, it could be 0x7F (I ran across this on my Windows machine often), or it could be anything else!  We need to add a constructor.  Let's put it at the top of the interface functions.
// Default constructor
stack() :
_start( 0 )
{
}
While we are dealing with construction, let's focus on the destructor as well.  Since the class dynamically allocates its memory, we do need a destructor to free that memory.  Let's place it right below the constructor.
// Free up memory when object is destroyed
~stack()
{
while( _start ) // While not empty
_remove_item();
}
_remove_item removes the element from the top of the stack.  We will implement it in a bit.

Finally, let us create the pop function.  It takes the top element of the stack and stores a copy of it before erasing the top of the stack, then it returns the value's copy.
// pop a value off of the stack
value_type pop()
{
value_type copy = _start->value;
_remove_item();
return copy;
}
So, node_type needs a value part, too, huh?  Wait, we aren't there yet!  I like to keep an extra gedit/notepad open with a TODO of what needs done next.  Getting side-tracked every time you have an unimplemented feature is rather stressful.

Now we are done with the stack class' interface!  To figure out where to go next, let us look at what we have so far.
/*
 * Included files
 */

#include <cstdlib>
#include <iostream>
#include <memory>

using namespace std;



/*
 * Stack node
 */

// nothing here



/*
 * Stack class
 */

template<
typename value_type,
typename allocator_type = allocator<value_type>
>
class stack
{

/*
 * Internal typedefs
 */
typedef stack_node<value_type> node_type;
typedef node_type * node_ptr;



public:

/*
 * Public typedefs
 */
typedef const value_type & const_reference;



private:

/*
 * Data
 */
// Pointer to the first node of the singly-linked list
node_ptr _start;



public:

/*
 * Interface functions
 */
// Default constructor
stack() :
_start( 0 )
{
}

// Free up memory when object is destroyed
~stack()
{
while( _start ) // While not empty
_remove_item();
}

// push a value onto the stack
void push( const_reference value )
{
node_ptr newNode = _create_node( value );
newNode->next = _start;
_start = newNode;
}

// Check if the stack is empty
bool is_empty() const
{
return ! _start;
}

// pop a value off of the stack
value_type pop()
{
value_type copy = _start->value;
_remove_item();
return copy;
}

};



/*
 * Main function
 */

int main()
{
typedef int value_type;

stack<value_type> test;
for( int i = 0; i != 50; ++i )
test.push( rand() % 100 );
while( ! test.is_empty() )
cout << test.pop() << " ";
cout << endl;
}

Real quickly, let's do the node class.

From the above code, we can see that it needs a next member that points to the next node in the series, and it needs a value member that is the value being stored.  To store those two variables, it needs to know what type of value is being stored so that will be a template parameter for it.  The node class should look like this:
/*
 * Stack node
 */

template<typename value_type>
struct stack_node
{
stack_node<value_type> * next;
value_type value;
};

Now for using those allocators!

We have everything done except for the stack class' _remove_item and _create_node functions.  The _create_node function uses an allocator to allocate memory for a node_type and initialize it.  The _remove_item function uses an allocator to free up memory.  Both functions need an allocator for node_types.

The problem is that both functions need to work with memory for node_types, but the allocator we were given was for value_types.  We could do some messy hacks to make it work anyways (my first attempt stored a pointer to the data and hoped the user didn't care about the memory used by the node).  A better way is to redefine the allocator to work with another type.

To redefine an allocator, we need to know its template name (allocator), but we only know its type name (allocator_type, which is, unknown by the class, allocator<T>).  But before we throw this option out as impossible...  we could read the allocator specification or I could just tell you: every allocator type has a way for converting to another allocator type, and that way is by using its rebind template in conjuction with its neat little constructor.

An allocator is required to have "typename allocator_type::template rebind<new_value_type>::other".  Wow!  That looks pretty confusing, so let's break it down a bit.  It may make more since to see some potential source code for an allocator with just its rebind showing (hehe).
template<typename value_type>
struct allocator
{
template<typename new_value_type>
struct rebind
{
typedef allocator<new_value_type> other;
};
};
Basically, other is the same as allocator<new_value_type>.  In containers, we do not know the "allocator" part of that type (we only know the type name as a whole but not its parts), but we do know that the allocator type we have has a rebind structure that has a typedef that we can set to do the exact same thing as allocator<new_value_type> by just specifying new_value_type.

On a side note, one could use templated templates to solve the problem that bind solves, but those are not as flexible and are too easy to mess up.  But for the curious, that option is also out there.

Now we know that we need to use allocator::rebind::other to convert the allocator given to our class for value_types to an allocator that can handle node_types.  We tell allocator::rebind::other the new type we want it to manage the memory for, and we're golden!

Now that the way rebind works is understood, let us go back to that incredibly discombobulating "typename allocator_type::template rebind<new_value_type>::other".  The way that makes sense is to spell that phrase out as, "allocator_type::rebind<new_value_type>::other".  However, from the compiler's point of view, that shorter phrase is ambiguous.  It could mean, "allocator_type::rebind is less than new_value_type is greater than ::other".  Okay, compiler, rebind is a template, not some value!  "allocator_type::template rebind<new_value_type>::other" is better, but it is still ambiguous.  Is this phrase a type or a value?  The compiler isn't going to try to figure out something so darn complex, so we have to put typename before the whole thing.  "typename allocator_type::template rebind<new_value_type>::other" is our final type for the allocator_type<new_value_type>.

Let's put that discombobulating, finger-tiring, headache of a phrase into a typedef so we don't have to worry about it again!  This goes into the internal typedefs section.

typedef typename allocator_type
::template rebind<node_type>::other
node_allocator;
Now we have an allocator type for managing the memory that stores nodes!  Let us worry with the details of it no more!

There is one more thing to do before we move on from the allocators.  _remove_value and _create_node use an allocator object for their operations, not an allocator type.  We need to add an _alloc variable to our private data.
// The allocator used for memory management
node_allocator _alloc;
Since we have a new variable, we need to initialize it in the constructor.  While we are at it, let's give the user an option to specify an allocator object to use.  If none is specified, let's use a default allocator.

// Default constructor
// It has an option alloc parameter
stack(
const allocator_type & alloc = allocator_type()
) :
_start( 0 ),
_alloc( alloc )
{
}

We might have a problem, right?  Remember, allocator_type is a type for an allocator that works with value_types.  _alloc is an allocator that works with node_types.  They are two different types.  We are trying to convert an allocator<value_type> to an allocator<node_type>?  In fact, that is not a problem!  A while ago, I mentioned that converting allocator types is done via the rebind structure and by their neat little constructor.  Allocators have a copy constructor that can copy from an allocator that works with another type of data.  Therefore, we have no problem here.

Now we just use _alloc to work with our memory via its allocate (allocates memory but does not construct it), construct (constructs an object at a memory location), destroy (calls the deconstructor of an object), and deallocate (frees memory) functions.  We will do that in our _remove_item and _create_node functions.


Let us now code _remove_item and _create_node.

The _remove_item function simply remembers where _start->next points to, frees the memory pointed to by _start, then sets _start to what was the next node.
// Removes an item off of the stack
void _remove_item()
{
node_ptr next = _start->next;
_alloc.destroy( _start );
_alloc.deallocate( _start, 1 ); // 1 is how many objects need freed at _start
_start = next;
}
The _create_node function takes a value as its parameter.  Then, it allocates memory for a node, constructs the node with that value, then returns a pointer to that node.
// Creates a node that stores a value
node_ptr _create_node( const_reference value )
{
node_ptr newNode = _alloc.allocate( 1 ); // 1 is how many to allocate
_alloc.construct( newNode, value );// 1 is how many to allocate
return newNode;
}
Now we have two problems: the new node is constructed with a value, but we are constructing a node_type; and newNode->next is left undefined.

To solve the second problem, we could set newNode->next equal to 0 after it the node is created, or we could add a constructor to node_type that does tihs.

To solve the first problem, there are two possible solutions: have the node_type constructor take a value parameter, or use a separate allocator object for value_types and let that do the construction on the node's value variable.

If you are making an exact replica of the STL, you want to use a separate allocator for value_type since you need one anyways for the get_allocator member function (which we don't have here).

We will solve both problems by making a node constructor that takes a value parameter and initializes its own next pointer to 0 and its value to the passed value.
// Construct with value
stack_node( const value_type & src ) :
next( 0 ), 
value( src )
{
}

It is now time to test our program!
/*
 * Included files
 */

#include <cstdlib>
#include <iostream>
#include <memory>

using namespace std;



/*
 * Stack node
 */

template<typename value_type>
struct stack_node
{
stack_node<value_type> * next;
value_type value;

// Construct with value
stack_node( const value_type & src ) :
next( 0 ), 
value( src )
{
}
};



/*
 * Stack class
 */

template<
typename value_type,
typename allocator_type = allocator<value_type>
>
class stack
{

/*
 * Internal typedefs
 */
typedef stack_node<value_type> node_type;
typedef node_type * node_ptr;
typedef typename allocator_type
::template rebind<node_type>::other
node_allocator;



public:

/*
 * Public typedefs
 */
typedef const value_type & const_reference;



private:

/*
 * Data
 */
// Pointer to the first node of the singly-linked list
node_ptr _start;

// The allocator used for memory management
node_allocator _alloc;



/*
 * Helper functions
 */
// Removes an item off of the stack
void _remove_item()
{
node_ptr next = _start->next;
_alloc.destroy( _start );
_alloc.deallocate( _start, 1 ); // 1 is how many objects need freed at _start
_start = next;
}

// Creates a node that stores a value
node_ptr _create_node( const_reference value )
{
node_ptr newNode = _alloc.allocate( 1 ); // 1 is how many to allocate
_alloc.construct( newNode, value );
return newNode;
}



public:

/*
 * Interface functions
 */
// Default constructor
// It has an option alloc parameter
stack(
const allocator_type & alloc = allocator_type()
) :
_start( 0 ), 
_alloc( alloc )
{
}

// Free up memory when object is destroyed
~stack()
{
while( _start ) // While not empty
_remove_item();
}

// push a value onto the stack
void push( const_reference value )
{
node_ptr newNode = _create_node( value );
newNode->next = _start;
_start = newNode;
}

// Check if the stack is empty
bool is_empty() const
{
return ! _start;
}

// pop a value off of the stack
value_type pop()
{
value_type copy = _start->value;
_remove_item();
return copy;
}

};



/*
 * Main function
 */

int main()
{
typedef int value_type;

stack<value_type> test;
forint i = 0; i != 50; ++i )
test.push( rand() % 100 );
while( ! test.is_empty() )
cout << test.pop() << " ";
cout << endl;
}

It compiles fine, it prints out a lot of random numbers, and when looking under the hood with the GNU debugger, everything looks like it is working perfectly!  Now we know how to create container classes that utilizes allocators.


The above stack is not very well designed.

The purpose was to teach how to use allocators in a container.  I did not worry that the above stack might be actually used in any serious implementation.  The following lists a few of the many design problems:

  • As it is, an allocator can be implicitly converted to a stack.  To solve this problem, the constructor needs to have an explicit keyword before it.
  • The pop function is inefficient.  If you do not care about the return value, then an extra copy operation is still performed.
  • The code should be organized with the stack in a header file, and that header file should not have a "using namespace std;" in it.
  • The surrounding paragraphs document the code, but a good stack class has more descriptive comments for its interface functions.
  • Exceptions are not caught.  This could cause memory leaks if the stack is used in another project.
  • Scoped allocators may not work as expected.  The method of binding the allocator to a node_type is the method that g++ 4.7 seems to use (I examined the source code), and, because of that, additional dummy allocators may need to be specified for a scoped allocator.  What is the solution?  That is open for discussion.  Please comment if you are familiar with scoped allocators.


More information about allocators:

There are a few useful classes for use with allocators: allocator, scoped_allocator_adapter, allocator_traits, and maybe a few custom ones shipped with your compiler.

About allocator:

  • std::allocator, the default allocator class requires "#include <memory>".
  • An allocator's allocate function can be used to allocate more than one piece of data at a time via its second parameter.  This is handy for vectors and strings.  The deallocate function's second parameter should match this.
  • An allocator's allocate function allocates memory for data, but the data's constructor(s) is/are not called. That is the purpose of the construct function.
  • The destroy function of an allocate should be called before calling deallocate.  You can, however, use one allocator<value> to destroy the "value" field of a node and then use another allocator<node>'s deallocate for the rest of the node if the node does not have a destructor.
  • Some classes have an overloaded address-of operator.  You should use an allocator's address function to get the true address of an object.
  • You can use an allocator's max_size function to get the maximum number of pieces of data that can be allocated at a time.

About scoped_allocator_adapter:

  • Suppose you have a vector of strings.  A scoped allocator allows you to use a different allocator type for them.
  • scoped_allocator_adapter is a C++11 feature.
  • I am unsure of how well-implemented scoped_allocator_adapter is.  Comments, please?


About allocator_traits:

  • allocator_traits provides default values for allocators, just in case they don't implement all of their typedefs and member functions.
  • It is used like so: allocator_traits<allocator_type>::member_you_want_to_access
  • In the version of the standard specifications that I have1, allocator_traits does not have reference types (nor is it a requirement for allocators even though unordered_* assume that it is).
  • I do not think that this is used often yet, and it is hard to understand, so I did not cover it in much detail.
  • allocator_traits is a C++11 feature.



References:

1. Working Draft, Standard for Programming Language C++, document #N3242, dated 2011-02-08

No comments:

Post a Comment