Showing posts with label source. Show all posts
Showing posts with label source. Show all posts

Wednesday, December 29, 2010

C++ Min-Max Heap Implementation

Update: See the GitHub Repo for the latest version of the code!

I wasted a decent amount of time yesterday...

I've recently come across the need for a double ended priority queue (where you can pop both the min and the max efficiently) and I have been fervently searching for the best way to go about implementing one. My initial searches brought me to a paper written in October of 1986, which described a novel data structure called a Min-Max Heap.

Given a set S of values, a min-max heap on S is a binary tree T with the following properties:
  1. T has the heap-shape
  2. T is min-max ordered: values stored at nodes on even (odd) levels are smaller (greater) than or equal to the values stored at their descendants (if any) where the root is at level zero. Thus, the smallest value of S is stored at the root of T, whereas the largest value is stored at one of the root’s children; an example of a min-max heap is shown in Figure 1 (p. 998).

Below is a diagram I created to illustrate a Min-Max Heap. You may notice that the ordering has been reversed, namely the even levels are max levels and the odd levels are min levels.

Figuring that a Min-Max Heap was perfect for my needs, I went ahead and implemented the data structure. However, after I finished, another data structure was brought to my attention: an Interval Heap. I have run across Interval Heaps before in my travels but only now do I recall that they too have the properties I want. I have also found a number of articles telling me that Interval Heaps are the best and Min-Max Heaps are the worst (as far as real time performance goes).

So now I need to create an implementation of an Interval Heap, not to mention I have this grand Min-Max Heap implementation that will surely never find use. I figure I should put it up on the Internet in the hopes that someone will find use for it though...

What exactly does a Min-Max Heap offer a programmer? It allows constant time lookup of its min and max values; logarithmic time insertion; logarithmic time deletion of min and max values; and linear time creation. Really quite a powerful structure. It is known to be about twice as slow as an interval heap in practice though (one article postulated that this was due to the higher likelihood of cache misses).

The reason I needed a structure with these properties was that I wanted to create a Stunted Priority Queue (a term I just now coined, not in any way an accepted term), or a Priority Queue that would allow me to set a maximum size, and it would never exceed that size. If an insertion made the queue exceed that size, the smallest value would be popped off to compensate.

I have created a very solid implementation of a Min-Max Heap, however, I have not included support for linear time creation. The only way to build the heap is to use the push function, which I believe gives you O(n ln(n)) time. If you would like support for that you will have to add it yourself, though if you do, please send me the modified code so I can update this site.

As always, please email me or post here if you find any problems with the code, or if any part of it is confusing so I may fix it.

I'd also like to hear comments on some of my implementation choices. Such as using a zero-indexed heap rather than a one-indexed heap (which would require an extra sizeof(T) memory). Or using template parameters rather than function parameters in my trickleUp() and trickleDown() functions.

Monday, November 8, 2010

An Easy to Use Profiling Class

My professors always want profiling data...

It is really quite a bother to profile my code accurately. The explosion of timers and temporary variable always makes my code degenerate into an ugly mess.

The other day I decided to bite the bullet and make my own profiling library, one that I would truly enjoy using. I'm proud to say I finished yesterday, and I've put the finishing touches on it today.

My library adds very little code to your project and provides very accurate results. Rather than tell you all about my library, let's look at some quick examples to demonstrate how simple it is.

My Foo Function
void foo() { for (int i = 0; i < 99999; ++i) ; int q = 1; q *= (int)((4.3 * 23.43 * 23.454) / 234.32); }
My Foo Function With Profiling
void foo() { PF(); // Alias for PROFILE_FUNCTION() for (int i = 0; i < 99999; ++i) ; int q = 1; q *= (int)((4.3 * 23.43 * 23.454) / 234.32); }

As you can see, adding profiling into your code is a simple matter. Only one line of code required. If you were to execute that function, your program would automatically output the results of the profiling to the text file profile.txt when your program terminates (this is accomplished by a callback function added to the list of functions maintained by atexit()). And if you'd like to stop any profiling data from being collected, you may simply define NO_PROFILE at the top of your main.cpp file.

That's not the only thing you can do though, perhaps you'd like to profile a block of code, and not necessarily an entire function.

My Foo Function
void foo() { for (int i = 0; i < 99999; ++i) ; int q = 1; q *= (int)((4.3 * 23.43 * 23.454) / 234.32); }
My Foo Function With a Little Bit of Profiling
void foo() { for (int i = 0; i < 99999; ++i) ; PROFILE_CODE() { int q = 1; q *= (int)((4.3 * 23.43 * 23.454) / 234.32); } }

It's all very simple. There is even support for partitioning, however, I find it difficult to explain exactly what that means so I suggest taking a look at the main.cpp.txt file that is included with the header files to see a comprehensive overview of the features.

Below is some example profiling output. This is the output of the main.cpp.txt example file that is included with the header files.

~Profiling data output Mon Nov  8 15:01:45 2010

Profile for "Strange Char Operations"
 Ran in 0.000000 seconds. Tested 19900 times.
Profile for "The doMoreStuff() function"
 Ran in 0.006900 seconds. Tested 100 times.
Profile for "function doStuff()"
 Ran in 0.000063 seconds. Tested 10100 times.
Profile for "function linear()"
 Partition 0   ran in 0.000000 seconds. Tested 10  time(s).
 Partition 1   ran in 0.006000 seconds. Tested 10  time(s).
 Partition 2   ran in 0.005000 seconds. Tested 10  time(s).
 Partition 3   ran in 0.009000 seconds. Tested 10  time(s).
 Partition 4   ran in 0.013000 seconds. Tested 10  time(s).
 Partition 5   ran in 0.013000 seconds. Tested 10  time(s).
 Partition 6   ran in 0.017000 seconds. Tested 10  time(s).
 Partition 7   ran in 0.018000 seconds. Tested 10  time(s).
 Partition 8   ran in 0.022000 seconds. Tested 10  time(s).
 Partition 9   ran in 0.025000 seconds. Tested 10  time(s).
 Average time for each partition is 0.012800 seconds.
 Total time for all partitions is 0.128000 seconds.
Profile for "main.cpp @ 35"
 Ran in 0.000064 seconds. Tested 10000 times.

You can find the code here (updated 11/9/2010), simply extract the folder into your source tree and include profile/profile.h in your main.cpp file.

I will be constantly updating the library as I find problems and/or add new features. I encourage you to check back whenever you'd like to use it to see if there is an update out. Also, if you'd like, you may leave me your email and I will alert you to new updates.

As always, please email me or post here if you find any bugs in the code so I may fix them.