Wednesday, February 25, 2015

Merge Sort for Delphi

Today I was investigating a problem in TList sort order after calling the built-in Sort() method . A few objects were added to the list and then the Sort() method is called. In this specific case, happens that all objects were equally ranked, and once TList uses a Quick Sort algorithm, the order of elements in the list change, even though there is no need to sort them (they are equally ranked, so the comparison function/method always returns 0).

This is because Quick Sort algorithm is unstable, i.e. it does NOT keep the relative position of equally ranked elements within the list, after execution. Read more about sorting algorithms and stability here.
Lets see an example: We have a list containing only 3 objects. This object class (TMyObject) has a property named Key, used to sort the objects within the list. It also has another property named Index, where the original index within the list is kept:

List[0]: Key = 1, Index = 0;
List[1]: Key = 1, Index = 1;
List[2]: Key = 3, Index = 2;

The comparison function used is this:

function CompareItems(Data1, Data2: Pointer): Integer;
  Obj1, Obj2: TMyObject;
  Obj1 := TMyObject(Data1);
  Obj2 := TMyObject(Data2);
  if Obj1.Key < Obj2.Key then
    Result := -1
  if Obj1.Key > Obj2.Key then
    Result := 1
    Result := 0;

Using the built-in Quick Sort algorithm, your list may end like this:

List[0]: Key = 1, Index = 1;
List[1]: Key = 1, Index = 0;
List[2]: Key = 3, Index = 2;

As you can see, the Quick Sort algorithm may exchange elements 0 and 1 when sorting the list, because they both have Key = 1 (so the CompareItems() function will return 0 when comparing these elements).

This behavior of Quick Sort may or may not have some impact on your application. Well... In my specific problem, it HAS a severe impact. This is because the natural order of the list represents the creation order of the objects and I need this order to be preserved, if possible (if elements are equally ranked).

The best alternative seemed to be the Merge Sort algorithm, but I started looking for any fast and stable sorting algorithm for Delphi. Unfortunately, I found only a few and most of them were poorly implemented or required specialized data structures, like linked lists.

After digging a little more, I found an old Merge Sort implementation published in the - excellent - The Tomes of Delphi Algorithms and Data Structures book, by Julian Bucknall.

The implementation is a little old and needed some adjustments to make it run with newer versions of Delphi. In fact, I ended up with an implementation compatible with Delphi 2009 up to Delphi XE7. It probably will also run with older versions of Delphi, and newer versions as well.

Some interesting facts about all this sorting stuff:

  • Quick Sort has a worst-case running time of O(n2). On the other hand, Merge Sort does not have a worst-case scenario, and has a running time of n log n, every time.
  • Merge Sort requires an auxiliary list to do the sorting, when working with TList-like object classes. This imposes a higher memory requirement, but this is probably irrelevant nowadays.
  • The Merge Sort implementation is AS FAST AS Delphi's built-in implementation in most cases. In some cases, sorting HUGE lists with low dispersion, Merge Sort may be FASTER than Quick Sort.
  • Using this implementation to sort 1000000 items in my list I got:
    • Delphi's built-in Quick Sort:  0,38 seconds
    • Merge Sort:  0,34 seconds, i.e. more than 10% faster!
  • As already said, Merge Sort is STABLE. Under some circumstances this is important.

In the end, with all advantages of Merge Sort, I also created a TList and a TObjectList descendant redeclaring the Sort() method, using this same Merge Sort implementaion, instead of the default Quick Sort.

Download the merge sort Delphi units and test application from Bitbucket repository here.


dangph said...

Unfortunately it doesn't appear to be completely stable. I wrote some DUnit tests, and one of them fails (TTestCaseMergeSort.Test10000).

dangph said...

If you remove the calls to the Insertion Sort, then it becomes stable. But I'm not sure I trust it now.

Alexandre Caldas Machado said...

Hi dangph,

thanks for your feedback. I'll take a look at your test case and I'll probably post the results here, OK?

dangph said...

Sure, please do.

Alexandre Caldas Machado said...

Hi dangph,

the code is fixed. Please check my new post about this subject. If you can give me your name, I can include it in the unit credits :-)

Thank you very much!

dangph said...

Great! I notice you simplified the insertion sort. Was the original one from the book?

Thanks, but I'd prefer not to use my real name because I don't link this account with my real name.

Alexandre Caldas Machado said...

Yes, the original Insertion Sort comes from the book. It is a modified version meant to be faster. But it clearly breaks the stability of the merge sort algorithm.