Image of Navigational Map linked to Home / Contents / Search Sorting Numbered Documents

by Rupert Walsh - GUI Computing
Image of Line Break

Normally, I use a bubble sort to order data. It's reasonably simple, and unless you're sorting a large amount of data, it's fast enough for most applications. The crux of sorting data is comparing two items in an out-of-order list, and deciding which comes first. For sorting numbers, you can simply use the >, < or = operators. For sorting text, Visual Basic has the built in function StrComp, short for String Compare. Its syntax is as follows:

StrComp(string1, string2[, compare_type])

It can return any of four values:

string1 is less than string2	-1
string1 is equal to string2	 0
string1 is greater than string2	 1
string1 or string2 is Null	Null

The compare_type can be 0 (vbBinaryCompare), 1 (vbTextCompare) or 2 (vbDatabaseCompare for MS Access databases only). The only difference between Binary and Text comparison is that Binary comparison is case sensitive, whereas text isn't.

If your data is numeric, or simple text strings, the built in functions and a simple sorting algorithm are all you need. However, imagine that you need to sort the titles from a book on programming. It may be divided into sections like this:

1 Introduction
2 Programming Basics
2.1 Input
2.2 Output

2.9 Vectors
2.10 Sorting Algorithms
3 Advanced Programming
3.1 Graphics
4. Glossary

This sort of text is common in manuals, textbooks and other documents. And VB's inbuilt functions just don't cover it - time for a simple addition to my programmer's toolkit.

My routine has three sort type options: Binary, Text and what I've called Index for want of a better term. If we sort the example above, here are the results using either the binary or text comparisons:

It's mostly OK, except that the author probably expects section 2.10 to come after section 2.9, rather than between 2.1 and 2.2. To sort this type of data correctly, I've written a function called StrCompEx (short for Extended String Compare) that has a third comparison method, which uses a simple bubble sort. It is defined as follows:

Public Function StrCompEx(ByVal sFirst As String, _
     ByVal sSecond As String, _
    Optional ByVal nCompareType As eCompareType = ctBinary) As eCompareResult
	

Note that I have enumerated both the compare and the return types, so that they pop up for auto-completion when you use the function in VB5 or VB6 (as described in detail in another article).

When the user clicks on the Sort button, each item in the list is added to an array which is passed (by reference) to the function SortStrings, along with the sort type:

Public Sub SortStrings(ByRef aText As Variant, ByVal nOrder As eCompareType)
'Bubble Sort the strings in aText

Dim nGap As Integer
Dim i As Integer
Dim j As Integer
Dim nCount  As Long
Dim nRowIndex As Long
Dim tmpText As String

   
   nCount = UBound(aText)
   nGap = nCount / 2
   
   If nGap = 0 Then nGap = 1
   
   Do While nGap > 0
      For i = nGap To nCount
         j = i - nGap
         Do While (j >= 0)
           If StrCompEx(aText(j), aText(j + nGap), nOrder) <= 0 Then Exit Do
            
            tmpText = aText(j)
            
            aText(j) = aText(j + nGap)
            
            aText(j + nGap) = tmpText
            j = j - nGap
         Loop
      Next
      nGap = nGap / 2
   Loop
    
End Sub

If you don't understand how a bubble sort works, I suggest you read about it in more detail in almost any programming or algorithm textbook. The essence of the bubble sort is that it compares pairs of values, swapping them if they are out of order, until all values are in order.

This is the line that performs the comparison:

If StrCompEx(aText(j), aText(j + nGap), nOrder) <= 0 Then Exit Do

It is here that I call my StrCompEx function, and if it returns a value greater than zero the string positions will be swapped.

As you can see, StrCompEx itself is really just a wrapper function for the standard StrComp, and another function called CompareByIndex, which is only called for our special 'Index' sort type:

Public Function StrCompEx(ByVal sFirst As String, ByVal sSecond As String, Optional ByVal nCompareType As eCompareType = ctBinary) As eCompareResult
'Extended string compare
'Purpose:   Adds a new compare type for sorting numbered documents

    'Catch null strings first so that null won't be returned by strcomp
    If IsNull(sFirst) Or IsNull(sSecond) Then
        StrCompEx = crError
    Else
        Select Case nCompareType
        Case ctBinary
            StrCompEx = StrComp(sFirst, sSecond, vbBinaryCompare)
        Case ctString
            StrCompEx = StrComp(sFirst, sSecond, vbTextCompare)
        Case ctIndexNumeric
            StrCompEx = CompareByIndex(sFirst, sSecond)
        End Select
    End If
    
End Function

CompareByIndex uses a support function called GetIndexArray to split each of the strings we want to compare into an array containing the index part, and a string containing the rest of the text. For example, section "2.10 Sorting Algorithms" is split into an array containing the (string) values "2" and "10", and a string containing " Sorting Algrorithms":

Private Function GetIndexArray(ByVal sString As String, ByRef aIndexString As Variant, ByRef sTextPart As String)
'Split the string into an array containing each level of the index
'and a string containing the rest of the title.

Dim nPos As Long
Dim sChar As String
Dim sReturn As String
Dim nIndex As Integer

    For nPos = 1 To Len(sString)
        
        sChar = Mid(sString, nPos, 1)
        
        Select Case sChar
        Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
            'If numeric, concatenate
            If nPos = 1 Then
                'We need to redim the first time we add a level.
                ReDim aIndexString(0 To 0)
            End If
            aIndexString(UBound(aIndexString)) = aIndexString(UBound(aIndexString)) & sChar
        Case "."
            If nPos = 1 Then
                'We need to redim the first time we add a level.
                'Will only occur here if the first character is a period
                '(unlikely but possible)
                ReDim aIndexString(0 To 0)
            End If
            'Index level seprator - add a level to the array
            ReDim Preserve aIndexString(0 To UBound(aIndexString) + 1)
        Case Else
            'Any other character means we are no longer within the index number
            'so the rest of the string is the text part.
            sTextPart = Mid(sString, nPos + 1)
            Exit For
        End Select
        
    Next
    
End Function

Once the strings have been split into their index array and text parts, we can proceed to compare them. We need to check whether each string started with an Index number at all. If it didn't, the variant that is was passed to GetArrayIndex won't have been redimmed, and won't be an array at all. We test this with IsArray(). If both strings had an index number, we compare each value in the array, starting with what was the leftmost part of the index.

As an example the "2" part of "2.10" is compared with the "2" of "2.3". We continue testing each of the values in the arrays, so the next the "10" is compared with the "3". These values are stored as strings in our array, but we want to compare them as (long) integers, so they are converted with CLng and tested using the >, < and = operators.

There are quite a few more checks we need to do to handle index numbers that are of different depths (say "2.1" vs "2.1.1") or with unusual syntax (say "2.1." or ".3"), as explained with comments in the code.

If the index numbers turn out to be equal (or both missing) we need to base our comparison on the rest of the strings (stored in sFirstText and sSecondText), and can use the standard StrComp() function.

Here's the function in full:

Private Function CompareByIndex(ByVal sFirst As String, sSecond As String) As eCompareResult
'Compare two strings using document numbering method

Dim sFirstText As String
Dim sSecondText As String
Dim aFirst As Variant
Dim aSecond As Variant
Dim nPos As Long
Dim nMaxBound As Integer
Dim nIndex As Integer
Dim nReturn As eCompareResult

    'Split the strings into index and text parts
    GetIndexArray sFirst, aFirst, sFirstText
    GetIndexArray sSecond, aSecond, sSecondText
    
    'There are four possible results after splitting:
    '1. Both had an index
    '2. 1st only had an index
    '3. 2nd only had an index
    '4. Neither had an index
    
    If IsArray(aFirst) And IsArray(aSecond) Then
        'Both had an index - compare on index
        
        'Only compare up to the lowest depth:
        If UBound(aFirst) < UBound(aSecond) Then
            nMaxBound = UBound(aFirst)
        Else
            nMaxBound = UBound(aSecond)
        End If
        
        'If first or second is greater at any level, we have a winner
        For nIndex = 0 To nMaxBound
            If CLng(aFirst(nIndex)) = CLng(aSecond(nIndex)) Then
                'If they were equal when converted to longs, need to
                'test for case when one was empty and other was zero
                'The empty one wins.
                If IsEmpty(aFirst(nIndex)) Then
                    nReturn = crFirst
                    Exit For
                ElseIf IsEmpty(aSecond(nIndex)) Then
                    nReturn = crSecond
                    Exit For
                End If
            ElseIf CLng(aFirst(nIndex)) < CLng(aSecond(nIndex)) Then
                nReturn = crFirst
                Exit For
                
            ElseIf CLng(aFirst(nIndex)) > CLng(aSecond(nIndex)) Then
                nReturn = crSecond
                Exit For
                
            End If
        Next
        
        'If at this stage we still have no winner, check whether one
        'is indexed to a deeper level. It will lose.
        If nReturn = crEqual Then
            If UBound(aFirst) > UBound(aSecond) Then
                nReturn = crSecond
            ElseIf UBound(aFirst) < UBound(aSecond) Then
                nReturn = crFirst
            Else
                'If they are still the same, compare the rest of the strings
                nReturn = StrComp(sFirstText, sSecondText, vbTextCompare)
            End If
        End If
        
    ElseIf IsArray(aFirst) And Not IsArray(aSecond) Then
        '1st only had an index - return first
        nReturn = crFirst
    ElseIf Not IsArray(aFirst) And IsArray(aSecond) Then
        '2nd only had an index - return second
        nReturn = crSecond
    ElseIf Not IsArray(aFirst) And Not IsArray(aSecond) Then
        'Neither had an index - do a traditional text compare
        nReturn = StrComp(sFirst, sSecond, vbTextCompare)
    End If
    
    CompareByIndex = nReturn
    
End Function

Here's the result of our new sort, just as the author intended:

Before I'm bombarded with email, let me say that I don't think this sort will be particularly quick. Redimming the index arrays at each level is a definite no-no, as is checking the upper bound more times than is absolutely necessary, as I have done in the GetIndexArray function. However, for the type of data that would probably use this type of sort, it probably doesn't need to be all that fast. I almost always code for readability and maintainability, in any case. Speed freaks, feel free to tune this code!

The example program is available in full for download.



Written by: Rupert Walsh
October '98

Image of Arrow linked to Previous Article Image of Arrow linked to Next Article
Image of Line Break
[HOME] [TABLE OF CONTENTS] [SEARCH]