by Jim Karabatsos - GUI Computing
One of the cool new features introduced in VB4 (and remaining in VB5) is the Collection object. I don't propose to explain what collections are or how they are used in this article (in fact, I'm sure I've done an article like that some time in the dim and distant past). Instead, I will present an alternative to a Collection that makes different choices between performance and features.
The internal implementation of the Collection class exposed by VB has not been publicly released. It doesn't take a lot of exploration, however, to realise that the Collection objects are implemented internally as some sort of doubly-linked list with some additional dictionary mechanism (probably a hash table) to support string keys.
Collections are actually a very useful facility because they can dynamically size themselves and because the elements in the collection can be referenced by a string key value. When many people first saw it in VB4, they started replacing all their arrays with collections.
Then they went back and changed all their collections back to arrays when they found that the performance of their programs suffered terribly.
As in all areas of life, or at least of programming (which for some of us is much the same thing), you need to make trade-offs between performance and function. Collections provide more facilities than arrays. Arrays are (generally) more efficient than collections. TANSTAAFL.
Let me point out some of the things I do not like about collections. I do like the ability to use string keys, but I really dislike the fact that you cannot retrieve the key for an item given its numeric index (hence the clue to the likely use of a hash table for the dictionary mechanism). I hate not being able to change an item other than by deleting it and adding a new one. I like the For Each facility but I don't want to pay a price for it in terms of performance. Oh, and for many things I do, the way things keep moving around is just plain trouble.
These complaints are not really about the collection class at all - they are really a by-product of trying to use a collection for a job it was never designed to do.
So I have decided to write a set of additional classes that provide specialised data handling capabilities, each optimised for one type of process. We start off in this issue with a Vector class, which is essentially a dynamically re-sizable array. Before we get into actually writing the class, let's pretend we have one and start writing some code to use it. I find that this is a very good way to work through the design of a class.
Here is some sample code :
Dim Vector As CVector Set Vector = New CVector For I = 1 To 20 Vector(I) = "This is string " & Format$(I) Next I ' Will print the 20 items For I = 1 to Vector.Last Print Vector(I) Next I ' Truncate the vector Vector.Last = 5 ' Change the contents of an item Vector(3) = "This is a new string" ' Will print the 5 items remaining For I = 1 to Vector.Last Print Vector(I) Next I
The CVector class will allow you store variants into it with a (long) integer subscript. You do not need to pre-allocate the storage for it because it will dynamically resize its internal array to cater for whatever you throw at it. It is always 1-based (I considered making it user-configurable but decided that it was not worth the effort).
The main property that you will interact with is the Item property, which is the default. That means that these two lines are identical :
Vector.Item(3) = "Hello" Vector(3) = "Hello"
The Last property returns the last used element in the vector. All elements between 1 and Last exist and can be referenced. As they are variants, any elements that you do not assign a value return Empty, as in :
Dim V as New CVector V(3) = 2 V(2) = IsEmpty(V(1))
In this code, V(2) will be assigned a Boolean True.
You can explicitly set the size of the vector by assigning a (long) integer value to the Last property. If the value you assign is less than the current value, the vector is truncated. If it is larger, the vector is grown (with empty elements). This can have significant performance benefits if you are going to create a large vector and you know ahead of time how large it is going to be.
Speaking of efficiency, the class will dynamically re-size its internal array in chunks. Whenever you try to set a value into an element beyond the last allocated one, the class will re-size the array to that element plus the current chunk size.
By default (and at creation) the chunk size is 10. You can access and change that value using the Chunk property:
Dim V as New CVector V.Chunk = 100 V(30) = "Something" Print V.Last V.Last = 35 Print V.Last
Line 2 sets the chunk size to 100 (from the default of 10). Line 4 sets a value into element 30 (which did not previously exist). This will cause the internal storage to be re-sized to cater for 130 items (30+100).
Note that line 4 would print 30, not 130. The Last property returns the last element that has been assigned a value, not the last one allocated.
Line 5 truncates the internal storage to 35 items (freeing the internal storage). It also sets the Last property as if that item had been set a value, so line 6 prints 35 and IsEmpty(V.Last) is true.
With all that out of the way, let us turn our attention to the implementation of the CVector class. It is actually pretty simple. Here is the complete source code:
Option Explicit Private av() As Variant Private nLast As Long Private nChunk As Long Public Property Get Chunk() As Long Chunk = nChunk End Property Public Property Let Chunk(NewValue As Long) If NewValue = nChunk Then Exit Property If NewValue < 1 Then Err.Raise vbObjectError Or 1002, "CVector.Chunk", "Chunk is less than one" Exit Property End If nChunk = NewValue End Property Public Property Get Last() As Long Last = nLast End Property Public Property Let Last(ByVal NewLast As Long) If NewLast = nLast Then Exit Property If NewLast < 1 Then Exit Property ReDim Preserve av(1 To NewLast) As Variant nLast = NewLast End Property Public Property Let Item(ByVal Index As Long, ByVal V As Variant) If Index < 1 Then Err.Raise vbObjectError Or 1000,"CVector.Let", "Index is less than one" Exit Property End If On Error GoTo Error_Handler av(Index) = V If Index > nLast Then nLast = Index End If Exit Property Error_Handler: If Index > UBound(av) Then ReDim Preserve av(1 To Index + nChunk) As Variant Resume End If Err.Raise Err.Number, Err.Source, Err.Description, Err.HelpFile, Err.HelpContext End Property Public Property Set Item(ByVal Index As Long, ByVal V As Variant) If Index < 1 Then Err.Raise vbObjectError Or 1000, "CVector.Let", "Index is less than one" Exit Property End If On Error GoTo Error_Handler Set av(Index) = V If Index > nLast Then nLast = Index End If Exit Property Error_Handler: If Index > UBound(av) Then ReDim Preserve av(1 To Index + nChunk) As Variant Resume End If Err.Raise Err.Number, Err.Source, _ Err.Description, Err.HelpFile, Err.HelpContext End Property Public Property Get Item(ByVal Index As Long) As Variant On Error Resume Next If Index < 1 Then Err.Raise vbObjectError Or 1000, _ "CVector.Get", "Index is less than one" Exit Property End If If Index > nLast Then Err.Raise vbObjectError Or 1001, _ "CVector.Get", "Index is beyond end of vector" Exit Property End If If IsObject(av(Index)) Then Set Item = av(Index) Else Item = av(Index) End If End Property Private Sub Class_Initialize() nChunk = 10 nLast = 1 ReDim av(1 To nChunk) As Variant End Sub
All of this is pretty straight-forward stuff, so I'm not going to go through it in any detail whatsoever. I will, however, point out a couple of interesting things. First, note the use of the following code sequence in the Property Get procedure for Item:
If IsObject(av(Index)) Then Set Item = av(Index) Else Item = av(Index) End If
Because the items are variants, users of this class might well want to assign objects to the items. We want to make sure that we correctly return an object reference to the user in the Get procedure (and not the default object property which is what would happen if we neglected to use Set as shown).
Finally, speaking of default properties, let me, errr, discuss the, errr, innovative way the MS chose to implement this.
Any sensible designer would simply have added a "Default" keyword to the property declaration, as in:
Default Public Property Get …
So of course MS did not do it this way. No. Instead you need to go into the DBFH (Dialog Box From Hell) that is brought up from the Tools/Procedure Attributes menu item. Intuitive, that. Luckily, you can just use the hotkey - no, wait a minute, there isn't one. Anyway, do you see anywhere you should set the "Default" attribute?
Neither do I, because it isn't there. You need to click on the Advanced button:
OK, so where is it. Hmmm. Could it be the "User Interface Default" checkbox? Well, it could be, but it isn't. Instead, see the "Procedure ID" combo? If you drop down the list, you will see an entry "(default)" - I don't know what the parentheses mean either. That's the one you need to set. Everything else on this dialog is totally unrelated to this (it actually refers to control creation).
And all this just to set a particular property to be the default! All this actually does is to set the following line in the source file immediately after the Property Get Item statement:
Attribute Item.VB_UserMemId = 0
Gimme a break. VB5 is a version 2 product. I think it has not lived up to its promise. Hopefully VB6 (which will really be that magical version 3!) will not disappoint.
How does it perform? Well, Here are some informal tests run on a Pentium 100 with 32M of ram. All times are in seconds.
|Test (all items string variants)||Collection with Index||Collection with For/Each||Vector Chunk = 10||Vector Chunk = 1000|
|Insert 1,000 items||0.1094||0|
|Retrieve 1,000 items||0.2813||0.0586|
|Insert 10,000 items||0.1094||0.1133||0.2188||0.1680|
|Retrieve 10,000 items||13.8398||0.0586||0.2227||0.2227|
|Insert 50,000 items||0.9414||0.7109||1.2070||0.8789|
|Retrieve 50,000 items||452.3403||0.1680||1.1016||1.0391|
So, what do the figures mean? They mean that collections are really good with For/Each but they suck for (numeric) indexed access. Other tests that I have performed show that collections are really good if you meet the following criteria:
In contrast, the CVector class is really very good at dealing with a wide range of sizes and has no problem with dealing with For/Next and with access to and modification of items in the "middle" of the group.
What about implementing For/Each for the CVector class? Well, it turns out that the only way to do that easily is to embed a collection in your class and delegate to its enumerator object, a process which also involves the Dialog Box From Hell and the magic number negative 4. We don't want to go there today because we don't want to embed a collection in our class. If anyone really wants to know how to create an enumerator for this sort of class that can be used by VB's For/Each statement, email me and we'll talk about it. It is possible, and using nothing more than VB and a custom type library that you don't even need to distribute with your apps, but it is such a nasty hack involving patching VTable entries that even I hesitate before revealing it, and that should set alarm bells ringing all over the place.
Until next time,