Streaming objects in VB5
by Jim Karabatsos - GUI Computing
We are all trying to come to grips with the implications of the new object capabilities of VB5. One of the most exciting capabilities is actually not part of VB5 at all. I am talking about Distributed COM, or DCOM for short, which is actually a part of the Win32 operating systems. It was introduced in Windows NT 4.0 and is available as a released software component for Windows 95.
DCOM essentially takes the L out of LRPC (Lightweight Remote Procedure Call), the mechanism OLE uses to support out-of-process (a.k.a. "EXE") servers. In LRPC, OLE needs to marshal input and output parameters and return values across process boundaries (or virtual machines). DCOM goes one small step further: it takes the marshalled parameters and moves them across machine boundaries. From the viewpoint of both the client and the server, the cross-machine marshalling is invisible, and there is no difference between developing an out-of-process server for local deployment and developing one for remote deployment.
That's the theory, anyway.
In fact there is a significant difference between these two scenarios, about three orders of magnitude. You see, marshalling data across a VM boundary, while an expensive process, can still be done at memory access speeds. On the current generation of hardware, this is basically instantaneous. Sure, you can measure the difference between the access speed of out-of-process and in-process servers, but you need to set up a test that iterates through many thousands of calls to do so. On the other hand, access to a remote server (especially on a busy network, and aren't they all?) is much slower than either form of local server.
The problem is exacerbated by the design often seen in ActiveX servers, where it is common to expose the object attributes as a rather large set of individual properties, often augmented by a few methods that make use of those attributes. Let's take as an example a hypothetical Customer object. Here is some code that you would typically see in any "new age" VB program:
Dim C As CCustomer
Set C = New CCustomer
C.Surname = "Smith"
C.FirstName = "John"
C.Address = "1 Smith St"
C.City = "Sometown"
C.State = "Vic"
C.Telephone = "(03) 9876 5432"
C.Persist
Of course, there would typically be many more individual properties, and perhaps the Persist method might in fact be a function that returns a success indicator. However, even with just this small set, consider what needs to happen if the CCustomer object is provided by a remote server. First, the Set statement needs to make a network round trip to create the object. Then each of the assignment statements needs to make another round trip. A final round trip is required to invoke the Persist method. So even this simplified fragment of code requires eight round trips. Ouch, but it gets much worse. Consider this fragment:
Dim objIS As CInvoiceStore
Dim objInvoice As CInvoice
Set objIS = New CInvoiceStore
objIS.Filter = "CUSTID = " & nCustID
objIS.Refresh
If objIS.Count > 0 Then
    For Index = 1 To objIS.Count
        Set objInvoice = objIS.Items(Index)
        Print #ReportFile, objInvoice.InvoiceNumber; " ";
        Print #ReportFile, objInvoice.InvoiceDate; " ";
        Print #ReportFile, objInvoice.Amount; " ";
        If Date - objInvoice.InvoiceDate > objInvoice.Terms Then
            Print #ReportFile, "Overdue"
        Else
            Print #ReportFile
        End If
    Next Index
End If
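If we allow for just 5 invoices for the customer, and count one round trip per remote call, producing this one simplified customer statement takes at least 35 network round trips: five to create and query the invoice store, plus six for each invoice. Try doing a print run for all your thousand customers and you would basically cripple your network (not to mention overdosing on caffeine while you wait).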
The real killer here is network latency. On each segment of a LAN, there is a need to mediate access to the wire. Any routers in the path add further delays. Once you get the data transfer going, it basically goes at full speed. The bottleneck is in getting access to the wire. So the guiding principle here is this: do as much as you can in a single round trip. Of course, there is a point of diminishing returns. Once the size of the data stream you are transferring exceeds the maximum packet size on your network, it needs to be split up anyway. It can be a little difficult to know what size is "best", as this depends on the network protocols, architecture and topology. However, informal testing seems to indicate significant improvements in throughput as the packet size of data is increased to between 2K and 8K, followed by no significant further improvements but (and this is important) no perceptible degradation either.
So, what does this mean in practical terms?
The line you will often hear promoted is to create "heavy" methods, ones that take many parameters. In other words, rather than a series of assignments to properties followed by a call to a method, the method would take all the property values as parameters and would set the properties of the object as a side-effect to carrying out whatever work is demanded of it.
If you think this sounds ugly, it is. Quite aside from the aesthetics of this approach, the use of these parameter-laden methods is complex and error-prone. Perhaps even more importantly, those methods become very brittle. What happens when you want to add a new property to an object? You need to change the parameter lists of many methods, then find all the client code that calls those methods and change it too. An often overlooked issue is that the interface to the object becomes incompatible with the previous version, forcing the generation of new IIDs (GUIDs) for the interface and requiring every client to have its reference reset and then be recompiled.
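To make that brittleness concrete, here is a sketch of what a heavy method for our customer example might look like. The name SaveCustomerHeavy and its parameter list are made up for illustration; they are not taken from any real server, and the body just reports success where a real server would validate and store the values.

```vb
Option Explicit

' Hypothetical "heavy" method (sketch only). Every property travels as a
' parameter, so the whole update needs just one call.
Public Function SaveCustomerHeavy(ByVal Surname As String, _
                                  ByVal FirstName As String, _
                                  ByVal Address As String, _
                                  ByVal City As String, _
                                  ByVal State As String, _
                                  ByVal Telephone As String) As Boolean
    ' A real server would validate and store the values here.
    SaveCustomerHeavy = True
End Function

Public Sub DemoHeavyCall()
    Dim bOK As Boolean
    ' One call instead of six property assignments plus Persist.
    bOK = SaveCustomerHeavy("Smith", "John", "1 Smith St", _
                            "Sometown", "Vic", "(03) 9876 5432")
    Debug.Print bOK
End Sub
```

Now imagine adding an Email property: the signature changes, and so does every line of client code that calls it.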
Compare that to the simplicity of the "light" method approach. Adding new properties does not break compatibility, and of course there is no need to change the method signatures either. Most client code is quite happy to ignore (and be oblivious to) the new properties so it generally does not need to be modified or even re-compiled.
But of course doing it that way is unworkable in a remote server scenario.
When in trouble or in doubt,
Run in circles, scream and shout.
OK, now that we have cleared our heads, let's take a step back and consider what we are trying to achieve here. We want to move an object around on a network efficiently while preserving the ease of use and robustness of the light-weight methods architecture. The problem is, there are two different machines that need to access an object: the client and the server. You can't pass an object reference in either direction without incurring the performance hit we have discussed above, so it seems that there is no way to do what we want.
It's time to get a little bit creative. What if we don't pass application object references across machine boundaries at all? What if we create a local ActiveX server and deploy it independently on both the server and each of the client machines? We add an additional property to each of the application objects called AsVariant. Reading the AsVariant property of an object returns an opaque variant that contains all the important property values. How it is implemented internally does not matter, because the only thing that the variant can be used for is to assign it back into the AsVariant property of an instance of the same object type. Doing so causes the object to set all the important properties as specified by the variant (which obviously means the same way it was when the variant was read).
We'll look at the implementation of this in just a moment. Before we do that, we'll look at why we want to do such a thing.
As well as the application object server(s), we also write a separate action server (say, a DBServer object) that is deployed as a remote server on the server machine. This action server exposes methods that take a single parameter of type -- you guessed it -- variant.
The easiest way to see how this works is to look at an example.
Dim C As CCustomer ' local server (on client)
Dim DBServer As CDBServer ' remote server
Set DBServer = New CDBServer
Set C = New CCustomer
C.Surname = "Smith"
C.FirstName = "John"
C.Address = "1 Smith St"
C.City = "Sometown"
C.State = "Vic"
C.Telephone = "(03) 9876 5432"
DBServer.SaveCustomer(C.AsVariant)
On the server side, we do something like this:
Public Sub SaveCustomer(ByVal varCustomer_IN As Variant)
    Dim C As CCustomer ' local server (on server)
    Set C = New CCustomer
    C.AsVariant = varCustomer_IN
    ' . . . now process the object as before
End Sub
In an amazing feat of prestidigitation, we have managed to make the customer object local on both sides of the LAN wire. We don't pass the object itself; instead we pass the essence of the object in the form of a streamed set of parameter values. As you can see, using this technique is really quite simple.
If you need to modify the object in the server code and pass those modifications back to the client, just pass it back the same way. Either define the parameter as ByRef INOUT, or have the function return a variant:
Server:
Public Sub SaveCustomer(ByRef varCustomer_INOUT As Variant)
Client:
Dim V As Variant
V = C.AsVariant
DBServer.SaveCustomer V ' no parentheses: SaveCustomer(V) would pass a copy and lose the changes
C.AsVariant = V
Alternatively:
Server:
Public Function SaveCustomer(ByVal varCustomer_IN As Variant) As Variant
Client:
C.AsVariant = DBServer.SaveCustomer(C.AsVariant)
You get the idea.
OK, how do we go about implementing AsVariant? The simplest way is just to define a variant property called AsVariant. In the Property Get procedure, you string together all the internal variables you use to maintain your object's state. You can do this any way you like, perhaps using some variation on the technique we are going to discuss below. In the Property Let procedure, you unpick all the values and reset the object's state as described by the value of the variant.
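As a minimal sketch of that simplest approach, the class module below packs two fields into a Variant array and unpacks them again. The class name CCustomer comes from the earlier examples, but the private variable names and the choice of a Variant array as the packed form are assumptions made for illustration.

```vb
' Class module CCustomer (sketch with just two illustrative fields).
Option Explicit

Private msSurname As String
Private msFirstName As String

Public Property Get Surname() As String
    Surname = msSurname
End Property

Public Property Let Surname(ByVal sNew As String)
    msSurname = sNew
End Property

Public Property Get FirstName() As String
    FirstName = msFirstName
End Property

Public Property Let FirstName(ByVal sNew As String)
    msFirstName = sNew
End Property

' Pack the object's state. The layout is private to this class;
' callers should treat the result as opaque.
Public Property Get AsVariant() As Variant
    AsVariant = Array(msSurname, msFirstName)
End Property

' Unpack in the same order that Property Get packed.
Public Property Let AsVariant(ByVal vData As Variant)
    Dim nBase As Long
    nBase = LBound(vData)
    msSurname = vData(nBase)
    msFirstName = vData(nBase + 1)
End Property
```

The catch with a positional layout like this is that both ends must agree on the order of the fields, which is one reason to prefer the keyed format described next.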
We actually went one step further and defined an interface and a helper server to make this process painless. By implementing a public interface, we can write routines that can handle different application object types polymorphically; for example a procedure could be written that takes any object that implements this interface and saves it to a disk file or e-mails it to another user or whatever.
Since we did not know what we might want to do with an object in future, we opted for the lowest common denominator as a format for storing the data: a string of characters. All string data is stored as strings, while all numerics and dates are stored in their character representations, so -3 is stored as "-3" and a date is stored as a string in system format. Because we want all the data to be in one string, we actually also store length information and parameter names in the string so that it can be unpicked easily and safely.
In case you were wondering, UniMess does not affect us here because we are not storing binary data in the string. Instead, we are storing character representations of binary data and that is quite safe.
The interface that we have defined is named IStreamableObject and defines the following methods:
Public Property Get CanHandleByteArray() As Boolean
Public Property Get CanHandleVariant() As Boolean
We wanted to allow for objects to be able to stream arbitrary binary data such as bitmaps or audio files. However, implementing a binary stream can be significantly more complex than a string, and most objects will not need to use the former. For that reason, the interface defines these two read-only properties to allow an interested client to query an object about which formats it supports. Generally, an object would support either the AsVariant format alone or else both of them, as a byte array can be stored in a variant anyway.
Public Property Let AsByteArray(ByVal vData As Variant)
Public Property Get AsByteArray() As Variant
AsByteArray assigns or returns the object state as a byte array. It is not valid to call AsByteArray if CanHandleByteArray is False.
Public Property Let AsVariant(ByVal vData As Variant)
Public Property Get AsVariant() As Variant
AsVariant assigns or returns the object state as a variant. It is not valid to call AsVariant if CanHandleVariant is False.
Public Property Get Key() As String
We foresee that a streamed object may be stored in a database. If necessary, a unique key that identifies the object will be returned by the object through the read-only Key property. If it is not appropriate, the object simply returns "".
Public Property Get ClassName() As String
To support polymorphic object creation from streams, the ClassName property returns, you guessed it, the name of the object class.
The source code for the (abstract) VB class that defines this interface is available.
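To show the kind of polymorphic routine this interface makes possible, here is a sketch of a procedure that saves any streamable object to a text file. It assumes a class module named IStreamableObject declaring the members listed above; the procedure name, the error number and the two-line file layout are all invented for this example.

```vb
' Standard module (sketch). Works with any object that implements
' IStreamableObject, whatever its concrete class.
Option Explicit

Public Sub SaveObjectToFile(ByVal Obj As IStreamableObject, _
                            ByVal sPath As String)
    Dim nFile As Integer

    ' Ask first: the interface says AsVariant is only valid
    ' when CanHandleVariant is True.
    If Not Obj.CanHandleVariant Then
        Err.Raise vbObjectError + 513, "SaveObjectToFile", _
                  "Object cannot stream itself as a variant"
    End If

    nFile = FreeFile
    Open sPath For Output As #nFile
    ' Line 1: the class name, so a reader knows what to create.
    Print #nFile, Obj.ClassName
    ' Line 2: the streamed state (a string in this design).
    Print #nFile, CStr(Obj.AsVariant)
    Close #nFile
End Sub
```

A call would look something like SaveObjectToFile C, "C:\Temp\customer.txt" (the path is hypothetical). Note that this simple layout assumes the stream contains no line breaks; string properties that can hold multi-line text would need a binary file or a length prefix instead.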
To simplify the creation of object streams, we have created a server called PropertyList that allows you to throw any number of key-value pairs at it and then get them all back as a variant (in string format). You can also assign the variant back to it and then reference each of the values by their key, so the implementation of the AsVariant property looks like this:
Public Property Get IStreamableObject_AsVariant() As Variant
    Dim PList As CPropertyList
    Set PList = New CPropertyList
    PList.Add "CustID", CustID
    PList.Add "CustName", CustName
    ' etc etc
    IStreamableObject_AsVariant = PList.AsVariant
End Property
Public Property Let IStreamableObject_AsVariant(ByVal varStream As Variant)
    Dim PList As CPropertyList
    Set PList = New CPropertyList
    PList.AsVariant = varStream
    CustID = PList.Items("CustID")
    CustName = PList.Items("CustName")
    ' etc etc
End Property
You can access the values in any order, not just in the order that they were assigned. The source code to the server is also available.
The internal workings of the server are actually quite interesting. We use an internal class that has separate properties for the key and the value, and can encode them together into a string that contains the key, the variant type of the value and the value itself represented as a character string.
Public Property Get AsString() As String
    Dim Result As String
    If Len(Key) > 0 Then
        Result = Format$(Len(Key)) & ":" & Key _
            & Format$(VarType(Item)) & ";"
        Select Case VarType(Item)
            Case vbVNull, vbVEmpty
                ' do nothing
            Case Else
                Result = Result & CStr(Item)
        End Select
    Else
        Result = ""
    End If
    AsString = Result
End Property
It can also decode this string back into a key string and a variant of the correct VarType and with the correct value.
Public Property Let AsString(ByVal ItemString As String)
    Dim nKeyLen As Long
    Dim DelimPos As Long
    Dim nVarTypeLen As Long
    Dim ItemType As VariantTypeConstants
    Dim sItem As String
    Dim EmptyV As Variant
    DelimPos = InStr(ItemString, ":")
    If DelimPos = 0 Then
        Key = ""
        Item = ""
    Else
        nKeyLen = Val(Left$(ItemString, DelimPos - 1))
        Key = Mid$(ItemString, DelimPos + 1, nKeyLen)
        sItem = Mid$(ItemString, DelimPos + nKeyLen + 1)
        nVarTypeLen = InStr(sItem, ";")
        If nVarTypeLen < 2 Then
            Item = sItem 'default string format
        Else
            ItemType = Val(Left$(sItem, nVarTypeLen - 1))
            sItem = Mid$(sItem, nVarTypeLen + 1)
            Select Case ItemType
                Case vbVNull
                    Item = Null
                Case vbVEmpty
                    Item = EmptyV
                Case vbVInteger
                    Item = CInt(sItem)
                Case vbVLong
                    Item = CLng(sItem)
                Case vbVSingle
                    Item = CSng(sItem)
                Case vbVDouble
                    Item = CDbl(sItem)
                Case vbVCurrency
                    Item = CCur(sItem)
                Case vbVDate
                    Item = CDate(sItem)
                Case Else
                    Item = sItem
            End Select
        End If
    End If
End Property
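One thing worth watching with this encoding: CStr, CDbl, CCur and CDate all honour the machine's regional settings, so a value written on a client that uses a decimal comma may not read back correctly on a server that uses a decimal point. If your client and server machines might be configured differently, a locale-neutral encoding is one way around it. The helper functions below are a sketch of that idea, not what the PropertyList server actually does, and their names are invented.

```vb
' Standard module (sketch): locale-neutral text encodings.
Option Explicit

' Str$ always writes "." as the decimal point, whatever the
' regional settings; Trim$ removes the leading space Str$ adds
' to non-negative numbers.
Public Function EncodeNumber(ByVal dValue As Double) As String
    EncodeNumber = Trim$(Str$(dValue))
End Function

' Val always expects "." as the decimal point.
Public Function DecodeNumber(ByVal sValue As String) As Double
    DecodeNumber = Val(sValue)
End Function

' Store a date as its underlying serial number rather than as
' display text, so no date format is involved at all.
Public Function EncodeDate(ByVal dtValue As Date) As String
    EncodeDate = Trim$(Str$(CDbl(dtValue)))
End Function

Public Function DecodeDate(ByVal sValue As String) As Date
    DecodeDate = CDate(Val(sValue))
End Function
```

With these, EncodeNumber(1.5) gives "1.5" and EncodeNumber(-3) gives "-3" on any machine. Whole dates round-trip exactly; times of day go through a Double-to-text conversion, so expect them to survive to well under a second rather than bit for bit.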
The publicly exposed class CPropertyList uses a private Collection containing objects of this internal class and is really quite straightforward. Calling its Add method causes it to create one of the internal objects, assign the key and the value to it and then add it to the internal collection:
Public Sub Add(Key As String, Item As Variant)
    Dim Entry As CClassListEntry
    Set Entry = New CClassListEntry
    Entry.Key = Key
    Entry.Item = Item
    mcolItems.Add Entry, Key
    Set Entry = Nothing
End Sub
When we reference the AsVariant property, it just works through the collection appending the string form of each of the entries into one long string in the format stringlength:stringdata, as follows:
Public Property Get AsVariant() As Variant
    Dim Result As String
    Dim Portion As String
    Dim Entry As CClassListEntry
    Result = ""
    For Each Entry In mcolItems
        Portion = Entry.AsString
        Result = Result & Format$(Len(Portion)) & ":" & Portion
    Next Entry
    AsVariant = Result
End Property
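It can help to see what one of these streams actually looks like. The sketch below re-creates the two encoding steps as stand-alone functions (the function names are invented, and the logic is a simplified copy of the two property procedures shown above) and builds the stream for an Integer CustID of 42 and a String CustName of "Smith".

```vb
' Standard module (sketch): builds a sample stream by hand.
Option Explicit

' Same layout as the entry class's AsString:
' keylength:key, then vartype;value
Private Function EncodeEntry(ByVal sKey As String, _
                             ByVal vItem As Variant) As String
    Dim sResult As String
    sResult = Format$(Len(sKey)) & ":" & sKey _
            & Format$(VarType(vItem)) & ";"
    If Not (IsNull(vItem) Or IsEmpty(vItem)) Then
        sResult = sResult & CStr(vItem)
    End If
    EncodeEntry = sResult
End Function

' Same layout as the list's AsVariant: stringlength:stringdata
Private Function EncodePortion(ByVal sEntry As String) As String
    EncodePortion = Format$(Len(sEntry)) & ":" & sEntry
End Function

Public Sub ShowStream()
    Dim sStream As String
    sStream = EncodePortion(EncodeEntry("CustID", 42)) _
            & EncodePortion(EncodeEntry("CustName", "Smith"))
    Debug.Print sStream
    ' Prints: 12:6:CustID2;4217:8:CustName8;Smith
End Sub
```

Reading that output left to right: the first portion is 12 characters long and holds a 6-character key (CustID), VarType 2 (Integer) and the value 42; the second is 17 characters long and holds an 8-character key (CustName), VarType 8 (String) and the value Smith. The run "4217:" looks ambiguous to the eye, but not to the parser, because the length prefix says exactly where each portion ends.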
Assigning to the AsVariant property simply decodes the individual portions and puts them back into the collection. In the following code, Clear is another method of the class that clears the internal collection.
Public Property Let AsVariant(Stream As Variant)
    Dim Portion As String
    Dim sData As String
    Dim nPortionLen As Long
    Dim nColonPos As Long
    Dim nClipLen As Long
    Dim Entry As CClassListEntry
    sData = CStr(Stream)
    Clear
    Do While Len(sData) > 0
        nColonPos = InStr(sData, ":")
        nPortionLen = Val(Left$(sData, nColonPos - 1))
        Portion = Mid$(sData, nColonPos + 1, nPortionLen)
        Set Entry = New CClassListEntry
        Entry.AsString = Portion
        mcolItems.Add Entry, Entry.Key
        Set Entry = Nothing
        nClipLen = nColonPos + nPortionLen + 1
        If Len(sData) > nClipLen Then
            sData = Mid$(sData, nClipLen)
        Else
            sData = ""
        End If
    Loop
End Property
There is a little bit more to the implementation of this server but these are the interesting bits. I'll leave you to work through the source code for the gory details.
As always, I'd love to get feedback on your reaction to this technique. Personally, I am finding that many of the approaches that I would use in a truly object-oriented language (with inheritance) need to be re-thought in the world of VB5 and COM. I'd love to hear about the challenges you have faced (or perhaps are still battling with).