Doing the Split
by Ross Mack - GUI Computing
'Listen to the outworlders, Caleb. They may seem strange and their tongue is foreign but there are things they know which we do not. Do their craft not run before the wind like stallions? There are things they can teach us just as there are many things we can teach them.'
- Aluröch the Seer
One of the things about working with languages other than the one you usually work with (your native tongue ?) is that you discover things that your normal language does not do, or does not do so well. You discover features that you wish you still had when you get back to the language you usually use. I recently experienced this when I spent quite a while writing code in Perl (yes, Perl) on Unix boxes instead of my usual VB. Perl is a very cool language and has quite a few features that VB does not. However, some of them would be quite inappropriate. But some would be very handy.
For example, Perl has some very strong Regular Expression handling. You can easily parse whole files using regular expressions very easily. However, that's not what I'm here to talk about on this occasion. Perhaps when I have written a regular expression library for VB I'll come back to that.
A couple of the functions I found I could no longer live without when I got back to VB were split() and join(). The purpose of split() is essentially to take a string and break it up into an array of strings based on a specified delimiter. Let's take the example of the following string:
"One:Two:Three:Four:Five"
Using the split function you could pass in this string and specify ":" as the delimiter and split() would return an array of five items as follows:
"One"
"Two"
"Three"
"Four"
"Five"
I think that's cool.
So, what does the join() function do? Just the opposite of split(), join() takes an array of items and joins them using the specified delimiter into a single string. For example you could take the array of five items returned by split() and pass them into join() specifying "#" as the delimiter and that would result in the following string:
"One#Two#Three#Four#Five"
I think that's cool too.
Note to Perl enthusiasts (and I think you know who you are) : Yes, I know I'm not using the correct Perl nomenclature. That's intentional. Get over it.
Unfortunately, Perl has a different variable reference structure to VB so the functions I have built to emulate these functions in VB are not quite the same. Also, they don't support some of the weird and wonderful things you can do in Perl using regular expressions and the such, but they cover most cases you will run into with VB. The code is vanilla VB and I have used it in VB3, VB4 and VB5 with only minor modifications if any. As you can see it's not overly complicated, but I have discovered that since I wrote it and keep a copy with me I find places to use it every where.
Function Join(sArray() As String, ByVal sDelim_IN As String) As String
Dim sTemp As String
Dim iStart As Integer
Dim iEnd As Integer
Dim iNum As Integer
' Initialise the variables we need
sTemp = ""
iStart = LBound(sArray)
iEnd = UBound(sArray)
' Loop through the array of items passed
For iNum = iStart To iEnd
sTemp = sTemp & sArray(iNum)
' Unless it's the last item add the delimiter
If iNum < iEnd Then
sTemp = sTemp & sDelim_IN
End If
Next iNum
' return the concatenated string
Join = sTemp
End Function
Function Split(ByVal sSource_IN As String, ByVal sDelim_IN As String, sArray() As String) As Integer
Dim nDelimLen As Integer
Dim nFieldCount As Integer
Dim sCurrent As String
Dim nPos As Integer
' Split's the passed string into a number of elements
' separated by the indicated delimiter. These fields
' are then written into the array passed, which is 1
' based. The number of fields found is returned.
Erase sArray
nDelimLen = Len(sDelim_IN)
sCurrent = sSource_IN
Do
' look for the specified delimiter
nPos = InStr(1, sCurrent, sDelim_IN)
If nPos > 0 Then
nFieldCount = nFieldCount + 1
ReDim Preserve sArray(1 To nFieldCount)
' grab the string left of the delimiter and add
' to the array
sArray(nFieldCount) = Left$(sCurrent, nPos - 1)
' cut the used portion off the string
sCurrent = Mid$(sCurrent, nPos + nDelimLen)
End If
Loop Until nPos = 0
If sCurrent <> "" Then
' Deal with the last item without a trailing delimiter
nFieldCount = nFieldCount + 1
ReDim Preserve sArray(1 To nFieldCount) As String
sArray(nFieldCount) = sCurrent
End If
' return the number of items in the array
Split = nFieldCount
End Function
You will notice that Split uses Redim Preserve fairly abundantly. I usually try to avoid this, as it is a slow operation. However, the only way to avoid that would be to do virtually all the rest of the processing in Split twice so as to determine before dimensioning the array how many elements it should have. I have included a version of split() in the downloadable zip file that functions this way. I found it to be faster with large numbers of elements, but the element size was also a factor. Either way VB5 on a Pentium 166 can split a string into an 80 element array 25000 times in just over a minute. That seems fast enough.
Note: Split always re-dimensions the string array you pass it and kills any existing values in it. When it gets passed back it will always be a 1 based array with a number of elements specified by the integer returned by Split.
So where can you use split and join? There are multitudes of places.
Here is a simple one. Let's say you want to quickly get the filename portion from a full path and filename:
Function GetFileName(ByVal sIN as String) As String Dim nCount As Integer Dim sPath() As String ' Split the full path based on slashes. nCount = Split(sIN, "\", sPath()) ' return whatever cam after the last slash. GetFileName = sPath(nCount) End Function
Or, for you C programmers out there, a more terse version (you can thank David Thompson for this one):
Function GetFileName(ByVal sIN as String) As String Dim sPath() As String GetFileName = sPath(Split(sIN, "\", sPath())) End Function
Another example; if you are importing a delimited text file you can use split to simply break up each line of the file into an array of values. The code may look like this:
Dim sLine as string
Dim nFieldCount as Integer
Dim sValues() as String
Dim hFile as integer
hFile = Freefile
Open sFilename for input access read as hFile
Do While not eof(hFile)
' Read a line from the file
Line Input #hFile, sLine
' Split the line on tabs
nFieldCount = Split(sLine,Chr$(9),sValues)
' Check we have the right number of values
if nFieldCount = gnDesiredFieldCount then
' We have the right number
sSQL = "INSERT INTO mytable VALUES ("
sSQL = "'" & sValues(1) & "',>'" & sValues(2) & "')"
else
' Wrong number of fields
sSQL = "INSERT INTO myerrorstable VALUES ("
sSQL = sSQL & "'" & sValues(1) & "')"
end if
db.Execute sSQL ' or whatever
Loop
Close #hFile
As you can see, the import process becomes quite simple. In addition, you will notice that in the example I use the Chr$ function to pass a tab character into split as the delimiter. Split and join work fine with any sort of delimiter. You can freely use all manner of non-printing characters including Chr$(0). You can also use delimiters of any length as the delimiter length is accounted for in both functions. So, for example you could use "{I am a delimiter}" as your delimiter, although it incurs something of a size overhead.
Here is a classic situation, to read all the Setting names from a Section of an INI file you would use the following code:
Declare Function GetPrivateProfileString Lib "Kernel" (ByVal lpApplicationName As String, lpKeyName As Any, ByVal lpDefault As String, ByVal lpReturnedString As String, ByVal nSize As Integer, ByVal lpFileName As String) As Integer
Dim sBuff As String * 500
Dim nResult As Integer
Dim sSettings() As String
nResult = GetPrivateProfileString("MySection", 0&, "", sBuff, 500, "whatever.ini")
If nResult > 0 then ' We read something
sBuff = Left$(sBuff, nResult)
' Split the setting names out into an array of setting names.
nResult = Split(sBuff, Chr$(0), sSettings())
End If
If the section is found in the INI file the names of all the settings in that section are returned into the provided string buffer delimited by ASCII character 0. I used to avoid this call, as it was always a pain to then extract all the setting names from the string, but not when you can use split. Just like a late night infomercial isn't it?
Of course, the amount of data these functions can deal with is limited by the effective size of VB strings. This means they have an absolute upper limit of just less than 64k, but you will run out of 'String Space' somewhere before that, depending on which version of VB you are using.
You can also store data in a hierarchical form by using different delimiters at different levels of the hierarchy and split them iteratively. However, you are probably better off using some other structure to do that unless you really have to represent all that data in a single string (See Jim Karabatsos' article on the iStreamable interface for a potential circumstance).
So, there it is, a useful snippet of code and a few ideas on where it can be used. But remember to use it wisely. The downloadable ZIP file includes a BAS module with Split for VB3 and the equivalent for VB5 (VB4 users should use one or the other as appropriate). It also includes a simple demonstration app in VB3 and in VB5.