by Ross Mack - GUI Computing
Sometimes the most important features of software are the ones you don't think much about. I often find that the features clients get most excited about are things like the ability to import data from other applications. However, data import can be a real hassle. There are all sorts of formats to deal with and you need some reliable way to deal with them all.
Recently I had to add functionality to an application to allow import of Comma Separated Values (CSV) files as produced by Excel. I couldn't find a tool that really handled that nicely so I was forced to write the code to handle that format myself. Let's have a look at how the CSV format works.
As the name would suggest, CSV files have each value delimited by commas. A simple example of a line from a CSV file would be as follows: 1,Fred,Flinstone,Bedrock,555-1234
As you can see I am building a database of cartoon characters. Anyway, that line has 5 fields, each separated by commas. Simple.
The question arises, what happens if there is a comma in one of the fields? Here is an example:
2,Barney,Rubble,"Granite Street, Bedrock",555-2345
As you can see we also have Barney's street name. The use of the quotes indicates that whatever is inside constitutes one single field. So, like Fred's entry we end up with a total of 5 fields. The fourth field should have the quotes stripped when parsed, as they are not part of the field but merely additional delimiters to escape the embedded comma.
There's one more thing you need to know. When a field has embedded double quote characters, CSV delimits the field with double quotes the same way it does when the field contains commas. Then it uses two double quote characters to represent a single double quote character in the field. Confused yet? Let's look at another example:
47,"Fred ""Twinkle Toes""",Flinstone,Bedrock,555-1234
As you can see here, Fred's nickname of "Twinkle Toes" has been included with his name. When parsed the double quotes at either end are removed and the two sets of two double quotes in the middle are replaced by a single double quote. Well, now I'm confused. Anyway, the end result is the second item after parsing looks like this:
Fred "Twinkle Toes"
Now, on to the point of this article. I basically constructed some code, which takes a single line of CSV and parses it into a delimited line of text. I took this approach because I already have routines that will handle delimited text very easily, so once I have it in that format it's easy to handle. It's also easier to convert from CSV to some other delimiter type than to do anything else to it. For example converting from CSV to Tab delimited. See my article of a couple issues ago on Split and Join.
The code I wrote to do that looks like this:
Function CSVToDelim (ByVal sStart As String, ByVal sDelim As String) As String Dim sResult As String Dim sClause As String Dim nNum As Integer Dim bInQuotes As Integer ' boolean Dim sChar As String Dim sTemp As String ' Replace all occurances of double-double quotes with a ' flag value so we can find them later. They represent ' double quote characters within a clause, they are ' important. ' This one is a special case, double quotes right at ' the start of the field, immediately after the opening ' double quote. sTemp = StrReplaceAllWith(sStart, ",""""""", ",""+CHR34+") ' This is for all other occurances of double quotes. sTemp = StrReplaceAllWith(sTemp, """""", "+CHR34+") ' We deal with the string one character at a time. For nNum = 1 To Len(sTemp) ' grab the next character in line to deal with. sChar = Mid$(sTemp, nNum, 1) ' If it's a double quote it requires special handling. If sChar = """" Then bInQuotes = Not bInQuotes ' Once we've changed context we no longer want that character sChar = "" End If Select Case sChar Case "," ' decide whether the comma has been escaped ' with quotes or if it is a field delimiter. If bInQuotes Then sResult = sResult & sChar Else sResult = sResult & sDelim End If Case Else ' Anything else we just add to the end. sResult = sResult & sChar End Select Next nNum ' Replace our special flag with the double quote character sResult = StrReplaceAllWith(sResult, "+CHR34+", """") CSVToDelim = sResult End Function
Essentially it loops through the string figuring out where each field starts and ends and replaces the comma delimiter with the delimiter character passed in. I normally use a nonprinting character like Tab or ASCII 0 for this. As it goes through it also removes the quotes that surround fields. I also use some bulk find and replace to replace the double double quotes with a flag so that I ignore them for the rest of the parsing and can bulk replace them with what they need to be once the parsing has finished.
The find and replace code has appeared in a number of my previous articles so I will not describe it here. Once you have the text parsed and delimited in a more readily useable fashion you can do whatever with it.
For an example of how you might use this code here is a small snippet that converts a CSV file into tab delimited.
Dim hInFile As Integer Dim hOutFile As Integer Dim sLine As Integer hInFile = FreeFile Open dlgCommon.FIlename For Input Access Read As hInFile hOutFile = FreeFile Open GetAppPath() & "parse,tmp" For Output Access Write As hOutFile Do While Not EOF(hInFile) Line Input #hInFile, sLine Print #hOutFile, CSVDelim(sLine, Chr$(9)) Loop Close #hOutFile Close #hInFile
So, add this one to your bag of tricks and next time you need to handle importing files or similar operations you will have something in your kit bag for dealing with CSV.