I am attempting to decode a web log file, into a UniVerse record. The weblog record fields are space delimited with text fields within double quotes. This is similar to a standard CSV (comma separated value) file format.

I personally hate massive string extraction work, and this started to look like spaghetti code.

So, while walking around the local park, for a breather, the idea hit me. Why not break the web log string line into groups, based on where the text fields are. Then it hit me again, this would work on CSV files also.

OK, the problem is that field delimiters (comma’s in CSV) can be within a text field (delimited by double quotes). So, if you break up or group the record by the text marks (double quotes) and then process each group separately, it should be easier to handle.

As the text marks must be in pairs, the group before a text mark will be normal field(s) and can have the field delimiters converted to attribute marks, the group between text marks must be processed as is, ignoring the field delimiters (comma’s) and the group after a text mark also must be processed as normal individual fields.

As a more general statement, if there are a mixture of normal fields and text fields, then all odd numbered groups consist of normal field(s) delimited and all even numbered groups are text fields (delimiters ignored).

So, the end result, along with error checking and all the gunk that turns a couple of lines of code into a usable routine, is:

 SUBROUTINE DECODE.CSV(RESULT, CSV.TXT, FLD.DELIM, STR.DELIM) *------------------------------------------- * * Decode Character Seperated Value line of text to a record * * Input : CSV.TXT - A single line of CSV text *         FLD.DELIM - Field delimiter, normally a comma *         STR.DELIM - String delimiter, normally a double quote * * Output: RESULT - A record with fields delimited by Field Marks (@FM) *                        - EMPTY.STR if an error *            @USER.RETURN.CODE - 0 - No error *                                          - 1 - An error * $COPYRIGHT "David Murray 2007" * *------------------------------------------- * * Rev Date        User         Notes * 0   04 Oct 2007 David Murray Created * * Setup equates or constants EQUATE EMPTY.STR TO "" ;* Empty string EQUATE COMMA TO "," ;* Comma for field delimiters EQUATE STR.QUOTE TO '"' ;* String quote * * MAIN GOSUB INITIALISE GOSUB DECODE GOSUB FINALISE IF OK NE @TRUE THEN * RESULT = EMPTY.STR @USER.RETURN.CODE = 1 END ELSE * @USER.RETURN.CODE = 0 END * RETURN * * GOSUBs * * Initialise variables, open files etc. INITIALISE: *--------- * OK = @TRUE ;* General error flag RESULT = EMPTY.STR ;* The resultant record goes here * Quick error check and exit IF CSV.TXT EQ EMPTY.STR THEN OK = @FALSE RETURN END * Check for defaults IF FLD.DELIM EQ EMPTY.STR THEN FLD.DELIM = COMMA IF STR.DELIM EQ EMPTY.STR THEN STR.DELIM = STR.QUOTE * Can not have same character for field and string delimiters. IF FLD.DELIM EQ STR.DELIM THEN OK = @FALSE RETURN END * Must have an even number of string delimiters else there is a syntax error IF MOD(COUNT(CSV.TXT, STR.DELIM), 2) NE 0 THEN OK = @FALSE RETURN END RETURN * Decode CSV to fields using @AM DECODE: *----- * IF OK NE @TRUE THEN RETURN ;* General error check * Count the number of text strings and change the string quotes to attribute marks * to group data GROUP.MAX  = DCOUNT(CSV.TXT, STR.DELIM) CSV.TXT = CHANGE(CSV.TXT, STR.DELIM, @AM) * Initialise result var. Each field will be appended to the result. RESULT = EMPTY.STR * Cycle through each group, checking group position of odd and even. FOR GROUP.PTR = 1 TO GROUP.MAX IF MOD(GROUP.PTR,2) NE 0 THEN * If the group position is odd then it is not a text field, so convert field delimiter(s) to a field mark(s) RESULT := CHANGE(CSV.TXT, FLD.DELIM, @FM) END ELSE * The group position is even, so is a text field, so DO NOT convert field delimiter(s) to field marks, * just copy whole text field as is. RESULT := CSV.TXT END NEXT GROUP.PTR RETURN * FINALISE: *------- * RETURN 

So, using a bit of paper, write down a line of CSV, divide up the line by the text marks (double quotes) and have a go…

Oh, I assume that this method is already documented somewhere. If you find or know of similar methods, add a link or reference as a comment. Thanks.