I am attempting to decode a web log file, into a UniVerse record. The weblog record fields are space delimited with text fields within double quotes. This is similar to a standard CSV (comma separated value) file format.

I personally hate massive string extraction work, and this started to look like spaghetti code.

So, while walking around the local park, for a breather, the idea hit me. Why not break the web log string line into groups, based on where the text fields are. Then it hit me again, this would work on CSV files also.

OK, the problem is that field delimiters (comma’s in CSV) can be within a text field (delimited by double quotes). So, if you break up or group the record by the text marks (double quotes) and then process each group separately, it should be easier to handle.

As the text marks must be in pairs, the group before a text mark will be normal field(s) and can have the field delimiters converted to attribute marks, the group between text marks must be processed as is, ignoring the field delimiters (comma’s) and the group after a text mark also must be processed as normal individual fields.

As a more general statement, if there are a mixture of normal fields and text fields, then all odd numbered groups consist of normal field(s) delimited and all even numbered groups are text fields (delimiters ignored).

So, the end result, along with error checking and all the gunk that turns a couple of lines of code into a usable routine, is:


SUBROUTINE DECODE.CSV(RESULT, CSV.TXT, FLD.DELIM, STR.DELIM)
*-------------------------------------------
*
* Decode Character Seperated Value line of text to a record
*
* Input : CSV.TXT - A single line of CSV text
*         FLD.DELIM - Field delimiter, normally a comma
*         STR.DELIM - String delimiter, normally a double quote
*
* Output: RESULT - A record with fields delimited by Field Marks (@FM)
*                        - EMPTY.STR if an error
*            @USER.RETURN.CODE - 0 - No error
*                                          - 1 - An error
*
$COPYRIGHT "David Murray 2007"
*
*-------------------------------------------
*
* Rev Date        User         Notes
* 0   04 Oct 2007 David Murray Created
*
* Setup equates or constants
EQUATE EMPTY.STR TO "" ;* Empty string
EQUATE COMMA TO "," ;* Comma for field delimiters
EQUATE STR.QUOTE TO '"' ;* String quote
*
* MAIN
GOSUB INITIALISE
GOSUB DECODE
GOSUB FINALISE

IF OK NE @TRUE THEN
*
RESULT = EMPTY.STR
@USER.RETURN.CODE = 1
END ELSE
*
@USER.RETURN.CODE = 0
END
*
RETURN
*
* GOSUBs
*
* Initialise variables, open files etc.
INITIALISE:
*---------
*
OK = @TRUE ;* General error flag

RESULT = EMPTY.STR ;* The resultant record goes here

* Quick error check and exit
IF CSV.TXT EQ EMPTY.STR THEN

OK = @FALSE
RETURN
END

* Check for defaults
IF FLD.DELIM EQ EMPTY.STR THEN FLD.DELIM = COMMA
IF STR.DELIM EQ EMPTY.STR THEN STR.DELIM = STR.QUOTE

* Can not have same character for field and string delimiters.
IF FLD.DELIM EQ STR.DELIM THEN

OK = @FALSE
RETURN
END

* Must have an even number of string delimiters else there is a syntax error
IF MOD(COUNT(CSV.TXT, STR.DELIM), 2) NE 0 THEN

OK = @FALSE
RETURN
END

RETURN

* Decode CSV to fields using @AM
DECODE:
*-----
*
IF OK NE @TRUE THEN RETURN ;* General error check

* Count the number of text strings and change the string quotes to attribute marks
* to group data
GROUP.MAX  = DCOUNT(CSV.TXT, STR.DELIM)
CSV.TXT = CHANGE(CSV.TXT, STR.DELIM, @AM)

* Initialise result var. Each field will be appended to the result.
RESULT = EMPTY.STR

* Cycle through each group, checking group position of odd and even.
FOR GROUP.PTR = 1 TO GROUP.MAX

IF MOD(GROUP.PTR,2) NE 0 THEN

* If the group position is odd then it is not a text field, so convert field delimiter(s) to a field mark(s)
RESULT := CHANGE(CSV.TXT, FLD.DELIM, @FM)
END ELSE

* The group position is even, so is a text field, so DO NOT convert field delimiter(s) to field marks,
* just copy whole text field as is.
RESULT := CSV.TXT
END
NEXT GROUP.PTR

RETURN
*
FINALISE:
*-------
*
RETURN

So, using a bit of paper, write down a line of CSV, divide up the line by the text marks (double quotes) and have a go…

Oh, I assume that this method is already documented somewhere. If you find or know of similar methods, add a link or reference as a comment. Thanks.

save.run.done.