Some years ago, I needed to have a flexible search facility on an address field or fields. The address has stored as STREET (multivalued), SUBURB, TOWN/CITY and POSTCODE.
I wanted to be able to search on a word or a phrase consisting of more than one word. This is a problem as most indexes are designed to only index one word or a list of words; not a sequence of words anywhere within a phrase. But, the b-tree index by its nature can search on a phrase, as long as the phrase starts with the search text.
I was using the original System Builder at the time (before SB+) and this limited the display list within the SB selection boxes. So I created a subroutine to build a multivalued field of the complete address, each value being the address rotating each word from the front to the back.
This enabled the user to see the address and as the search is a b-tree, the user can enter in the start of a word or a phrase as needed.
For example, if the address is 12 Smith Street, Batsfield, London, 1234, then the multivalued field would be:
<pre>12 SMITH STREET BATSFIELD LONDON 1234 SMITH STREET BATSFIELD LONDON 1234 12 STREET BATSFIELD LONDON 1234 12 SMITH BATSFIELD LONDON 1234 12 SMITH STREET LONDON 1234 12 SMITH STREET BATSFIELD 1234 12 SMITH STREET BATSFIELD LONDON </pre>
This enables the user to search on 12 SMITH STREET, SMITH STREET, SMITH ST or even just SMITH. Also, the suburb of BATSFIELD or even the city and postcode could be search e.g. LONDON 1234.
Strangely enough, the users found this easy to use and read as they could start with an street number and name and if not found, then enter the street name and street type or even the suburb.
The subroutine consisted of:
<pre>SUBROUTINE ADDR.IDX(IDXLINE,ADDR) ******************************************** * SUBROUTINE TO MAKE AN ADDRESS INDEX * ALL THE ADDRESS LINES * Street 1 - 1 * Street 2 - 2 * Suburb - 3 * Town/City - 4 * PostCode - 5 * ARE CONCATENATED TOGETHER AND TRIMMED. * THEN THE WHOLE LINE IS INDEXED. * THE LINE IS THEN CHANGED BY SHIFTING THE FIRST WORD TO THE END OF THE LINE * AND THEN THIS IS INDEXED. LINE = ADDR<1,1> : " " : ADDR<1,2> : " " : ADDR<1,3> LINE := " " : ADDR<1,4> : " " : ADDR<1,5> LINE = TRIM(LINE) NOWORDS = DCOUNT(LINE," ") LENLINE = LEN(LINE) IDXLINE = "" FOR I= 1 TO NOWORDS IDXLINE<1,-1> = LINE WORD = FIELD(LINE," ",1) LINE = TRIM(LINE[COL2()+1,LENLINE] : " " : WORD) NEXT I RETURN </pre>
I have also added the phone number and fax numbers in some situations; suffixed with “:PH” and “:FAX” respectively.
Then I would create a derived field which calls this subroutine with a SUBR() and then index using the inbuilt B-TREE in UniVerse.
The downsides? Plenty. The first is the size of the index file. It does get rather large as you could imagine as each address is added multiple times, the same number as the words within the address. But if you wanted to index each word within a phrase, the index would still be large.
And also the speed. Each time the address is changed or updated, the indexed derived field must be regenerated. Is the speed that bad that it can be noticed? When users are manually updating the address field, there is no noticeable speed decrease, but if an electronic/automated load of data is done, there is a slight decrease in speed compared to no index updating.
Additionally, address cleaning before indexing would help with the overall usefulness of this index.