John Sample

Bits and Bytes

So Close.....

Still aiming to release the new versions tonight, including MySQL.

As I type this I'm waiting for (hopefully) the final test run to complete. The most frustrating thing about putting it all together is how long the installer takes to run: every time I change something, I have to wait 2 hours to see if it worked.

The new version uses a completely different architecture. First of all, the interface will now be a .NET DLL.
The SQL functions in the current version have been ported forward to the new version, so you can continue to use them, but no additional functionality will be added to them. It's just too impractical to write some of the things I'm trying to do in stored procedures.

The database structure has also changed. The primary reason is that we were wasting too many cycles and too much disk IO on testing for exceptional cases.
The TIGER/Line data has left and right columns for each side of the street. For example, the zip code is stored in zipL and zipR for each street segment. The same goes for county and state. This meant that each query had to include an OR condition, which caused a slower, unindexable search.
This was a waste because 99% of the records have identical values in the left and right columns.
Now we store only one zip, county, and state column, which index well and give a much quicker response.
A pre-parser in the installer creates a record for each side if the two sides lie in different zips or counties, so we don't lose any data.
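
To make the flattening step concrete, here is a rough sketch in Python. It is not the installer's actual code: the class names, field names, and the flatten function are all made up for illustration, and it assumes a segment is split whenever any of zip, county, or state differs between the two sides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RawSegment:
    """A street segment with left/right attributes (hypothetical field names)."""
    name: str
    zip_l: str
    zip_r: str
    county_l: str
    county_r: str
    state_l: str
    state_r: str


@dataclass(frozen=True)
class StoredSegment:
    """A flattened record with a single zip, county, and state column."""
    name: str
    zip: str
    county: str
    state: str


def flatten(seg: RawSegment) -> List[StoredSegment]:
    """Return one record if both sides agree, otherwise one record per side."""
    left = StoredSegment(seg.name, seg.zip_l, seg.county_l, seg.state_l)
    right = StoredSegment(seg.name, seg.zip_r, seg.county_r, seg.state_r)
    # Common case: both sides are identical, so one record is enough.
    if left == right:
        return [left]
    # Exceptional case: keep both sides so no data is lost.
    return [left, right]
```

With records stored this way, a lookup that used to need something like "WHERE zipL = ? OR zipR = ?" can become a plain "WHERE zip = ?" against a single indexed column.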

The pre-parser allows the storage of even more addresses in less space. Addresses that were ignored in the last installer because they contained dashes or letters can now be cleaned. A full Type 1 load now contains 17,000,000 street segments. The MySQL database is 1.72 gigs; SQL Server is just under 3 gigs.
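
The post doesn't spell out the cleaning rules, so the following is only a guess at the simplest possible version, sketched in Python: drop everything that isn't a digit and keep the rest as the house number. The function name clean_house_number is hypothetical, and real data may well need smarter handling (for example, treating a dash as a separator rather than removing it).

```python
import re
from typing import Optional


def clean_house_number(raw: str) -> Optional[int]:
    """Strip dashes, letters, and other non-digits; return None if nothing is left."""
    digits = re.sub(r"\D", "", raw)
    return int(digits) if digits else None
```

Under this assumed rule, an address number like "100A" is kept as 100 instead of being thrown away, and a value with no digits at all is still rejected.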

The same DLL will be used for all database versions. Each call takes 2 more parameters than the older SQL functions: a connection string and a database type.
The return types will no longer be recordsets (unless I implement some kind of custom datareader); instead they will be some kind of class structure. I expect that to be in flux for a while. Recordsets just aren't suited to the data that will be added. For example, I want to return the geocode for the address, the address structure that was used if it was parsed, or, if it's not found, some suggestions for other addresses.
For this reason I may release the installer a bit ahead of the DLL (a day earlier at most, I hope). This will give people time to load the database while I put some finishing touches on things.
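
Since the class structure is still in flux, the following is only an illustration of the kind of shape described above, written in Python rather than .NET. Every name here (DatabaseType, GeocodeRequest, ParsedAddress, GeocodeResult, and their fields) is invented for the example and is not the DLL's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class DatabaseType(Enum):
    """The two backends mentioned in the post (values are placeholders)."""
    SQL_SERVER = "sqlserver"
    MYSQL = "mysql"


@dataclass
class GeocodeRequest:
    """The address plus the two extra parameters each call carries."""
    address: str
    connection_string: str
    database_type: DatabaseType


@dataclass
class ParsedAddress:
    """The address structure that was actually used for the lookup."""
    number: str
    street: str
    zip: str


@dataclass
class GeocodeResult:
    """One object that can hold a hit, the parsed address, or suggestions."""
    found: bool
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    parsed: Optional[ParsedAddress] = None
    suggestions: List[str] = field(default_factory=list)


# A successful lookup carries the geocode and the parsed address.
hit = GeocodeResult(
    found=True,
    coordinates=(33.749, -84.388),
    parsed=ParsedAddress("100", "Main St", "30301"),
)

# A miss carries suggestions instead.
miss = GeocodeResult(found=False, suggestions=["100 Main Ave", "100 Main Ct"])
```

A flat recordset would need either many nullable columns or several result sets to carry the same information, which is the mismatch a structure like this avoids.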


posted on Monday, September 05, 2005 1:52 PM

