Geocoding Sampleen-US.Text Version 0.95.2004.102John SampleZips, 19 Apr 2008 09:35:00 GMT<P>The toughest set of data to obtain for geocoding is a zip-code-to-city translation.</P> <P>Unfortunately, the USPS treats zip data as a trade secret and demands licensing fees for distribution, which makes creating a free geocoder a little harder.</P> <P>You used to be able to obtain a decent city-to-zip mapping from the FIPS55 data set. However, the FIPS folks were forced to remove this information from all future releases. I still have the old FIPS data, which I can use if worse comes to worst, but I'm trying to find a way around it by generating the data myself if at all possible. This way the data doesn't get continually out of date.</P> <P>While going through the new census data format I thought I had stumbled upon a way to extract city data.<BR><BR>First a little background on why this data is important, then I'll show you some pictures of why it's so difficult.</P> <P>For each street in the database we have the associated zip code. Actually we have two zip codes, one for each side of the street, but they are generally the same. The zip code of the street is the main way in which we narrow down the search space for an address.<BR>For example, if you tried to geocode &#8220;123 Main St Anytown, NY&#8221;, the first step is to figure out what the possible zip codes are for Anytown, NY, then see if we have any matching street names in that zip range. Note that if you geocode with just the zip code this is a non-issue. &#8220;123 Main St 12345&#8221; could be geocoded without the city lookup at all. However, when we display information about the address it would be nice to know where that place is by using the zip to display the city.</P> <P>The census data does contain names and geometry for most &#8220;places&#8221; (cities, towns, etc.) 
so I investigated extracting the shape of all the cities in the country.<BR>It also contains the shapes of what the census calls ZCTAs, or &#8220;ZIP Code Tabulation Areas.&#8221; I was hoping to overlay these two sets of data, then go through each place to extract every zip code it touches.<BR>Unfortunately, the &#8220;place&#8221; geometry doesn't give very good coverage, as it uses very strict definitions for the boundaries of cities.<BR><BR>Here is a projection of the roads and a few landmarks in densely populated Fairfax County:</P> <P><IMG src="/images/ffx_a.jpg"><BR>All of these roads are in a city of some sort as you or I would know them, but when you overlay the census city shape data (green) it looks like this: <IMG src="/images/ffx_a_place.jpg"><BR></P> <P>As you can see, the place data doesn't even come close to covering all the places where people live.</P> <P>There still may be a way to get the data out of here, but it's going to be tougher than I had hoped.<BR>In the meantime the database creation can continue; it's just going to have a few placeholders where the zip translation can be plugged in later.</P>John SampleGeocoding Redux, 16 Apr 2008 08:44:00 GMT<P>After two kids and a long break I'm finally working on the new geocoder.</P> <P>All the previous releases have been rendered moot due to a change in the way the census releases the data.</P> <P>Frankly, this is one of the reasons I took such a long break from it, since they announced several years ago that this was going to happen.</P> <P>As we speak I've completed about 75% of the work required to load the new data. <BR><BR>The good news: this one should be totally cross-platform. The database creation is being done by a Python script and I'm testing it on both Linux and Windows.<BR>Actual geocoding will be done by separate APIs for each language. 
Right now I've got C#, Java, and Python slated, but it should be easy enough to create more.</P> <P>The bad news: I'd say any sort of stable release is a few weeks away. I really only get to work on it after the kids go to sleep.</P> <P>The licensing will hopefully be way clearer this time. I'm kicking around GPLv3 with some sort of dual license for those who need more flexibility.<BR>Stay tuned for more info!</P> <P>&nbsp;</P>John SampleUp, 15 May 2006 09:54:00 GMT<P>It's back up. <A href=""></A></P> <P>Here's what happened:</P> <P>We were having quite a few people over Saturday for a party, and when I went to make sure my <A href="">MAME machine</A> was still working I found out the motherboard was dead. I had a spare machine, but it was lacking a power supply since I had given it to my brother the week before. </P> <P>Unfortunately it was 9 pm when I found out and all the computer stores were closed, so in a moment of mad-scientist-style hacking I wired the power supply of my main computer to the spare one without actually taking it out of the case. Picture jump-starting a car: the cases were basically on top of each other with wires and drives snaked between them. </P> <P>I was able to get the arcade computer to the point where all I would have to do is put in a power supply the next day. However, somewhere in the process the chip fan power cable got knocked off my main computer and the chip melted in the early morning. <BR>That left me with 3 dead machines. Saturday morning before everyone came over I replaced the mobo+chip in my main computer and got power to the MAME machine, so all is well with the world at the moment. </P>John SampleToast, 13 May 2006 06:08:00 GMT<P>Through a freak accident I managed to fry 3 computers last night. 
</P> <P>The demo db won't function and the downloads won't be available until I restore the boxes.</P> <P>&nbsp;</P>John SampleState of the Zips, 02 May 2006 10:16:00 GMT<P>I've started working on the geocoding project again. There was some downtime while we adjusted to having a newborn around and time was short.</P> <P>FIPS has made some changes that will make distributing an installer virtually impossible. Mainly, the USPS had them remove zip code information from their publications. I've had to come up with another source of info at great expense, so I will probably only be able to distribute the db in prebuilt form unless another solution arises. This also means I'll need to start charging for it to make up some of the cost.</P> <P>I know there were some people who had trouble downloading the zipped-up db in the past. The problem is fixed now, but the location has changed. If I sent you the download info and you were never able to retrieve it, drop me a line.</P> <P>One of the coolest new features I'm working on at the moment is suggestions for misspelled streets and cities.</P>John SampleTemporary Source, 15 Feb 2006 20:26:00 GMT<P>Getting Subversion up and running for source control is proving more difficult than I thought when it comes to access control.<BR>Everything is in VSS, which would be perfect, but running it over the web isn't really possible.</P> <P>In the meantime here is half of the project. Three things are missing:</P> <P>1. The database installer<BR>2. The CrLab MySQL DLL <BR>3. AddressParse DLL source (created in a separate program; I'll post details later)<BR>I'm waiting for a response from CrLab on whether I can include their DLL. If I can't, it's an easy switch back to the MySQL-provided connector.<BR>The installer is in no shape to be released at the moment, so in the interim I'm going to give out a download link through email. 
The compressed MySQL db is 2.5 gigs, so you can understand why I can't post a link here. If you want a copy before the installer is up, email me and I will give you a link if my bandwidth at home can take it. <BR><BR>The code is a maze of spaghetti in massive flux while moving to the spatial index, so have some patience while reading it until everything is fleshed out.<BR><BR>And without further ado:<BR><STRIKE></STRIKE></P> <P><BR><BR><BR>&nbsp;</P>John SampleOn the source, 10 Feb 2006 14:43:00 GMT<P>Ok, I've been talking about releasing the source for a while now, so let me explain the delay and solicit suggestions on how to proceed.</P> <P>First of all, thanks for all the email. I stopped responding for a bit there and I hope to do better in the future.</P> <P>Second, I'm torn on how to release the source. If you have read the background of the project so far, you'll know this started as a pet project after I couldn't find any viable source for reverse geocoding information. The installer was a nice touch, but keeping up with both the evolving census data and actual improvements to the program has gotten out of hand. Right now the installer has been (obviously) rendered useless until I can revamp it.</P> <P>Part of the lack of communication was due to an episode with a user who needed help. It was a long, drawn-out, frustrating email exchange which eventually netted a working system. I later found out the user was an employee of a LARGE global consulting firm and they were putting it on a client's system. (Hehe, whoever the client is got ripped off.)<BR>I love GPL zealots, but frankly I'm not one of them, as my mortgage gets paid by designing software. </P> <P>Before I release the source I'd like to find a way to license it so that it can be used by small developers, light commercial users, and nonprofits, but also balance the need for compensation for commercial use. 
I'm not looking for the &#8220;give it away and make money in support&#8221; model here because I really don't have the time and it would probably violate my employment terms.<BR><BR>What I'm looking for:<BR>1. Allow noncommercial use.<BR>2. No reselling, repackaging, or commercial use without permission, although this doesn't necessarily mean any purchase is involved.<BR><BR>Ideally I'd like to form some sort of co-op where contributors could benefit from commercial use.<BR><BR>Anyway, any ideas on how to license this thing to protect the time investment it requires?</P> <P>&nbsp;</P>John SampleDemo Update, 16 Nov 2005 20:21:00 GMT<P>I added forward geocoding to the demo. Its weakest point at the moment is resolving cities and addresses to zip codes and vice versa. It will get better with time. <BR>I'm close to beta at this point.</P> <P>Since there is no installer I'm going to let a few people download the database to try out.<BR>I'll make an announcement soon; I'd like to collect feedback on the demo first.</P> <P><A href=""></A></P>John SampleFixed, 15 Nov 2005 11:50:00 GMT<P>The new database has been loaded.</P> <P>The results should be correct now:</P> <P><A href=""></A></P> <P>&nbsp;</P>John SampleBug, 13 Nov 2005 09:40:00 GMT<P>I found a giant bug in the way names were being loaded.</P> <P>I'm rebuilding the database; hopefully this will eliminate the repetitive street name issue.</P> <P>&nbsp;</P>