The crime map is getting closer.
I went ahead and got a URL for it (http://www.fairfaxinfo.com).
I've got most of the data from the last few years, plus the sex offender list. Of the 18,000 crimes I have in the database, only the ones described as “major” are being displayed. As soon as I figure out a hierarchy for the minor incidents, I'll display those also. I plan on making the database searchable at some point.
The toughest part of getting this thing together is getting the data. Fairfax does a good job of providing information, but the formats are less than desirable.
For example, the process of getting the crimes loaded to the site goes like this:
- A process scrapes this page for Word documents and downloads them.
- The process then converts each of these Word files to plain text files.
- A series of regular expressions run on these files trying to extract the location, type, district, and description of the incident.
- The addresses are then geocoded against a custom database of census data. I'm going to put up an article soon on how this is done.
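To give a rough idea of what the regular expression step can look like, here is a minimal sketch in Python. The record layout, the sample text, the patterns, and the name `extract_incidents` are all made up for illustration; the real police reports are formatted differently and far less consistently.

```python
import re

# Invented sample text. This layout is an assumption for illustration only,
# not the actual format of the police reports.
SAMPLE = """\
NORTH DISTRICT
ROBBERY - 100 block of Example Street. A man was approached by two suspects who demanded his wallet.
BURGLARY - 2500 block of Sample Avenue. A resident reported that a laptop was taken.
"""

# A heading line such as "NORTH DISTRICT" sets the district for the lines below it.
DISTRICT_RE = re.compile(r"^(?P<district>[A-Z][A-Z ]+?) DISTRICT$")

# An incident line: TYPE - LOCATION. DESCRIPTION
INCIDENT_RE = re.compile(
    r"^(?P<type>[A-Z][A-Z /]+?)\s+-\s+"
    r"(?P<location>\d+ block of [^.]+)\.\s+"
    r"(?P<description>.+)$"
)


def extract_incidents(text):
    """Return a list of dicts with district, type, location, and description."""
    incidents = []
    district = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        heading = DISTRICT_RE.match(line)
        if heading:
            district = heading.group("district").title()
            continue
        incident = INCIDENT_RE.match(line)
        if incident:
            incidents.append({
                "district": district,
                "type": incident.group("type").title(),
                "location": incident.group("location"),
                "description": incident.group("description"),
            })
        # Lines matching neither pattern are skipped, which is where gaps come from.
    return incidents


if __name__ == "__main__":
    for row in extract_incidents(SAMPLE):
        print(row)
```

Anything that matches neither pattern is silently dropped in this sketch, so a real version would probably want to log those lines for review.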
The extraction piece still needs some work, so there are a few gaps. Most of them are due to inconsistency in how the police department labels incidents and how the addresses are laid out.
The geocoding isn't perfect. If it's within 100 yards, I'm happy. More on this later.
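One way to check a geocoded point against that 100-yard tolerance is to compare it with a known reference location using the haversine formula. This is only a sketch under assumptions: the function names and the example coordinates are hypothetical, and it is not necessarily how the site measures accuracy.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters
METERS_PER_YARD = 0.9144


def distance_yards(lat1, lon1, lat2, lon2):
    """Great-circle distance in yards between two lat/lon points (haversine)."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters / METERS_PER_YARD


def within_tolerance(geocoded, reference, tolerance_yards=100.0):
    """True if the geocoded (lat, lon) is within the tolerance of the reference."""
    return distance_yards(*geocoded, *reference) <= tolerance_yards


if __name__ == "__main__":
    # Arbitrary example coordinates, not real incident locations.
    geocoded = (38.8500, -77.3000)
    reference = (38.8505, -77.3000)
    print(round(distance_yards(*geocoded, *reference), 1), "yards apart")
    print("close enough:", within_tolerance(geocoded, reference))
```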
It's still got a ways to go, but I think at this point it's good enough to put out there.