Posts tagged ‘getauweather’

Weather hacking updates

My move went smoothly enough. Total downtime for all the maurits.id.au services was a lot higher than I had anticipated, slightly over 24 hours. It turns out that I stupidly still had some IPNAT rules active from a very long time ago which weren’t having any effect on my old home network (as the IP address ranges didn’t match) but which did have an effect on the (pre-existing) network at my new home. It took me a while to figure this out, and until I did no incoming connections got through, as my router was forwarding packets to a non-existent host. Oops.

Anyway, it’s time for an update on my weather hacking project. For about the last two months I have been slowly tweaking the bugs and inefficiencies out of some new code which is run as part of the hourly cron job which updates the .csv, .xml and .yml files that are about the full extent of the project, currently. This new code dumps the data into a PostgreSQL database, so that it persists beyond the lifetime of the particular issue of the BoM page that it was scraped from. In this way I have managed to accumulate an impressive 892294 records thus far! The data comes from 678 stations, so there is an average of 1316 records per station, or one record per hour for about the last 55 days.
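The averages quoted above are easy to check. This is only the arithmetic, using the two counts given in the paragraph and assuming exactly one record per station per hour:

```python
records = 892294   # total records quoted above
stations = 678     # number of stations quoted above

per_station = records / stations   # average records per station
days = per_station / 24            # assuming one record per hour

print(round(per_station), round(days))
```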

The ultimate long-term plan for this database (which I’ll probably start making dumps of available on a weekly basis or something) is to create a JSON interface to it, which will enable people with much better Javascript-fu than I to build nifty web applications without having to download the entire huge database. I have a prototype for this using CherryPy in the works, but it will probably be quite a while before I have come up with a complete API that I am happy with and which runs quickly and reliably. Perhaps I should polish up the code for the database cron job and release it so someone else can beat me to it?
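As a rough idea of what one call in such a JSON interface might return, here is a minimal sketch. It is not the CherryPy prototype: it uses the standard library's sqlite3 as a stand-in for PostgreSQL, and the table name, column names, function name and sample readings are all assumptions made up for illustration.

```python
import json
import sqlite3


def latest_observations(conn, station, limit=24):
    """Return the newest `limit` observations for one station as a JSON string."""
    rows = conn.execute(
        "SELECT observed_at, temperature FROM observation "
        "WHERE station = ? ORDER BY observed_at DESC LIMIT ?",
        (station, limit),
    ).fetchall()
    payload = {
        "station": station,
        "observations": [
            {"observed_at": observed_at, "temperature": temperature}
            for observed_at, temperature in rows
        ],
    }
    return json.dumps(payload)


# Tiny in-memory demonstration with made-up readings (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation (station TEXT, observed_at TEXT, temperature REAL)"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("HindmarshI", "2008-11-01T09:00", 17.2),
        ("HindmarshI", "2008-11-01T10:00", 18.5),
        ("Adelaide", "2008-11-01T10:00", 21.0),
    ],
)
result = json.loads(latest_observations(conn, "HindmarshI", limit=1))
print(result)
```

A client would then only ever fetch the handful of rows it needs rather than the whole database, which is the point of the interface.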

A much more immediate goal is to use the database to get dynamic generation of RSS and Atom feeds for weather data working. This will, of course, be done using feedformatter, which is turning out to be a very useful module for my projects indeed. I have feeds being produced by the hourly cron job already, but have not published them just yet on account of one detail. In order for RSS or Atom feeds to be valid (and hence to be recognised by the less-lenient readers out there) each item needs a URL associated with it. It seems wrong to have the items link anywhere other than to the BoM (to a station-relevant page such as this one for Adelaide). This is easy enough to do but requires putting a URL into each row of the station table of the database. This is the kind of thing one definitely wants to automate rather than doing by hand, but this is not entirely straightforward. The stations are identified in the database, presently, by an abbreviated name as used in the scraped page, e.g. “HindmarshI” for “Hindmarsh Island”. The only pages I can find that are convenient for scraping these station-relevant URLs identify stations by their full names. Thus, automating the process of associating URLs with stations requires the ability to automatically map from full names to abbreviated names. I’ve written a heuristic function for doing this which makes expansions like “Is” to “Island” and “Pt” to “Port” and looks for matches in the list of full names, but so far it’s only got about a 50% success rate. Logically there exists a point where it makes more sense to just finish it by hand than to spend time trying to finish off the automation, but I am the kind of person who will keep banging my head against incomplete code well beyond that point. Hopefully it won’t take too long.
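To give a feel for the kind of heuristic described above, here is a minimal sketch. It is not the actual function: the splitting rule (break the abbreviation at capital letters), the "I" expansion (guessed from the “HindmarshI” example), the function names and the "PtAugusta" and "MtGambier" abbreviations are all assumptions. Only the “Is” and “Pt” expansions come from the text.

```python
import re

# "Is" and "Pt" are the expansions mentioned above; "I" is an assumption
# based on the "HindmarshI" example.
EXPANSIONS = {"Is": "Island", "I": "Island", "Pt": "Port"}


def expand(abbrev):
    """Split a run-together abbreviation at capital letters and expand known tokens."""
    tokens = re.findall(r"[A-Z][a-z]*", abbrev)
    return " ".join(EXPANSIONS.get(token, token) for token in tokens)


def match_station(abbrev, full_names):
    """Return the full name that equals the expanded abbreviation, or None."""
    candidate = expand(abbrev).lower()
    for name in full_names:
        if name.lower() == candidate:
            return name
    return None


full_names = ["Hindmarsh Island", "Port Augusta", "Adelaide"]
print(match_station("HindmarshI", full_names))
print(match_station("PtAugusta", full_names))   # hypothetical abbreviation
print(match_station("MtGambier", full_names))   # no "Mt" rule and no such name, so no match
```

Exact matching after expansion is deliberately strict, which is one plausible reason a heuristic like this stalls at a modest success rate: every abbreviation style it has no rule for simply falls through to None.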

Anyway, the project is progressing well. I still haven’t contacted the Bureau regarding copyright licensing. I definitely will before publishing the RSS feeds, or perhaps immediately after so that whoever at the Bureau has to consider the matter can actually see them.

It’s kind of a shame that it looks like the more interesting parts of this project will be ready just in time for the soul-withering monotony of the heart of an Australian summer…

Australian Weather Hacking Project

Ever since I wrote my getauweather program in early 2006 I have been meaning to put it to some sort of good use and actually do something with it. Just a little overdue, I’ve spent the last week working to this end and today am ready to announce my Australian weather data hacking project. At this page I hope to make available progressively more complicated and interesting applications of the huge amount of data that the Bureau of Meteorology make available. Perhaps more importantly, I am going to make every effort to make that data available to the community at large so that people who aren’t me can do cool stuff with it as well.

At the moment, there is not an awful lot there. Every hour, a cron job runs a little Python script which uses getauweather (not the version you can currently download from my software page, a newer, better version that I’ll release in a few days when I am confident that it is working) to grab the latest data from all the weather stations and then:

  1. Reformats the data into CSV, XML and YAML, which anybody can then use in their own applications, for whatever purpose. The .csv file comes in at just under 100 kb and the two markup language files are in the area of 250 kb.
  2. Updates this Google map of weather stations. This little visualisation really is my first use of either Javascript or the Google Maps API, so please forgive the fact that it really does suck. The page is a whopping 160 kb in size and, on most computers I’ve tried it on, Firefox will complain about how long it runs – just tell it to keep going and you’ll see results soon enough. I plan to use this map as a means to develop my Javascript skills, so hopefully it won’t suck for too much longer and will instead be powered by lightweight AJAX goodness. Of course, if you are already a Javascript wizard feel free to upstage me by making the coolest thing you can using those three freely available files above.
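The reformatting in step 1 can be sketched with nothing but the standard library. This is an illustration rather than the actual cron script: the field names, the XML layout and the sample observations are assumptions, and YAML is left out because it needs a third-party module.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up observations; the field names are assumptions, not getauweather's.
observations = [
    {"station": "Adelaide", "temperature": "21.0", "humidity": "45"},
    {"station": "HindmarshI", "temperature": "18.5", "humidity": "60"},
]
FIELDS = ["station", "temperature", "humidity"]


def to_csv(rows):
    """Render the observations as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Render the observations as one <station> element per row."""
    root = ET.Element("observations")
    for row in rows:
        station = ET.SubElement(root, "station", name=row["station"])
        for field in ("temperature", "humidity"):
            ET.SubElement(station, field).text = row[field]
    return ET.tostring(root, encoding="unicode")


print(to_csv(observations))
print(to_xml(observations))
```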

The next logical step for this project is to start logging these hourly weather results into a database which can be queried via an HTTP API. This will give me an excuse to finally try out CherryPy. Once such a database is up, AJAX magic will allow all sorts of awesome applications, and if people can download monthly dumps of the db then they can perform all sorts of fancy statistical analysis and the like. It should be good. I will note in passing that this undertaking is probably legally shaky for now.
Unlike the situation in the US, which has the eminently sensible rule that the federal government is not permitted to claim copyright on anything they produce, the Commonwealth of Australia feels quite entitled to copyright the BoM observation data. Apparently they are very easy going about granting licenses to repackage and redistribute the freely available stuff, so I’ll try to go down that route soon and, I suppose, pull the project if the BoM ends up objecting.
Personally, I am not entirely convinced that their claim of copyright can be legitimate. My understanding is that copyright law allows for protecting a particular expression of an idea, not the idea itself, which can only be protected by a patent. It’s not clear to me how this distinction applies to raw numbers, like weather station data, which can only sensibly be expressed in one way. At any rate, I don’t expect any trouble to show up on this front.

On a closing note, returning to the subject of improving my Javascript skills: In a previous entry I posted a link to a level of Super Mario World implemented in 14 kb of Javascript. I learned the other day that the same guy has gone ahead and started working on implementing Super Mario Kart the same way, and has previously done Wolfenstein 3D! The blog author, one Jacob Seidelin, is in fact quite the Javascript hacker, in the most traditional sense of the word – coding for fun with no specific goal or direction, simply a desire to push boundaries and overcome limitations, which he is certainly doing. I’ll have to keep an eye on his work.

Two New Software Releases

Just a quick entry to announce two new (well, almost new) software releases.

Firstly, I have finally brought my (X)HTML link testing script TestLinks up to a high enough standard to make it worth releasing. You can read all about it and download it from here. It’s just a simple script designed to be run from cron and let you know if any links on your sites go stale. Simple, but useful. It’s written in Python, of course.
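For readers wondering what a link tester of this sort involves, here is a minimal sketch of the general idea. It is not TestLinks's code: the class and function names are made up, and it only handles plain `<a href>` links.

```python
from html.parser import HTMLParser
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in a string of (X)HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def is_stale(url, timeout=10):
    """Return True if fetching `url` fails or returns an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return False
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers
        # malformed or relative URLs.
        return True


page = '<p><a href="http://example.com/">x</a> <a name="top">y</a></p>'
print(extract_links(page))
```

A cron job built on this would call `is_stale` on each extracted link and mail out whatever comes back True.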

Secondly, I realised that although I wrote the GetAuWeather page and linked to it from my software page way back in July, I never actually put a link to the code in there. This has been fixed. GetAuWeather is a simple Python module for downloading Australian weather observations from the Bureau of Meteorology’s website and parsing them into sensible Pythonic data structures. At the moment it’s just a function for doing that. One day I will use this to make something useful, probably by writing a daemon to constantly throw new weather data into an SQLite database (taking advantage of the SQLite support new to Python 2.5) and then making a fancy web interface to this database.
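The logging half of that plan might look something like the sketch below, written for current Python 3 rather than 2.5. The schema, the function name and the sample reading are assumptions. The one design choice worth noting is the UNIQUE constraint, which lets the same observation be scraped twice without being stored twice.

```python
import sqlite3


def log_observations(conn, observations):
    """Insert (station, observed_at, temperature) rows, skipping duplicates.

    Returns the number of rows actually added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation ("
        "station TEXT, observed_at TEXT, temperature REAL, "
        "UNIQUE (station, observed_at))"
    )
    count_sql = "SELECT COUNT(*) FROM observation"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO observation VALUES (?, ?, ?)", observations
    )
    conn.commit()
    return conn.execute(count_sql).fetchone()[0] - before


# Made-up reading, logged twice to show that the repeat is ignored.
conn = sqlite3.connect(":memory:")
batch = [("Adelaide", "2008-11-01T10:00", 21.0)]
first = log_observations(conn, batch)
second = log_observations(conn, batch)
print(first, second)
```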