Archive for May 2008

Australian Weather Hacking Project

Ever since I wrote my getauweather program in early 2006 I have been meaning to put it to some sort of good use and actually do something with it. Just a little overdue, I’ve spent the last week working to this end and today am ready to announce my Australian weather data hacking project. At this page I hope to make available progressively more complicated and interesting applications of the huge amount of data that the Bureau of Meteorology make available. Perhaps more importantly, I am going to make every effort to make that data available to the community at large so that people who aren’t me can do cool stuff with it as well.

At the moment, there is not an awful lot there. Every hour, a cron job runs a little Python script which uses getauweather (not the version you can currently download from my software page, a newer, better version that I’ll release in a few days when I am confident that it is working) to grab the latest data from all the weather stations and then:

  1. Reformats the data into CSV, XML and YAML, which anybody can then use in their own applications, for whatever purpose. The .csv file comes in at just under 100 kb and the two markup language files are in the area of 250 kb.
  2. Updates this Google map of weather stations. This little visualisation really is my first use of either Javascript or the Google Maps API, so please forgive the fact that it really does suck. The page is a whopping 160 kb in size and, on most computers I’ve tried it on, Firefox will complain about how long it runs – just tell it to keep going and you’ll see results soon enough. I plan to use this map as a means to develop my Javascript skills, so hopefully it won’t suck for too much longer and will instead be powered by lightweight AJAX goodness. Of course, if you are already a Javascript wizard feel free to upstage me by making the coolest thing you can using those three freely available files above.
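For the curious, the reformatting step in item 1 can be sketched roughly as follows. This is only an illustration in modern Python using the standard library, not the actual script: the station records and field names are made up, and the YAML output is left out because the standard library has no YAML writer (a third-party package would be needed for that).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical observation records. The field names are invented for this
# sketch and are not the columns of the real files.
OBSERVATIONS = [
    {"station": "Adelaide", "lat": -34.92, "lon": 138.62, "temp_c": 17.3},
    {"station": "Hobart", "lat": -42.89, "lon": 147.33, "temp_c": 11.8},
]

FIELDS = ["station", "lat", "lon", "temp_c"]


def to_csv(observations):
    """Return the observations as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(observations)
    return buffer.getvalue()


def to_xml(observations):
    """Return the observations as an XML string, one element per station."""
    root = ET.Element("observations")
    for obs in observations:
        # The station name becomes an attribute, the readings child elements.
        node = ET.SubElement(root, "observation", station=obs["station"])
        for field in FIELDS[1:]:
            ET.SubElement(node, field).text = str(obs[field])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_csv(OBSERVATIONS))
    print(to_xml(OBSERVATIONS))
```

The XML comes out noticeably bulkier than the CSV for the same records, since every value is wrapped in its own pair of tags.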

The next logical step for this project is to start logging these hourly weather results into a database which can be queried via an HTTP API. This will give me an excuse to finally try out CherryPy. Once such a database is up, AJAX magic will allow all sorts of awesome applications, and if people can download monthly dumps of the db then they can perform all sorts of fancy statistical analysis and the like. It should be good. I will note in passing that this undertaking is probably legally shaky for now.
Unlike the situation in the US, which has the eminently sensible rule that the federal government is not permitted to claim copyright on anything they produce, the Commonwealth of Australia feels quite entitled to copyright the BoM observation data. Apparently they are very easy going about granting licenses to repackage and redistribute the freely available stuff, so I’ll try to go down that route soon and, I suppose, pull the project if the BoM ends up objecting.
Personally, I am not entirely convinced that their claim of copyright can be legitimate. My understanding is that copyright law allows for protecting a particular expression of an idea, not the idea itself, which can only be protected by a patent. It’s not clear to me how this distinction applies to raw numbers, like weather station data, which can sensibly be expressed in only one way. At any rate, I don’t expect any trouble to show up on this front.
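Legalities aside, the database logging mentioned above could start out as something like the sketch below. It assumes SQLite via Python's built-in sqlite3 module, with a made-up table layout and hypothetical function names; nothing here is decided yet, and the HTTP layer (CherryPy or otherwise) is left out entirely.

```python
import sqlite3

# Hypothetical schema: one row per station per observation time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    station     TEXT NOT NULL,
    observed_at TEXT NOT NULL,
    temp_c      REAL,
    PRIMARY KEY (station, observed_at)
)
"""


def open_log(path=":memory:"):
    """Open the observation log, creating the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_observation(conn, station, observed_at, temp_c):
    """Store one reading; observed_at is an ISO 8601 timestamp string."""
    # INSERT OR REPLACE means re-running the hourly job for the same hour
    # overwrites the earlier row instead of failing on the primary key.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO observations VALUES (?, ?, ?)",
            (station, observed_at, temp_c),
        )


def history(conn, station):
    """Return (observed_at, temp_c) pairs for a station, oldest first."""
    rows = conn.execute(
        "SELECT observed_at, temp_c FROM observations"
        " WHERE station = ? ORDER BY observed_at",
        (station,),
    )
    return rows.fetchall()
```

A function like history() is the sort of thing an HTTP API would wrap, returning the rows as CSV or XML rather than Python tuples.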

On a closing note, returning to the subject of improving my Javascript skills: In a previous entry I posted a link to a level of Super Mario World implemented in 14 kb of Javascript. I learned the other day that the same guy has gone ahead and started working on implementing Super Mario Kart the same way, and has previously done Wolfenstein 3D! The blog author, one Jacob Seidelin, is in fact quite the Javascript hacker, in the most traditional sense of the word – coding for fun with no specific goal or direction, simply a desire to push boundaries and overcome limitations, which he is certainly doing. I’ll have to keep an eye on his work.

A little known awesome Python module

Via my constant companion, the programming reddit, the other day I came across a blog post by Doug Hellmann about a Python module named cmd, which contains one public class you can subclass to really easily make awesome command-driven programs in Python.

Reading about cmd, I thought it looked really awesome, couldn’t wait to use it, and was incredibly jealous that this Hellmann guy had thought of writing such a useful module before I had. It would have been a fairly easy but really useful and respectable bit of free software to have produced.

Then, reading the first comment on the post, I realised that I had misunderstood entirely what this was. cmd wasn’t a third party module that Hellmann had written and released. It was a part of the Python standard library that Hellmann was simply describing. This thing, this awesome thing, has been in Python since version 1.4, which was released last century, and before I had done any programming in anything other than Commodore BASIC 2.0, before I had any idea what Unix was. As far as I personally am concerned it may as well have been the first thing Guido ever wrote.

In all my years of Python usage, I’ve never heard of cmd or seen it used! Heck, I own a dead-tree copy of the O’Reilly book Python Standard Library and flip through it occasionally while waiting for the kettle to boil, or whatever, and I’ve never seen it in there. I see now that of course it is in there, but I never came across it in my random walks through that book, nor in any of the other Python books or blogs I have read. I am really astonished. This is something that should be much better publicised.
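For anyone else who has somehow missed it, here is a minimal sketch of what subclassing cmd.Cmd looks like in modern Python. The shell and its two commands are invented purely for illustration. Any method named do_something becomes a command called "something", and its docstring doubles as the help text.

```python
import cmd


class WeatherShell(cmd.Cmd):
    """A toy command interpreter; the commands are made up for this sketch."""

    intro = "Type help or ? to list commands."
    prompt = "(weather) "

    def do_greet(self, arg):
        """greet [NAME] -- print a greeting."""
        # arg is everything typed after the command name, as one string.
        print("Hello, %s!" % (arg or "world"), file=self.stdout)

    def do_quit(self, arg):
        """quit -- leave the shell."""
        # Returning a true value tells cmdloop() to stop.
        return True


if __name__ == "__main__":
    WeatherShell().cmdloop()
```

Running the script drops you at a "(weather)" prompt where "help", "greet Guido" and "quit" all work, with no parsing code written by hand.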

On an unrelated note, the anti-spam questions I discussed setting up in my last entry have completely eliminated the Russian spam problem! Rock on.

Spam, spam, spam, spam…

As you might have noticed, as of a couple of days ago this blog started getting hit pretty heavily by comment spam, composed mostly of links to Russian pornography sites. As of this afternoon, I think I have deleted all of the offending comments. There is a small possibility that I nuked a legitimate comment or two in doing so, but given the currently low frequency of real comments I’m getting, I doubt it. Still, if you left a comment in the last three days you may want to check that it’s still there. If it’s not, email me and I should be able to resurrect it from the notification email.

In an attempt to stop this from happening again, I’ve installed Menno Smits’ “spamquestion” plugin, which relies on Steven Armstrong’s “session” plugin. You now have to give a simple, one-word answer to a question like “What is the opposite of hot?” to leave a comment here. The question is randomly selected from a set of about 10. This sort of spam protection isn’t as strong as captchas, because it’s a fairly trivial matter for a spambot to collect all 10 of the questions, have the answers provided by a person, and then spam as usual. However, it’s perfectly adequate to protect against spam which isn’t being individually targeted against your one site (the spam I was getting came from a range of IP addresses, so I’m going to assume it was the work of a botnet) and has the advantage of working in text-based browsers and not disadvantaging visually impaired people. Let’s hope it works here.
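The idea is simple enough to sketch in a few lines of Python. To be clear, this is not the plugin's actual code, which I haven't reproduced here: the question set and function names are hypothetical, and a real implementation would also need to remember (in a session, presumably) which question each visitor was asked.

```python
import random

# Hypothetical question set; expected answers are stored in lower case.
QUESTIONS = {
    "What is the opposite of hot?": "cold",
    "What colour is the sky on a clear day?": "blue",
    "How many legs does a spider have?": "eight",
}


def pick_question(rng=random):
    """Choose one question at random to show on the comment form."""
    return rng.choice(sorted(QUESTIONS))


def is_correct(question, answer):
    """Check an answer, ignoring case and surrounding whitespace."""
    expected = QUESTIONS.get(question)
    return expected is not None and answer.strip().lower() == expected
```

Being forgiving about case and stray spaces matters here, since a human typing "Cold " should not be treated like a bot.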

Based on my preliminary fiddlings with these plugins, it looks like there is little in the way of graceful handling of incorrect answers to the spam question – the form just gets reloaded with none of your input preserved and no explanatory message. This is obviously unacceptable and I might get around to fixing it myself sometime soon. For now, just be careful!