Archive for the ‘Uncategorized’ Category.

Experimenting with flickr and GoodReads

As a part of migrating to an external host and also experimenting with an approach of choosing and using software and services for the sake of convenience rather than idealism, etc., I’ve been experimenting with using a few 3rd party social media applications, and thought I’d describe my impressions of them.

First up is photo management.  This is something that was done extremely poorly on my old self-hosted site, I had to admit.  I originally had a series of scripts for producing static galleries, pulling in metadata (like descriptions and tags, etc.) from flat text files, but it didn’t work particularly well because I simply didn’t have the energy to do things like make the gallery look nice (not that I didn’t care, but CSS, despite being a fantastic idea in principle, is such a poorly implemented and counterintuitive hack-fest that making a nice image gallery without using tables is far more difficult than it should be) or provide the metadata.  After that I experimented briefly with using a third party PHP-based application which didn’t require the use of a database, but it was a fairly half-hearted effort and I never got a significant number of pictures into it.

The two big “cloud based” photo management apps are, of course, Flickr and Google’s Picasa.  Flickr has by far the largest userbase, and looking at a few “Flickr vs Picasa” articles turned up by a quick Google suggested that Flickr is ahead in a lot of important ways, so I decided to give it a try.  When I first tried to sign up and realised I would be forced to set up a Yahoo account, complete with @yahoo.com email address, I quickly lost my stomach for the idea: I have no interest in using any other Yahoo services and I don’t want yet another unused and unwanted email address cluttering up my online presence.  After further reflection, I decided this was a little hypocritical, since I already use a number of Google services which require a Google account: although, to be fair, Google makes this process a lot easier and much more lightweight.  Anyway, I bit the bullet and got myself an account.

My experience with Flickr so far has been entirely adequate, but it’s not exactly something I’ve fallen in love with.  The interface could use a lot of work.  I don’t find it particularly straightforward: or rather, the stuff that everybody wants to do all of the time is not done using more obvious or prominent controls than the stuff that few people want to do some of the time.  While the idea of a Photostream (a sequence of photos arranged in the order you upload them) is fine (indeed, it makes sense from the point of view of keeping up to date with the photos of friends),  I don’t think.  I’m also really irritated that the ability to define hierarchical sets (“sets” are coherent collections of photos in Flickr, things that might be called “albums” elsewhere) is restricted to those who hold a “Pro account”, which costs US$25 per year.  I have no objection to the concept of a paid account status (I realise Flickr have costs to cover), but it seems like the features that you need to pay to get should be limited to things which are in some sense “special” or “excessive”: increasing the number of photos you can upload per month, or the maximum size of photos, or even the number of sets you can define are sensible candidates.  Such a basic and fundamental organisational tool as the hierarchical arrangement of sets should really be a freebie: it’s something just about everybody is going to want.  Anyway, I’ll be sticking with Flickr for the foreseeable future.

The second thing I’ve been looking at is keeping a record of which books I read and when.  This is something I never even tried to implement back when I was self-hosting everything, although I’ve had an account at LibraryThing for some time (but rarely used it).  I really like the idea of this kind of thing, but I never kept up to date with it much, mostly because the interface at LibraryThing is so terrible.  For one thing, I found it genuinely hard to remember where to go to edit my metadata about the book (such as when I read it or what tags I’d like to give it), as opposed to editing site-wide metadata about the book, such as who wrote it.  Worse, LibraryThing treats every separate edition of a book as a distinct entity.  There is no way to say “I have read William Gibson’s Neuromancer”.  You are forced by the system to proclaim “I have read the edition of Neuromancer published in 1984 by Ace Books”.  By all means, the ability to express that information if one wants to should be present, but actually forcing it is ridiculous, especially since if your particular edition is not in the system you are forced to enter incorrect data by choosing another edition.  The refusal to combine all editions into a single Platonic concept of the book also really interferes with useful collaborative filtering: you can’t learn things about people who also read Neuromancer, only things about people who also read your edition of Neuromancer, which is not a distinction I can imagine many people are interested in.

Frustrated with the above shortcomings, I’ve checked out GoodReads, an application similar in spirit to LibraryThing but which seems to lack these shortcomings.  It has Platonic concepts of books, but also the ability to specify editions if one is so interested.  I’ve loaded a small, initial subset of everything I can remember reading (basically favourites and stuff I’ve read recently) into my account, and the interface seems clean and intuitive.  It remains to be seen how long I can keep up the habit of chronicling what I read, but it’s nice to have a tool for doing so which is straightforward enough that I can’t really blame it for my failures to do so.

Old posts imported

It turned out that getting my old Yomiko blog posts imported into my new Wordpress setup was not too difficult after all.  Wordpress allows you to import posts from a variety of other blogging platforms whose export formats it understands.  Obviously Yomiko, with an expected global userbase of 1 (and now 0), isn’t one of them, but handily enough, Wordpress speaks a format called “Wordpress Extended RSS”, which is designed to let you import posts from earlier versions of Wordpress.  By making an export of my one new blog entry in that format and looking at the file, it wasn’t too hard to write a script which would convert my old entries into the same format for easy import.  It took a few tries to get this right (frustratingly, there is no preview facility for imported entries, so if you import 70 of them and something goes wrong you’ve no choice but to delete all 70 of them and start again), but for now it seems like most of the important details – dates and times, tags and comments – have come through unscathed.  Some entries seem to have been wrapped to 80 characters wide using linefeeds for reasons I don’t understand, but that’s not a huge problem and one I can fix by hand without too much trouble.
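For anyone attempting something similar, here is a rough sketch of what such a conversion script might look like in Python. It is not the script described above. The entry dictionaries, the function name entries_to_wxr and the channel title are all made up for illustration, and the namespace URIs and element names are assumptions modelled on what a WXR export typically contains, so check them against an export from your own installation before relying on them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime

# Namespace URIs are assumptions: copy the exact values from the top of
# an export file produced by your own Wordpress install.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.0/",
}


def entries_to_wxr(entries):
    """Build a WXR-style RSS document from a list of entry dicts.

    Each entry is a hypothetical dict with "title", "date" (a datetime),
    "body" (HTML text) and an optional "tags" list.
    """
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Imported posts"

    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "pubDate").text = format_datetime(entry["date"])
        for tag in entry.get("tags", []):
            ET.SubElement(item, "category", {"domain": "tag"}).text = tag
        # Namespaced elements carry the post body and Wordpress metadata.
        ET.SubElement(item, "{%s}encoded" % NS["content"]).text = entry["body"]
        ET.SubElement(item, "{%s}post_date" % NS["wp"]).text = (
            entry["date"].strftime("%Y-%m-%d %H:%M:%S")
        )
        ET.SubElement(item, "{%s}post_type" % NS["wp"]).text = "post"
        ET.SubElement(item, "{%s}status" % NS["wp"]).text = "publish"

    return ET.tostring(rss, encoding="unicode")
```

Since there is no preview facility, it is probably worth importing a file with one or two entries first and checking how dates, tags and line breaks come through before importing the lot.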

Making a few changes…

So, anybody who has been visiting this site for the last month or so is likely to have been a little confused with regards to just what is going on.   I’ve made a number of changes to my online infrastructure, and it’s taken me a little while to stabilise on a setup that I think I’m comfortable maintaining.  But I’ve done so now, so hopefully from this point on you’ll see a steady import of the old content to this new site, although it may take a while.

The starting point for all of this was my having to move house a little more than a fortnight ago.  I decided that, after several years of it, I was starting to get sick and tired of the hassle involved in self-hosting (especially when moving time comes), and in fact I was starting to get sick and tired of doing everything computer-related “the hard way” for the sake of things like control, ideology and technological machismo (more on this in later posts).  So before the move, I got a friend who owns a hosting company to set me up with some webspace and began the process of migrating everything across.

I only have FTP access, not shell access, to my new host, which necessitated a lot of significant changes to the way everything worked.  Previously my website was powered by a self-written collection of hackish Python scripts to translate text files written in Markdown format into HTML files using Cheetah templates, and the blog was powered by my own rudimentary blogging engine, Yomiko, a CherryPy application.  Not only was getting all of this set up and operating on a new host without shell access impossible, I have to admit that the inconvenience and fragility of the system probably prevented me from writing anywhere near as much as I would have if it were simpler.  So I ended up leaping from one extreme to the other and using, like everybody else, Wordpress for my new site.  I’ve avoided using Wordpress in the past primarily out of a distaste for PHP and because the fact that it strictly requires the use of MySQL strikes me as extremely poor design (why not write everything in terms of a database abstraction layer so that people can choose to use PostgreSQL or SQLite if they prefer them?), but I have to admit that if you can get past all of that it’s a very quick and easy way to throw up a website.  I don’t think it will extend to handling the “Writing” part of my website, at least not very conveniently, so I may look at using a Wiki application for that, even though it will break visual consistency.

The biggest question mark still hovering over this whole setup is how I will go about porting my old blog posts into Wordpress.  I would naturally very much like to avoid having to do this by manually copying, pasting and backdating everything, but without the ability to run scripts on the new host I’m not entirely sure how to go about this.

That’s all for now.  Hopefully all this decadent point-and-click WYSIWYG AJAX goodness will encourage me to start writing more frequently again.

Algorithm Kid

Fairly recently I started exchanging blog comments and Identi.ca notices with an English geek, Steve Clark, who’s interested in the social semantic web, Python, cryptography, etc. Steve recently wrote a blog entry entitled “Programming languages I have known”, which I found interesting, mostly because Steve started programming in an era of computing that I was born just late enough to miss out on, but have always felt a sort of strange, artificial nostalgia for, like I should have been a part of it. Acoustic couplers, and all that.

I liked the idea of a blog entry that was autobiographical but still mainly technical in nature, and thought about writing my own version of “Programming languages I have known”, but I realised that this would, in fact, be pretty boring. I don’t know that many programming languages, the ones I do know are entirely mainstream and historically-uninteresting, and I taught myself comparatively late (I think I was 15 when I bought “Sam’s Teach Yourself C in 24 Hours”, which I just now found free online by Googling it). Then I thought about just writing something not on languages but on the various computers I’d used throughout my life, but that would be pretty standard and boring, too: My primary school had two BBC Micros, my household, two relatives and lots of my neighbourhood friends had Commodore 64s (one of these machines had a magnetic tape drive, which took cassettes of the same size as the old music cassettes, everybody else had 5.25 inch floppy drives), and then came the rise of the IBM PC.

Instead, I decided to write about two incidents I can remember from my primary school years in which I displayed a way of thinking which was in some sense algorithmic in nature – thinking like a computer programmer years before I wrote a single line of code, or at least wrote one that I actually understood.

The first of these concerns a game that one of my primary school classes would occasionally play. One student chooses a number between 1 and 100 and keeps it a secret. The rest of the class take turns attempting to guess the number, to which the first student must respond with either “higher” or “lower”. The student who eventually guesses the correct number then gets to choose the next one, and the game repeats. In retrospect, it’s amazing the kind of mindless and repetitive stuff that young kids will find fun. Anyway, I remember quite clearly the time we were playing this game when I realised that there was one clearly optimal strategy. I think I was around 11 or 12 years old at the time. The strategy I had developed was this: the first student to take a guess should guess 50. Depending on whether the response was “higher” or “lower”, the next student should guess 25 or 75, and so on, with each student guessing the number in the middle of the not-yet-disqualified range of possible answers. If you have no prior knowledge about what the secret number is likely to be, this strategy is the best you can do, in the sense that it minimises the average number of guesses required to find the answer. A very similar method can be used to find the zeroes of a scalar function, though of course I had no idea about anything like that at the time (note that the method is not at all optimal for the zero finding problem). I remember that I figured this out by visualising the number line from 1 to 100 and realising that each guess ruled out either all the numbers to the left or the right of it, and observing that by guessing in the middle of the remaining range you were guaranteed to rule out half of it, whereas by guessing anywhere else there was always a chance of ruling out less than half of it. I became quite excited when I realised this and explained it to the rest of the class, assuming that they would concur and cooperate with the strategy. 
Every future instance of the game would be over in at most 7 guesses (the logarithm of 100 to base 2 is 6.64, which rounds up to 7)! However, nobody else seemed to believe me and in fact the class as a whole was quite hostile to the idea of any kind of systematic approach to the game. Looking back, I guess the algorithmic approach probably ruined whatever element of “fun” we must have seen in the game, but at the time I was baffled by this resistance.
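The midpoint strategy is easy to sketch in Python. The function name guesses_needed is just something picked for this illustration. It counts how many guesses the strategy takes for a given secret number, and the last line checks every possible secret from 1 to 100.

```python
def guesses_needed(secret, low=1, high=100):
    """Count the guesses the midpoint strategy needs to find `secret`."""
    count = 0
    while True:
        # Always guess the middle of the range that is still possible.
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess < secret:
            # The answer would be "higher": discard the guess and below.
            low = guess + 1
        else:
            # The answer would be "lower": discard the guess and above.
            high = guess - 1


# Worst case over every secret number the first student could pick.
worst_case = max(guesses_needed(n) for n in range(1, 101))
```

The first guess is 50 and the second is 25 or 75, exactly as in the classroom version, and worst_case comes out as 7.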

The second incident happened earlier, I must have been between 8 and 10 if I correctly remember the way year levels were distributed amongst classrooms at my primary school. However, it’s also less impressive of an insight. Around this time my classroom went through something of a football craze (note that Australian football is not the same as European football, which we call “soccer”, nor American football, which we call “rugby”. Actually, I’m not sure rugby and American football are the same, but who cares?). I never really got into this (I’ve never really got into following any kind of sports), but I remember being baffled by how little any of the other kids thought about the outcomes of football matches (I’m talking about matches in the national league, here, not lunch-time games) as being predictable from data. During this craze, a lot of kids collected football cards, and each card would have on the back all sorts of statistics about the corresponding player, things like the average number of points scored or marks taken per match. Kids would have huge folders full of these cards, each one slotted into a position in a plastic sleeve. They were veritable hand-held databases. And yet, given these databases, kids were making their decisions about which team to cheer for based on things like team loyalty or who was on a “winning streak” or simple gut instinct. I remember it being obvious to me at the time that the sensible thing to do was to assign some number of points to each team on the basis of the available statistics and to support whichever team had the most points. I never worked out an explicit scheme for doing this – presumably a team would get something like one point for each game it had won so far in the season, and one point for each average goal scored by each player who was participating in the match. I do remember realising it would be necessary to take off points for each participating player who had an injury. 
If I ever explained these ideas to anybody, I don’t remember it. No doubt they would have gone down as well as the method of bisection did for guessing secret numbers, anyway. No doubt my plans for a point assignment system were rather simplistic, perhaps too simplistic to have worked very well if I’d put them into practice (I never did, because at that age I didn’t have the computer skills to program it, and doing it by hand would have been tedious, especially since I didn’t really care about the football anyway), but the point stands that I was aware by this age of the principle that numerical data about past events could be used to forecast future events. I have no idea where I got this idea, whether it was just intuition or if I generalised from an actual observation.
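For what it’s worth, the scheme is only a few lines of Python. This is a sketch of the rough idea described above, not a scheme that was ever actually worked out: the size of the injury penalty is an assumption, and the function names and data layout are invented for illustration.

```python
def team_score(wins, players, injury_penalty=1.0):
    """Score a team from its season wins and its players' card statistics.

    `players` is a list of (average_goals_per_match, is_injured) tuples.
    The penalty of one point per injured player is an assumption.
    """
    score = float(wins)  # one point per game won so far this season
    for average_goals, is_injured in players:
        score += average_goals  # one point per average goal per player
        if is_injured:
            score -= injury_penalty
    return score


def pick_team(teams):
    """Return the name of the highest-scoring team.

    `teams` maps a team name to a (wins, players) tuple.
    """
    return max(teams, key=lambda name: team_score(*teams[name]))
```

Whether a scheme this crude would have beaten gut instinct is another question entirely.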

Although these are appropriate childhood stories for someone like me to be able to tell now, I don’t actually understand why it is that I can remember them as clearly as I can. They must surely have been fairly inconsequential at the time. Can I remember these things so well because they were part of the birthing process of my now excessively algorithmic way of thinking, or is my distribution of childhood memories unbiased with regard to these sorts of things, and two of those memories just happened by chance to be like this because I thought that way a lot? Memory’s a funny thing.

Virtualise me

Up until relatively recently, my computer arrangement had been as follows: I had two desktop computers, sharing a monitor, keyboard and mouse using a KVM switch. One of the machines was an old-ish IBM Thinkpad T42 laptop, which I put in a port-replicating docking station so I could easily take work out on the road with me. The Thinkpad ran Arch Linux and it’s what I used to do almost everything – email, the web, development, research, music and videos. I liked the Thinkpad for the convenience of its portability and also because it didn’t use a lot of power, and the various ACPI features like CPU throttling and suspending to RAM worked well under Linux, so that it drew practically no power when I wasn’t actually using it. The other computer is a big, clunky tower desktop running Windows XP which was off most of the time, only switched on when I really needed something I could only do on Windows. Mostly this was to print (I can’t get CUPS to play nice with our Canon MX850) or to play games. I don’t actually do much PC gaming these days, but I did play WoW for a few months recently until it became boring. Wizards of the Coast now have a free Dungeons and Dragons MMORPG game that I have my eye on, but I’m trying to put off trying it until I pass an upcoming deadline for a journal article submission.

Anyway, a few months ago I accidentally destroyed my Thinkpad’s motherboard in an incident that I guess I should write up as a cautionary tale about self-repair (probably more so against being impatient, but that’s another story). So I needed a new Linux desktop machine until such time as I could find an affordable replacement Thinkpad mobo. I have a collection of 3 HP “e-Vectra” machines, which I bought from eBay years ago when I was very interested in cluster computing. I really like these machines because they’re easy to find cheap, are very small (less than 30cm/1 foot in either of the long dimensions and not 10cm wide), they stack well (check out the photo in this forum post!) and don’t draw a lot of power (there’s no internal PSU, just a little brick transformer at the end of the cord, like a laptop). They’re not the most powerful machines in the world, with 700 MHz Pentium III’s and 128 MB of RAM, but I maintain that the perception of those sorts of stats as “uselessly low” by most people is largely an illusion generated by bloated mainstream software (this is a problem in both the Windows world and the FOSS world) and by a hardware industry driven almost purely by the game industry. I upgraded the RAM in one of them to 256 MB and went about seeing how it held up as a modern desktop. It wasn’t terrible, and if it weren’t for today’s RAM-intensive web, it probably would have sufficed, but ultimately Firefox froze up just way too often so I had to look for another solution.

The big Windows XP machine actually has two identical hard drives in a RAID array, so I split them apart and installed Linux on the second one, keeping XP on the first. This machine is considerably more modern (2.8 GHz Athlon 64 processor, 2 GB RAM), so no performance issues at all, but having my Linux system and Windows system on the same machine was kind of a hassle. Anytime I wanted to play a game or print something, I’d need to carefully close down everything I had running in Linux and reboot, only to have to carefully open all that stuff back up again when I was finished. Furthermore, I couldn’t manage to get suspend to RAM working on that machine under Linux at all, so the machine ended up being left on 24/7. Because this machine was originally built for gaming by my brother-in-law, it’s horribly power hungry (although probably not half as bad as the machine he built to replace it, leading to my inheriting this one), so I really didn’t like this.

For the last few days I’ve been experimenting with a novel solution to this situation, and I think that I’ll stick with it because it’s worked out better than I ever expected. I’ve installed Sun’s VirtualBox system on the Windows XP install of the machine and installed Linux inside of a virtual machine. This is pretty much the first time I’ve ever experimented with any kind of virtualisation technology seriously, and I have to say I’m incredibly impressed. I never imagined that the performance of a virtual machine would be good enough to actually use it as a desktop, but I’m writing this right now from inside the virtual Linux install, with mail client and browser running, and music playing smoothly. The performance is just fine. Yes, it’s perceptibly slower than running Linux natively, but only just, and it’s 100% endurable. With the virtual machine running in full-screen mode, there’s pretty much nothing at all to give away the fact that Windows is talking to the hardware underneath it; all I can see is my ion3 X11 desktop. However, when the need arises, I can just minimise the machine and find myself at an XP desktop, ready to play games, to print something out, or to suspend the machine to RAM (which of course works perfectly under Windows). It really is like having immediate access to the very best of both worlds, the mainstream software and superior hardware support of Windows and the, well, everything else of Linux. As an added bonus, I can backup my entire Linux system as a single file, and even migrate a copy of it to any other Windows machine running VirtualBox! Seriously cool stuff.

The only real drawback I’ve found is that, because the virtual machine’s network interface is implemented using NAT behind the host machine’s interface, a few networky things don’t work out of the box. The only thing that hasn’t worked so far is NFS, as described in this forum post. This was easily solved by using sshfs, which just worked. Also I’ll have to set up port forwarding if I want to be able to ssh into the virtual box from anywhere else, but that’s no big thing.

I’m going to wait a few weeks to make sure there really aren’t any hidden problems with this approach, but I’m thinking I’ll probably stick to this arrangement and “re-RAID-ify” the two drives with the XP/virtual Linux image. Viva la virtualisation!

Chrooting Python in OpenBSD 4.5

Just a quick technical entry today, on the off chance that this helps somebody, somewhere.

For reasons I’ll go into another time, I’ve recently been trying to chroot a Python installation into the directory /srv/www on an OpenBSD 4.5 machine. I took the standard approach to this, which is to examine the output of ldd /usr/local/bin/python2.5:

/usr/local/bin/python2.5:
Start End Type Open Ref GrpRef Name
1c000000 3c004000 exe 1 0 0 /usr/local/bin/python2.5
05432000 2547c000 rlib 0 1 0 /usr/local/lib/libpython2.5.so.1.0
05dc3000 25dc7000 rlib 0 1 0 /usr/lib/libutil.so.11.0
0df1e000 2df44000 rlib 0 1 0 /usr/lib/libstdc++.so.47.0
0d543000 2d54d000 rlib 0 1 0 /usr/lib/libm.so.5.0
02a7c000 22a85000 rlib 0 1 0 /usr/lib/libpthread.so.11.1
06a56000 26a8f000 rlib 0 1 0 /usr/lib/libc.so.50.1
08db8000 08db8000 rtld 0 1 0 /usr/libexec/ld.so

and copy each of these files over to the corresponding location in the chroot directory (i.e. stuff from /usr/lib/ goes in /srv/www/usr/lib). But after doing this, testing the setup with chroot /srv/www python2.5 yielded the error: /usr/bin/local/python2.5: can't load library 'libpython2.5.so.1.0'.

I went near mad trying to figure out why this wasn’t working. The libpython file was clearly in the correct location. A PHP installation that I had chrooted into the very same place using the very same procedure worked flawlessly. There seemed to be no scope for rational explanation of why this wasn’t working.

In the end, I managed to solve this problem by copying the file /var/run/ld.so.hints into the corresponding location in the chroot directory. Everything worked perfectly after that. I don’t profess to have any idea why you need to do this (I discovered that it worked after much trial and error, based on random permutations of stuff found on the web), but you do. Hopefully this post saves some other poor hacker from wasting a few frustrated hours.
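The whole procedure can be automated with a short script. What follows is a minimal sketch rather than a tested tool: the function names library_paths and copy_into_chroot are invented for this example, and the parsing assumes ldd output shaped like the listing above, with the file path in the last column.

```python
import os
import shutil


def library_paths(ldd_output):
    """Return the absolute paths found in the last column of ldd output."""
    paths = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # Skip blank lines, the "binary:" title line and the column header.
        if not fields or fields[-1].endswith(":"):
            continue
        if fields[-1].startswith("/"):
            paths.append(fields[-1])
    return paths


def copy_into_chroot(paths, chroot_dir):
    """Copy each file to the same location underneath chroot_dir."""
    for path in paths:
        destination = os.path.join(chroot_dir, path.lstrip("/"))
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.copy2(path, destination)
```

With the text printed by ldd stored in a string called output, something like copy_into_chroot(library_paths(output) + ["/var/run/ld.so.hints"], "/srv/www") would copy the binary, its libraries and the hints file in one go. This needs to be run as a user with write access to the chroot directory.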

I note that the ld.so.hints file exists in the same location on recent NetBSD releases, so I assume this advice ports over to NetBSD as well. However, there is no such file in my install of Arch Linux, so things are probably different in Linux land. Linux distros do seem to have a file /etc/ld.so.conf which serves a similar purpose – it was actually reading about this that gave me the idea to look for something similar in my situation.

Video games as a special case in copyright ethics

A brief thought on the inapplicability of a common argument against file sharing to the case of old video games:

The most common argument against file sharing is that artists lose money when people download rather than purchase music, films, etc. Now, for a variety of reasons this is actually not a very compelling argument to anybody who really thinks about it. It’s full of holes. Chances are very good that piracy actually makes artists more money – but that’s not the subject of this post. For the purposes of this post, all that really matters is that this bad argument at least has some sort of superficial, surface plausibility. It’s not completely outlandish. When it comes to most music and films – and certainly to anything remotely popular – the option exists of going into a physical store and buying a physical plastic disc for money. Through the long chain of middlemen, at least some of that money makes its way into the pocket of the people who worked hard to produce that piece of work. When you don’t buy the work this way, but download it from a file sharing network, then this chain of events doesn’t happen, and so in some loose sense, the artists “lose” money. You can see why people might be sympathetic to this point of view the first time they think about it.

Even if this argument was absolutely bulletproof, it’s interesting to note that it absolutely does not apply for video games beyond a certain age, because the video game market works very differently to the music and film market, in that products age extremely rapidly. If I felt like buying some music or films which were produced in the 70s or 80s, I could easily do that. There’s still a big market for those products, and so people keep producing them to keep up with demand, and they get released on new forms of media when they come out. However, if I wanted to play a video game produced even in the 90s, I’m probably out of luck. It’s just not possible to buy a game that old off the shelf. No store on Earth stocks them.

I’d really like to play Final Fantasy VII some day, since it is widely regarded as a classic of its genre, and not having played it makes me feel like I’m missing out on an important part of being a geek, kind of like a fantasy fiction fan who’s never read Lord of the Rings. I have two options for acquiring FFVII. I could buy a second hand copy from someone, which is the perfectly legitimate, law-abiding citizen option, or I could download it from a file sharing network, which would be illegal, and make me a “pirate”.

The thing is, there’s no solid ethical argument for why anyone should buy a second hand copy. If you do so, none of the money you spend (and this might be a lot of money – because the game is so famous and people are reluctant to part with it, copies can sometimes go for over $100 on eBay, more than the cost of a brand new modern game) goes to the people who helped to make that game, to compensate them for their time and creativity. Every last cent of it goes to the person you’re buying it from, to compensate them for giving up a physical possession. By downloading the game illegally, no money changes hands at all, but then nobody is giving up any physical possessions either. There’s absolutely no difference between the two options from the point of view of the artists. They certainly don’t lose money when you download it. There’s simply no option today by which you can acquire the game and compensate the creators. The nearest you could come is to pirate the game and then send a cheque for a tiny amount to every writer, programmer, animator, composer, etc. who worked on the game, which is obviously not feasible for a bunch of reasons. Unless you’re a devoted collector who actually wants the physical product to put on a shelf, the most sensible thing to do is to just download it and not feel one iota of guilt.

This is a good example of where copyright law does far less to protect artists than it does to hurt consumers, by making the difficulty of legally acquiring a game directly proportional to how popular it is. Imagine how much less popular the Beatles would be amongst young people today if the only legal way to get one of their albums was to buy it second hand for $100!

Python vs C for performance

Via Reddit today I came across a fairly decent article on Python optimization tips and issues, which comes across fairly heavily in favour of the idea that by being careful and knowing what you’re doing, you can typically make Python implementations of numerical algorithms fast enough for practical purposes, saving you the massive headaches associated with development in C.

This is a fairly relevant topic for me. A huge part of my PhD research is writing and playing with computational models of language acquisition. These models usually use Bayesian inference as a model of an “ideal learner” (doing this is fairly trendy in modern computational cognitive science, and not without good reason). When it comes to doing numerical Bayesian inference, a class of techniques known as Markov Chain Monte Carlo set the standard, and MCMC computations are the bread and butter of my programming work. Without going into too much detail, MCMC computations are highly iterative – you basically just do the same few steps over and over as fast as you can until your program converges on your answer.
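To give a flavour of what “the same few steps over and over” means, here is a toy random-walk Metropolis sampler in pure Python. It is just about the simplest MCMC method there is, and it is not one of the language acquisition models mentioned above. The function names, the step size and the choice of a standard normal target are all assumptions made for the sake of a small example.

```python
import math
import random


def metropolis(log_density, start, steps, step_size=1.0, seed=0):
    """Draw `steps` samples from a 1-D distribution by random-walk Metropolis.

    `log_density` returns the log of the (unnormalised) target density.
    """
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        # Step 1: propose a small random move away from the current point.
        proposal = x + rng.gauss(0.0, step_size)
        # Step 2: compare the target density at the proposal and at x.
        log_ratio = log_density(proposal) - log_density(x)
        # Step 3: always accept uphill moves, sometimes accept downhill ones.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def standard_normal_log_density(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x
```

Calling metropolis(standard_normal_log_density, 0.0, 20000) should give samples whose mean is close to 0 and whose variance is close to 1. The inner loop is the part that has to run as fast as possible, which is exactly where an interpreted language starts to hurt.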

When I first started writing MCMC models for my research, I did them in Python. I knew that my C skills weren’t great, that I’d be able to program a lot faster in Python, and that there were enough resources on the net for high performance Python computing that I’d be able to get my programs fast enough.  But after investing a lot of time trying to get my first Python MCMC program running fast – using things like numpy’s arrays and psyco – I still wasn’t happy with how slow things were going. I knuckled down and rewrote my model in C, and it absolutely blew the pants off my Python version. Of course I expected it to be faster, but it was much faster than I ever expected it to be. From then on in, I’ve written all of my MCMC stuff in C.  Since then I’ve become aware of PyMC, a Python library geared specifically toward MCMC. So obviously someone out there is able to make Python work for MCMC at a decent speed. This makes me wonder if maybe I really was doing something extremely wrong the whole time I was using Python for numerical work. Maybe it’s time I really dedicate some time to learning how to make Python faster for this kind of thing, to save myself time and effort in the future.

Even if it turns out that suffering through C implementations for the past 18 months has been something of a waste of time, to be honest I wouldn’t really regret it. Relying on C for my research has taught me more about it than I’d ever have learned otherwise. I am now comfortable with using malloc, calloc and free, I’ve done some multithreading work with the pthread API and I’ve learned how to use gdb and gprof.  I feel like I actually deserve to be able to say that I “know C”, as opposed to before when I knew the syntax but couldn’t really work effectively in C. Of course, I am still a long way from being any sort of guru, but I feel like I’ve taken important steps. I’ll be much more confident the next time I have to read and understand some C code.  A lot of people probably feel that in this day and age of Python and Ruby and Lua (and in less “webby” parts of the world, Java and C#), competence in C is an obsolete skill. I think there’s a degree of truth in that, and certainly C shouldn’t be used for new projects without a compelling argument in favour of it (and a simple “it’s faster” is not a compelling argument!). But at the same time I think that mastery of C is still an important rite of passage for serious programmers, and it feels good to have taken a number of extra steps in that direction.

But if I can work in Python from now on, I’m not going to miss the segfaults.

On the difficulty of environmental micro optimisation

When it comes to things that a person can do to help reduce the negative environmental impact of their life, I think one of several possible sensible distinctions to draw is one between what I call “environmental macro optimisations” and “environmental micro optimisations”.

Macro optimisations are big, expensive, obvious changes that have an incontrovertibly positive impact on the environment. Powering your house using solar panels on the roof, installing rainwater tanks and driving an electric car are all good examples of macro optimisations. Micro optimisations are much smaller, cheaper, simpler changes that don’t have quite so much of an effect: taking a re-usable mug with you to the coffee shop instead of using a plastic disposable, for example.

It’s easy to think of micro optimisations as not being worth the effort, but I think they are. They may not individually have as much impact as macro optimisations, but it’s easier to get them adopted on a larger scale. Getting 50% of the population to practice one or two good micro optimisations might be more achievable than getting 10% to practice a macro optimisation, and have just as much good effect.  The really serious problem with micro-optimisations is that they’re actually quite hard to get right, and – worse – few people seem to realise it. Gut instincts and common knowledge can often lead you astray. In fact, if you didn’t bat an eyelid at my coffee cup example earlier, you’ve already fallen victim to this.

Most people seem to take it for granted that taking a re-usable ceramic or metal cup to the coffee shop is better for the environment than using a plastic or styrofoam disposable cup each time, but as this article discusses, the disposables end up being better. Reusable mugs require a lot more energy to manufacture than a simple injection molded disposable, and having to clean them many times during their life uses up a whole lot of water (which takes energy to heat) and potentially nasty dishwashing chemicals.

Most people seem to take it for granted that using re-usable cloth nappies is better for the environment than nasty disposable plastic nappies, but as this article discusses, this issue is tremendously controversial. Whether or not the extra energy and resources required to produce a never ending supply of disposable plastic nappies is cancelled out by the extra energy and resources required to constantly clean cloth nappies depends on the details of washing machine efficiency, which changes all the time. Which option is best for you depends largely on how modern your washing machine is.

Most people seem to take it for granted that riding a motorbike or scooter is better for the environment than driving a car, but as this article discusses, they’re actually probably worse. While motorbikes are much smaller and lighter than cars, and hence burn a lot less petrol to cover the same distance, their lower cost and looser regulations mean that what fuel they do burn they burn much less cleanly, making them worse overall.

The point, in case you haven’t got it yet, is that micro optimisation is hard. Doing what feels obviously right, or doing what everyone else is doing, often ends up doing more harm than good. A rational environmentalist can’t afford to be lazy when making micro-optimisations. Every decision needs to be researched.

The core of the matter seems to be that to get things right, one has to consider very many aspects of a micro optimisation choice. It’s insufficient to reason “this approach produces less landfill, so it’s better” or “this approach uses less petrol/electricity/water, so it’s better”. Very often you can only improve one aspect by making another one worse, so you need to consider them all. Also, you can’t just think in the now, you need to take a “cradle to grave” view of all your options. There are two major problems with this.

First of all, the average person has neither the understanding nor the access to data to accurately assess these various impacts. To do the job properly you have to know all sorts of details about every step in a lot of processes – how things are manufactured, how they are transported, how they are disposed of, how power is generated, water is collected, etc. Even the average holder of a Bachelor of Engineering isn’t going to know all the stuff needed to do these calculations off the top of their head.

Second of all, even if the average person could easily figure out exactly how much landfill, how much CO2, how much plastic, how much water, etc. a particular choice required, there is still the problem of how to prioritise these things. To phrase this in terms of mathematical optimisation (which is the correct way to think about it), we have the problem of not actually knowing exactly what our objective function is. Is it worth consuming x extra cubic metres of landfill per year in order to produce y fewer tons of CO2? Using z extra kilograms of plastic to save w megalitres of water? Which of these concerns is more important? It’s unlikely that these questions even have clear, static, objective answers. It depends on where you live, it depends on possibly unreliable estimates of how much of various resources exist in nature, and it depends on which of a number of possible environmental catastrophes you think is worse.
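The objective function problem can be made concrete with a toy weighted sum. In the sketch below every number is invented purely for illustration (none of it is real measurement data), and the option names, impact categories and weights are all hypothetical. The only point is that the “best” choice flips depending on how the impacts are weighted against each other.

```python
# Hypothetical yearly impacts of two options. Every figure here is made up
# for illustration and is NOT real environmental data.
IMPACTS = {
    "reusable": {"landfill_m3": 0.1, "co2_kg": 30.0, "water_l": 900.0},
    "disposable": {"landfill_m3": 0.8, "co2_kg": 20.0, "water_l": 100.0},
}


def total_cost(impact, weights):
    """Objective function: a weighted sum of the individual impacts."""
    return sum(weights[name] * amount for name, amount in impact.items())


def best_option(impacts, weights):
    """Return the name of the option with the lowest weighted cost."""
    return min(impacts, key=lambda option: total_cost(impacts[option], weights))


# Two equally hypothetical sets of priorities.
landfill_focused = {"landfill_m3": 100.0, "co2_kg": 1.0, "water_l": 0.01}
water_focused = {"landfill_m3": 1.0, "co2_kg": 1.0, "water_l": 1.0}

choice_if_landfill_matters = best_option(IMPACTS, landfill_focused)
choice_if_water_matters = best_option(IMPACTS, water_focused)
```

With these made-up figures the landfill-focused weights favour the reusable option and the water-focused weights favour the disposable one. Same data, different objective function, opposite answer.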

Is environmental optimisation a matter of “go hard or go home”? Perhaps not entirely, but it seems likely to me that until some people somewhere make a tremendous effort to actually do all the terribly complicated mathematics required to clear all these issues up for common choices, people who want to be rational environmentalists should probably refrain from getting too caught up in micro optimisations that shift the balance amongst things they consume, and focus instead on just consuming less. Instead of agonising over whether to drive a petrol car or a motorbike, ask yourself if you can get away with just doing less driving. Instead of worrying about what sort of container to buy a drink in when you’re out, try to just buy fewer drinks when out.

Finally, this entry is perhaps overly pessimistic. Environmental micro-optimisation isn’t all doom and gloom, and there are small things one can do which are obviously helpful. Hanging your laundry out to dry when the weather permits instead of using an electric drier is obviously a good thing to do. The important take-away message is that you need to think about your micro optimisations and whether they actually, clearly help, or whether they’re just trading one problem for another problem without actually having any clear benefit.

Generating activity stream feeds using foaflib and feedformatter

Yesterday I threw together a really quick and simple little script that combines the ActivityStream utility from foaflib with feedformatter to produce RSS1, RSS2 and Atom feeds containing events from the various webpages and accounts that are listed in my FOAF profile. These feeds should include all of my posts to this blog, all of my identi.ca notices and all of my del.icio.us bookmarks. The functionality is a little like that provided by FriendFeed, except it’s powered by FOAF files, which means that as long as you know where to get someone’s FOAF file you don’t have to manually add all their various presences yourself, which is especially handy when a friend creates a new account somewhere.

The code to generate the feeds is delightfully simple. It took about 5 minutes to get something working just fine. This was the first time I’d used either of these libraries in a long time (just because I wrote them doesn’t mean I remember the syntax a few months down the track!), but the examples on the respective project wikis made it really easy to figure out how to do this:


from foaflib.classes.person import Person
from foaflib.utils.activitystream import ActivityStream
from feedformatter import Feed

# Fetch the FOAF profile of the friend to watch
p = Person("http://www.luke.maurits.id.au/foaf.rdf")

# Create an ActivityStream object for that friend
stream = ActivityStream(p)

# Create the feed
feed = Feed()

# Set the feed/channel level properties
feed.feed["title"] = "Luke Maurits' Activity Stream Feed"
feed.feed["link"] = "http://www.luke.maurits.id.au"
feed.feed["author"] = "Luke Maurits"
feed.feed["description"] = "A feed of my activity at various places, built hourly using my FOAF profile."

# Iterate over the latest things they've done:
for event in stream.get_latest_events():
        # Create an item
        item = {}
        item["title"] = event.type + ": " + event.detail
        item["link"] = event.link
        item["pubDate"] = event.timestamp
        item["guid"] = event.link
        feed.items.append(item)

# Save the feed to a file in various formats
feed.format_rss1_file("/output/path/goes/here/activity_rss1.xml")
feed.format_rss2_file("/output/path/goes/here/activity_rss2.xml")
feed.format_atom_file("/output/path/goes/here/activity_atom.xml")

Just 21 lines of code, excluding comments and blanks! I’m quite pleased with how parsimonious both libraries have turned out to be in a model usage case. I run the above as an hourly cron job and that’s all there is to it.

Currently the ActivityStream object in foaflib only supports pulling in events from: Twitter feeds, Identi.ca feeds, del.icio.us bookmarks and blog entries if (i) the blog is identified using foaf:weblog and (ii) has an RSS or Atom feed specified in a <link rel="alternate" type="application/rss+xml" /> tag in the page pointed to. But this is only because those are the kinds of events that I wanted support for at the time.  Any social service that either provides data in RSS or Atom form, or which has a Python API for extracting data, is a perfectly viable candidate for inclusion in future releases. Photo sharing sites like Flickr would be an obvious addition.
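For the curious, the feed autodiscovery step in (ii) can be sketched with nothing but the standard library. This is not foaflib’s actual implementation: the class and function names below are hypothetical, and the sample page and its URLs are made up for the example.

```python
from html.parser import HTMLParser

# MIME types treated as feeds in this sketch.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect the href of every <link rel="alternate"> that points at a feed."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        feed_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and feed_type in FEED_TYPES and href:
            self.feed_urls.append(href)


def find_feed_urls(html_text):
    """Return the feed URLs advertised in the given HTML, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feed_urls


# A made-up page standing in for whatever foaf:weblog points to.
sample_page = """
<html><head>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="alternate" type="application/rss+xml" href="http://example.org/feed.xml" />
</head><body></body></html>
"""
feeds = find_feed_urls(sample_page)
```

Once a feed URL has been found this way, the entries in it can be handed to any ordinary RSS or Atom parser and turned into events.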

It’s very easy to see how projects like these could easily replace Facebook, MySpace, etc., making the social web a much more decentralised and accessible place, where nobody has to have an account at any special place to see what other people are doing. A lot would need to happen to make this model user-friendly enough for the average Facebook user, and I’m not even going to pretend I have what it takes to do work in that direction, but with some luck somebody will, one day.

The biggest missing piece of the puzzle in my little application here is any kind of privacy/security. Places like Facebook let you designate people as friends (and divide your friends into groups), and set who can see what aspects of your activity based on those credentials. Some equivalent process to this would absolutely have to be present in any FOAF-based Facebook-killer.  Fortunately, the absolutely ingenious idea that is FOAF+SSL makes this possible. There’s no support for it in foaflib yet, which is the library’s greatest current shortcoming. Hopefully I’ll have the time to implement it one day. Of course, getting that technology made easy enough for the average netizen is an even bigger hurdle again. This combined with the fact that the average person probably doesn’t understand the appeal of decentralisation makes me wonder if we ever will see a trend in the direction of this sort of thing.

Here’s hoping.