Archive for September 2009

Chrooting Python in OpenBSD 4.5

Just a quick technical entry today, on the off chance that this helps somebody, somewhere.

For reasons I’ll go into another time, I’ve recently been trying to chroot a Python installation into the directory /srv/www on an OpenBSD 4.5 machine. I took the standard approach to this, which is to examine the output of ldd /usr/local/bin/python2.5:

/usr/local/bin/python2.5:
Start End Type Open Ref GrpRef Name
1c000000 3c004000 exe 1 0 0 /usr/local/bin/python2.5
05432000 2547c000 rlib 0 1 0 /usr/local/lib/libpython2.5.so.1.0
05dc3000 25dc7000 rlib 0 1 0 /usr/lib/libutil.so.11.0
0df1e000 2df44000 rlib 0 1 0 /usr/lib/libstdc++.so.47.0
0d543000 2d54d000 rlib 0 1 0 /usr/lib/libm.so.5.0
02a7c000 22a85000 rlib 0 1 0 /usr/lib/libpthread.so.11.1
06a56000 26a8f000 rlib 0 1 0 /usr/lib/libc.so.50.1
08db8000 08db8000 rtld 0 1 0 /usr/libexec/ld.so

and copy each of these files over to the corresponding location in the chroot directory (i.e. stuff from /usr/lib/ goes in /srv/www/usr/lib). But after doing this, testing the setup with chroot /srv/www python2.5 yielded the error: /usr/local/bin/python2.5: can't load library 'libpython2.5.so.1.0'.

I went near mad trying to figure out why this wasn’t working. The libpython file was clearly in the correct location. A PHP installation that I had chrooted into the very same place using the very same procedure worked flawlessly. There seemed to be no scope for rational explanation of why this wasn’t working.

In the end, I managed to solve this problem by copying the file /var/run/ld.so.hints into the corresponding location in the chroot directory. Everything worked perfectly after that. I don’t profess to have any idea why you need to do this (I discovered that it worked after much trial and error, based on random permutations of stuff found on the web), but you do. Hopefully this post saves some other poor hacker from wasting a few frustrated hours.
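
For anyone who would rather script the copying than do it by hand, here is a rough sketch of the whole procedure in Python. It assumes the seven-column OpenBSD ldd output format shown above, with the file path in the last column. The function names (parse_ldd_paths, chroot_destination, populate_chroot) are made up for illustration, and the code is written for a current Python 3 rather than the 2.5 being chrooted.

```python
import os
import shutil

HINTS_FILE = "/var/run/ld.so.hints"  # the extra file that fixed things in this post


def parse_ldd_paths(ldd_output):
    """Return the file paths listed in OpenBSD-style ldd output.

    Assumption: each useful row has seven whitespace-separated columns
    and the path is the last one. The first line (the binary name
    followed by a colon) and the column header row are skipped.
    """
    paths = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[-1].startswith("/"):
            paths.append(fields[-1])
    return paths


def chroot_destination(path, chroot_dir):
    """Map an absolute path such as /usr/lib/foo to <chroot_dir>/usr/lib/foo."""
    return os.path.join(chroot_dir, path.lstrip("/"))


def populate_chroot(ldd_output, chroot_dir, extra_files=(HINTS_FILE,)):
    """Copy every file named by ldd, plus the hints file, into the chroot."""
    for path in parse_ldd_paths(ldd_output) + list(extra_files):
        dest = chroot_destination(path, chroot_dir)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)
```

Calling populate_chroot with the captured ldd text and "/srv/www" (as root) should reproduce the manual steps described here, including the ld.so.hints copy, though I have only sketched it rather than battle-tested it.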

I note that the ld.so.hints file exists in the same location on recent NetBSD releases, so I assume this advice ports over to NetBSD as well. However, there is no such file in my install of Arch Linux, so things are probably different in Linux land. Linux distros do seem to have a file /etc/ld.so.conf which serves a similar purpose – it was actually reading about this that gave me the idea to look for something similar in my situation.

Video games as a special case in copyright ethics

A brief thought on the inapplicability of a common argument against file sharing to the case of old video games:

The most common argument against file sharing is that artists lose money when people download rather than purchase music, films, etc. Now, for a variety of reasons this is actually not a very compelling argument to anybody who really thinks about it. It’s full of holes. Chances are very good that piracy actually makes artists more money – but that’s not the subject of this post. For the purposes of this post, all that really matters is that this bad argument at least has some sort of superficial, surface plausibility. It’s not completely outlandish. When it comes to most music and films – and certainly to anything remotely popular – the option exists of going into a physical store and buying a physical plastic disc for money. Through the long chain of middlemen, at least some of that money makes its way into the pocket of the people who worked hard to produce that piece of work. When you don’t buy the work this way, but download it from a file sharing network, then this chain of events doesn’t happen, and so in some loose sense, the artists “lose” money. You can see why people might be sympathetic to this point of view the first time they think about it.

Even if this argument were absolutely bulletproof, it’s interesting to note that it absolutely does not apply to video games beyond a certain age, because the video game market works very differently to the music and film market, in that products age extremely rapidly. If I felt like buying some music or films which were produced in the 70s or 80s, I could easily do that. There’s still a big market for those products, and so people keep producing them to keep up with demand, and they get released on new forms of media when they come out. However, if I wanted to play a video game produced even in the 90s, I’m probably out of luck. It’s just not possible to buy a game that old off the shelf. No store on Earth stocks them.

I’d really like to play Final Fantasy VII some day, since it is widely regarded as a classic of its genre, and not having played it makes me feel like I’m missing out on an important part of being a geek, kind of like a fantasy fiction fan who’s never read Lord of the Rings. I have two options for acquiring FFVII. I could buy a second hand copy from someone, which is the perfectly legitimate, law-abiding citizen option, or I could download it from a file sharing network, which would be illegal, and make me a “pirate”.

The thing is, there’s no solid ethical argument for why anyone should buy a second hand copy. If you do so, none of the money you spend (and this might be a lot of money – because the game is so famous and people are reluctant to part with it, copies can sometimes go for over $100 on eBay, more than the cost of a brand new modern game) goes to the people who helped to make that game, to compensate them for their time and creativity. Every last cent of it goes to the person you’re buying it from, to compensate them for giving up a physical possession. If you download the game illegally, no money changes hands at all, but then nobody is giving up any physical possessions either. There’s absolutely no difference between the two options from the point of view of the artists. They certainly don’t lose money when you download it. There’s simply no option today by which you can acquire the game and compensate the creators. The nearest you could come is to pirate the game and then send a cheque for a tiny amount to every writer, programmer, animator, composer, etc. who worked on the game, which is obviously not feasible for a bunch of reasons. Unless you’re a devoted collector who actually wants the physical product to put on a shelf, the most sensible thing to do is to just download it and not feel one iota of guilt.

This is a good example of where copyright law does far less to protect artists than it does to hurt consumers, by making the difficulty of legally acquiring a game directly proportional to how popular it is. Imagine how much less popular the Beatles would be amongst young people today if the only legal way to get one of their albums was to buy it second hand for $100!

Python vs C for performance

Via Reddit today I came across a decent article on Python optimization tips and issues, which comes down firmly in favour of the idea that by being careful and knowing what you’re doing, you can typically make Python implementations of numerical algorithms fast enough for practical purposes, saving you the massive headaches associated with development in C.

This is a fairly relevant topic for me. A huge part of my PhD research is writing and playing with computational models of language acquisition. These models usually use Bayesian inference as a model of an “ideal learner” (doing this is fairly trendy in modern computational cognitive science, and not without good reason). When it comes to doing numerical Bayesian inference, a class of techniques known as Markov Chain Monte Carlo sets the standard, and MCMC computations are the bread and butter of my programming work. Without going into too much detail, MCMC computations are highly iterative – you basically just do the same few steps over and over as fast as you can until your program converges on your answer.
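
To give a flavour of what “the same few steps over and over” means, here is a toy sketch of random-walk Metropolis, about the simplest MCMC method there is. This is not my research code: the function names and the standard normal target are picked purely for illustration, and a real model would have a far more expensive log_density. The point is that the whole computation is one tight loop, which is exactly where interpreter overhead hurts.

```python
import math
import random


def metropolis(log_density, start, n_steps, step_size=1.0, rng=None):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = rng or random.Random()
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_steps):
        # Step 1: propose a small random move away from the current state.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_density(proposal)
        # Step 2: accept with probability min(1, p(proposal) / p(current)).
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        # Step 3: record the (possibly unchanged) state, then repeat.
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    """Log of an unnormalised standard normal density (toy target)."""
    return -0.5 * x * x
```

Running metropolis(standard_normal_log_density, 0.0, 20000) gives a list of samples whose mean and variance should settle near 0 and 1 respectively, with the usual caveat that consecutive samples are correlated.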

When I first started writing MCMC models for my research, I did them in Python. I knew that my C skills weren’t great, that I’d be able to program a lot faster in Python, and that there were enough resources on the net for high performance Python computing that I’d be able to get my programs fast enough.

But after investing a lot of time trying to get my first Python MCMC program running fast – using things like numpy’s arrays and psyco – I still wasn’t happy with how slow things were going. I knuckled down and rewrote my model in C, and it absolutely blew the pants off my Python version. Of course I expected it to be faster, but it was much faster than I ever expected it to be. From then on in, I’ve written all of my MCMC stuff in C.

Since then I’ve become aware of PyMC, a Python library geared specifically toward MCMC. So obviously someone out there is able to make Python work for MCMC at a decent speed. This makes me wonder if maybe I really was doing something extremely wrong the whole time I was using Python for numerical work. Maybe it’s time I really dedicate some time to learning how to make Python faster for this kind of thing, to save myself time and effort in the future.

Even if it turns out that suffering through C implementations for the past 18 months has been something of a waste of time, to be honest I wouldn’t really regret it. Relying on C for my research has taught me more about it than I’d ever have learned otherwise. I am now comfortable with using malloc, calloc and free, I’ve done some multithreading work with the pthread API and I’ve learned how to use gdb and gprof.

I feel like I actually deserve to be able to say that I “know C”, as opposed to before when I knew the syntax but couldn’t really work effectively in C. Of course, I am still a long way from being any sort of guru, but I feel like I’ve taken important steps. I’ll be much more confident the next time I have to read and understand some C code.

A lot of people probably feel that in this day and age of Python and Ruby and Lua (and in less “webby” parts of the world, Java and C#), competence in C is an obsolete skill. I think there’s a degree of truth in that, and certainly C shouldn’t be used for new projects without a compelling argument in favour of it (and a simple “it’s faster” is not a compelling argument!). But at the same time I think that mastery of C is still an important rite of passage for serious programmers, and it feels good to have taken a number of extra steps in that direction.

But if I can work in Python from now on, I’m not going to miss the segfaults.

On the difficulty of environmental micro optimisation

When it comes to things that a person can do to help reduce the negative environmental impact of their life, I think one of several possible sensible distinctions to draw is one between what I call “environmental macro optimisations” and “environmental micro optimisations”.

Macro optimisations are big, expensive, obvious changes that have an incontrovertibly positive impact on the environment. Powering your house using solar panels on the roof, installing rainwater tanks and driving an electric car are all good examples of macro optimisations. Micro optimisations are much smaller, cheaper, simpler changes that don’t have quite so much of an effect: taking a re-usable mug with you to the coffee shop instead of using a plastic disposable, for example.

It’s easy to think of micro optimisations as not being worth the effort, but I think they are. They may not individually have as much impact as macro optimisations, but it’s easier to get them adopted on a larger scale. Getting 50% of the population to practice one or two good micro optimisations might be more achievable than getting 10% to practice a macro optimisation, and have just as much good effect.

The really serious problem with micro-optimisations is that they’re actually quite hard to get right, and – worse – few people seem to realise it. Gut instincts and common knowledge can often lead you astray. In fact, if you didn’t bat an eyelid at my coffee cup example earlier, you’ve already fallen victim to this.

Most people seem to take it for granted that taking a re-usable ceramic or metal cup to the coffee shop is better for the environment than using a plastic or styrofoam disposable cup each time, but as this article discusses, the disposables end up being better. Reusable mugs require a lot more energy to manufacture than a simple injection molded disposable, and having to clean them many times during their life uses up a whole lot of water (which takes energy to heat) and potentially nasty dishwashing chemicals.

Most people seem to take it for granted that using re-usable cloth nappies is better for the environment than using nasty disposable plastic nappies, but as this article discusses, this issue is tremendously controversial. Whether or not the extra energy and resources required to produce a never ending supply of disposable plastic nappies are cancelled out by the extra energy and resources required to constantly clean cloth nappies depends on the details of washing machine efficiency, which changes all the time. Which option is best for you depends largely on how modern your washing machine is.

Most people seem to take it for granted that riding a motorbike or scooter is better for the environment than driving a car, but as this article discusses, they’re actually probably worse. While motorbikes are much smaller and lighter than cars, and hence burn a lot less petrol to cover the same distance, their lower cost and looser regulations mean that what fuel they do burn they burn much less cleanly, making them worse over all.

The point, in case you haven’t got it yet, is that micro optimisation is hard. Doing what feels obviously right, or doing what everyone else is doing, often ends up doing more harm than good. A rational environmentalist can’t afford to be lazy when making micro-optimisations. Every decision needs to be researched.

The core of the matter seems to be that to get things right, one has to consider very many aspects of a micro optimisation choice. It’s insufficient to reason “this approach produces less landfill, so it’s better” or “this approach uses less petrol/electricity/water, so it’s better”. Very often you can only improve one aspect by making another one worse, so you need to consider them all. Also, you can’t just think in the now, you need to take a “cradle to grave” view of all your options. There are two major problems with this.

First of all, the average person has neither the understanding nor the access to data to accurately assess these various impacts. To do the job properly you have to know all sorts of details about every step in a lot of processes – how things are manufactured, how they are transported, how they are disposed of, how power is generated, water is collected, etc. Even the average holder of a Bachelor of Engineering isn’t going to know all the stuff needed to do these calculations off the top of their head.

Second of all, even if the average person could easily figure out exactly how much landfill, how much CO2, how much plastic, how much water, etc. a particular choice required, there is still the problem of how to prioritise these things. To phrase this in terms of mathematical optimization (which is the correct way to think about it), we have the problem of not actually knowing exactly what our objective function is. Is it worth consuming x extra cubic metres of landfill per year in order to produce y fewer tons of CO2? Using z extra kilograms of plastic to save w megalitres of water? Which of these concerns is more important? It’s unlikely that these questions even have clear, static, objective answers. It depends on where you live, it depends on possibly unreliable estimates of how much of various resources exist in nature, and it depends on which of a number of possible environmental catastrophes you think is worse.
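
To make the objective function problem concrete, here is a small sketch that scores two choices with a weighted sum of their impacts. Every number in it is made up for the sake of illustration (these are not real measurements of anything), and a weighted sum is just one assumed form the objective function could take. The names weighted_impact and best_option are likewise my own.

```python
def weighted_impact(impacts, weights):
    """Collapse several impacts into one score (lower is better)."""
    return sum(weights[name] * amount for name, amount in impacts.items())


def best_option(options, weights):
    """Return the name of the option with the lowest weighted impact."""
    return min(options, key=lambda name: weighted_impact(options[name], weights))


# Made-up annual impacts for two hypothetical choices (NOT real data).
OPTIONS = {
    "disposable": {"landfill_m3": 2.0, "co2_tonnes": 0.1, "water_megalitres": 0.01},
    "reusable": {"landfill_m3": 0.1, "co2_tonnes": 0.3, "water_megalitres": 0.05},
}

# Two equally arbitrary sets of priorities.
EQUAL_WEIGHTS = {"landfill_m3": 1.0, "co2_tonnes": 1.0, "water_megalitres": 1.0}
CLIMATE_FOCUSED = {"landfill_m3": 0.01, "co2_tonnes": 10.0, "water_megalitres": 1.0}
```

With EQUAL_WEIGHTS the reusable option scores better, while with CLIMATE_FOCUSED the disposable one does. Same data, opposite answer, and nothing in the data tells you which set of weights is the right one.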

Is environmental optimisation a matter of “go hard or go home”? Perhaps not entirely, but it seems likely to me that until some people somewhere make a tremendous effort to actually do all the terribly complicated mathematics required to clear all these issues up for common choices, people who want to be rational environmentalists should probably refrain from getting too caught up in micro optimisations that shift the balance amongst things they consume, and focus instead on just consuming less. Instead of agonising over whether to drive a petrol car or a motorbike, ask yourself if you can get away with just doing less driving. Instead of worrying about what sort of container to buy a drink in when you’re out, try to just buy fewer drinks when out.

Finally, this entry is perhaps overly pessimistic. Environmental micro-optimisation isn’t all doom and gloom, and there are small things one can do which are obviously helpful. Hanging your laundry out to dry when the weather permits instead of using an electric drier is obviously a good thing to do. The important take away message is that you need to think about your micro optimisations and whether they actually, clearly help, or whether they’re just trading one problem for another problem without actually having any clear benefit.

Generating activity stream feeds using foaflib and feedformatter

Yesterday I threw together a really quick and simple little script that combines the ActivityStream utility from foaflib with feedformatter to produce RSS1, RSS2 and Atom feeds containing events from the various webpages and accounts that are listed in my FOAF profile. These feeds should include all of my posts to this blog, all of my identi.ca notices and all of my delicio.us bookmarks. The functionality is a little like that provided by FriendFeed, except it’s powered by FOAF files, which means that as long as you know where to get someone’s FOAF file you don’t have to manually add all their various presences yourself – especially handy for when a friend creates a new account somewhere.

The code to generate the feeds is delightfully simple. It took about 5 minutes to get something working just fine. This was the first time I’d used either of these libraries in a long time (just because I wrote them doesn’t mean I remember the syntax a few months down the track!), but the examples on the respective project wikis made it really easy to figure out how to do this:


from foaflib.classes.person import Person
from foaflib.utils.activitystream import ActivityStream
from feedformatter import Feed

# Fetch the FOAF profile of the friend to watch
p = Person("http://www.luke.maurits.id.au/foaf.rdf")

# Create an ActivityStream object for that friend
stream = ActivityStream(p)

# Create the feed
feed = Feed()

# Set the feed/channel level properties
feed.feed["title"] = "Luke Maurits' Activity Stream Feed"
feed.feed["link"] = "http://www.luke.maurits.id.au"
feed.feed["author"] = "Luke Maurits"
feed.feed["description"] = "A feed of my activity at various places, built hourly using my FOAF profile."

# Iterate over the latest things they've done:
for event in stream.get_latest_events():
        # Create an item
        item = {}
        item["title"] = event.type + ": " + event.detail
        item["link"] = event.link
        item["pubDate"] = event.timestamp
        item["guid"] = event.link
        feed.items.append(item)

# Save the feed to a file in various formats
feed.format_rss1_file("/output/path/goes/here/activity_rss1.xml")
feed.format_rss2_file("/output/path/goes/here/activity_rss2.xml")
feed.format_atom_file("/output/path/goes/here/activity_atom.xml")

Just 20 lines of code, excluding comments and blanks! I’m quite pleased with how parsimonious both libraries have turned out to be in a model usage case. I run the above as an hourly cron job and that’s all there is to it.

Currently the ActivityStream object in foaflib only supports pulling in events from: Twitter feeds, Identi.ca feeds, Delicio.us bookmarks and blog entries if (i) the blog is identified using foaf:weblog and (ii) has an RSS or Atom feed specified in a <link rel="alternate" type="application/rss+xml" /> tag in the page pointed to. But this is only because those are the kinds of events that I wanted support for at the time. Any social service that either provides data in RSS or Atom form, or which has a Python API for extracting data, is a perfectly viable candidate for inclusion in future releases. Photo sharing sites like Flickr would be an obvious addition.
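
For the curious, finding a blog’s feed from that link tag is not much work. The following is a sketch of the general idea using only the Python 3 standard library. It is not the code foaflib actually uses, and the names FeedLinkFinder and find_feed_urls are invented for this example.

```python
from html.parser import HTMLParser

# MIME types treated as feeds in this sketch.
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Attribute values can be None (valueless attributes), hence "or".
        rel = (attributes.get("rel") or "").lower().split()
        mime_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime_type in FEED_TYPES and href:
            self.feed_urls.append(href)


def find_feed_urls(html):
    """Return feed URLs advertised in the given HTML, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feed_urls
```

Given the HTML of the page that foaf:weblog points to, find_feed_urls returns whatever feed URLs the page advertises, or an empty list if there are none. Relative hrefs would still need resolving against the page URL, which this sketch does not attempt.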

It’s very easy to see how projects like these could easily replace Facebook, MySpace, etc., making the social web a much more decentralised and accessible place, where nobody has to have an account at any special place to see what other people are doing. A lot would need to happen to make this model user-friendly enough for the average Facebook user, and I’m not even going to pretend I have what it takes to do work in that direction, but with some luck somebody will, one day.

The biggest missing piece of the puzzle in my little application here is any kind of privacy/security. Places like Facebook let you designate people as friends (and divide your friends into groups), and set who can see what aspects of your activity based on those credentials. Some equivalent process to this would absolutely have to be present in any FOAF-based Facebook-killer.

Fortunately, the absolutely ingenious idea that is FOAF+SSL makes this possible. There’s no support for it in foaflib yet, which is the library’s greatest current shortcoming. Hopefully I’ll have the time to implement it one day. Of course, getting that technology made easy enough for the average netizen is an even bigger hurdle again. This combined with the fact that the average person probably doesn’t understand the appeal of decentralisation makes me wonder if we ever will see a trend in the direction of this sort of thing.

Here’s hoping.