Posts tagged ‘software’

Generating activity stream feeds using foaflib and feedformatter

Yesterday I threw together a quick and simple little script that combines the ActivityStream utility from foaflib with feedformatter to produce RSS 1.0, RSS 2.0 and Atom feeds containing events from the various webpages and accounts listed in my FOAF profile. These feeds should include all of my posts to this blog, all of my identi.ca notices and all of my del.icio.us bookmarks. The functionality is a little like that provided by FriendFeed, except that it’s powered by FOAF files, which means that as long as you know where to get someone’s FOAF file you don’t have to manually add all their various presences yourself. This is especially handy when a friend creates a new account somewhere.

The code to generate the feeds is delightfully simple. It took about 5 minutes to get something working just fine. This was the first time I’d used either of these libraries in a long time (just because I wrote them doesn’t mean I remember the syntax a few months down the track!), but the examples on the respective project wikis made it really easy to figure out how to do this:


from foaflib.classes.person import Person
from foaflib.utils.activitystream import ActivityStream
from feedformatter import Feed

# Fetch the FOAF profile of the friend to watch
p = Person("http://www.luke.maurits.id.au/foaf.rdf")

# Create an ActivityStream object for that friend
stream = ActivityStream(p)

# Create the feed
feed = Feed()

# Set the feed/channel level properties
feed.feed["title"] = "Luke Maurits' Activity Stream Feed"
feed.feed["link"] = "http://www.luke.maurits.id.au"
feed.feed["author"] = "Luke Maurits"
feed.feed["description"] = "A feed of my activity at various places, built hourly using my FOAF profile."

# Iterate over the latest things they've done:
for event in stream.get_latest_events():
        # Create an item
        item = {}
        item["title"] = event.type + ": " + event.detail
        item["link"] = event.link
        item["pubDate"] = event.timestamp
        item["guid"] = event.link
        feed.items.append(item)

# Save the feed to a file in various formats
feed.format_rss1_file("/output/path/goes/here/activity_rss1.xml")
feed.format_rss2_file("/output/path/goes/here/activity_rss2.xml")
feed.format_atom_file("/output/path/goes/here/activity_atom.xml")

Just 21 lines of code, excluding comments and blanks! I’m quite pleased with how parsimonious both libraries have turned out to be in a model usage case. I run the above as an hourly cron job and that’s all there is to it.

Currently the ActivityStream object in foaflib only supports pulling in events from Twitter feeds, Identi.ca feeds, del.icio.us bookmarks and blog entries, the last only if (i) the blog is identified using foaf:weblog and (ii) the page pointed to specifies an RSS or Atom feed in a <link rel="alternate" type="application/rss+xml" /> tag. But this is only because those are the kinds of events that I wanted support for at the time. Any social service that either provides data in RSS or Atom form, or which has a Python API for extracting data, is a perfectly viable candidate for inclusion in future releases. Photo sharing sites like Flickr would be an obvious addition.
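The feed auto-discovery step described above can be sketched with nothing but the standard library. This is only an illustration of the idea, not foaflib's actual code: the class name FeedLinkFinder, the function find_feeds and the sample page are all made up for the example.

```python
from html.parser import HTMLParser

# MIME types that conventionally mark an RSS or Atom feed link.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attrs.get("href"):
            self.feeds.append(attrs["href"])


def find_feeds(html):
    """Return the feed URLs advertised in the given HTML text, in page order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page, standing in for whatever foaf:weblog points to.
page = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css" />
<link rel="alternate" type="application/rss+xml" href="/feeds/rss2.xml" />
<link rel="alternate" type="application/atom+xml" href="/feeds/atom.xml" />
</head><body></body></html>"""

feeds = find_feeds(page)
print(feeds)
```

A real implementation would also need to fetch the page and resolve relative href values against its URL, which this sketch leaves out.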

It’s easy to see how projects like these could replace Facebook, MySpace, etc., making the social web a much more decentralised and accessible place, where nobody has to have an account at any special place to see what other people are doing. A lot would need to happen to make this model user-friendly enough for the average Facebook user, and I’m not even going to pretend I have what it takes to do work in that direction, but with some luck somebody will, one day.

The biggest missing piece of the puzzle in my little application here is any kind of privacy/security. Places like Facebook let you designate people as friends (and divide your friends into groups), and set who can see what aspects of your activity based on those credentials. Some equivalent of this would have to be present in any FOAF-based Facebook-killer. Fortunately, the absolutely ingenious idea that is FOAF+SSL makes this possible. There’s no support for it in foaflib yet, which is the library’s greatest current shortcoming. Hopefully I’ll have the time to implement it one day. Of course, making that technology easy enough for the average netizen is an even bigger hurdle again. This, combined with the fact that the average person probably doesn’t understand the appeal of decentralisation, makes me wonder if we will ever see a trend in the direction of this sort of thing.

Here’s hoping.

The unexpected success of PrettyTable

I’ve been a bit lazy lately with regard to keeping this blog up to date on matters relating to my free software.

PrettyTable, which I released back in February (as blogged here), has been an unexpected smash hit. Not long after I released it, I thought that since, unlike a lot of my other projects, it was of quite general interest and also fairly complete and robust, I’d try something different and put it up on PyPI, the official Python package index and the nearest thing Python has to an equivalent of Perl’s venerable CPAN. I figured this would help the project get a little bit of exposure, but I never expected what happened next!

It wasn’t even a week before I got an email from someone letting me know how much he liked the project. A little later he wrote a blog entry on it. Motivated by this I released an updated second version, which quickly received a bug report from someone who had been using 0.1 “often”! This user kindly tested a fix for this bug, which became 0.2.1. More and more people started emailing me to report problems or suggest improvements, and eventually even to contribute code! Everything that is supposed to happen in the “magic pixie dust” view of open source software was happening and I was amazed, since nothing else I’ve ever written has received so much as a single “thank you” email. By the time 0.5 was almost ready I was sending regular emails to an impromptu mailing list of 5 or so consistently interested and helpful people, announcing changes and asking for feedback. It was starting to become clear that PrettyTable might grow a larger community than I could easily manage by myself with nothing more than a self-hosted webpage and a manually administered mailing list, and on the advice of one of the 5 “friends of PrettyTable” I set up a project at Google Code. Google Code is very similar in spirit to the older and better known SourceForge, providing free facilities like file hosting, documentation wikis, mailing lists and version management repositories to free software projects.

The move to Google Code, for reasons that still elude me, really thinned out my burgeoning little community. Only two people from the “friends of PrettyTable” mailing list subscribed to any of the new official mailing lists (although other people I’d had no previous contact with have joined them, so there’s not been much of a net loss). This was a surprise since things had been going so swimmingly previously. The project is by no means in danger of collapsing – in fact, someone from the Debian project, which distributes one of the oldest and most respected GNU/Linux distributions, recently contacted me to let me know he was submitting PrettyTable to Debian’s famously large package repositories, which I expect will bring in a ton of interest. I just wish I knew what was so off-putting about the move from a visibly amateur project with hand-managed mailing lists to somewhere with a fairly slick web interface for reporting bugs and other communications. Possibly there exists a misconception that using Google Code requires a Gmail account or something.

At any rate, I’ve really enjoyed PrettyTable’s rapid growth and the project feels like it is moving in a good direction. I plan to release 0.6 fairly soon, which will be a backward-compatibility breaking release in which the basic API is finalised, in a state which I feel is as clean and Pythonic as it can be. There are a few more features that need to be implemented but I imagine it will be a month or two at most before I have something I am happy to call 1.0, which I am cautiously optimistic will become a somewhat well known and regularly used library.

One question I’d like to have answered is how much of PrettyTable’s unprecedented success is due to me putting it on PyPI and how much of it is due to it simply being better or more interesting than other things I’ve written. I put HTTPeek on PyPI shortly after PrettyTable to try to investigate this, and so far I haven’t heard anything about it from anyone. This is probably not enough evidence to make a decision, though: it might be that HTTPeek simply sucks (it turns out a good chunk of its functionality can also be provided by Firefox extensions, so perhaps that’s why it gets no love). I’ll probably start releasing more stuff on PyPI in future to get a better feel for this. I have quite a few projects under way or at least in my head at the moment, including a web browser with a Tk interface and a few tools related to the sadly under-appreciated FOAF project. The browser is already fairly complete, but I wrote it in Python 3 to get a feel for the changes, and I will probably need to backport it to 2.6 before releasing it, because the lack of good libraries like BeautifulSoup and the Python Imaging Library on 3.0 is holding me back too much.

PrettyTable 0.1 released

Today I released a simple Python library that I wrote during a 3 hour train ride from Newcastle to Sydney last weekend (more on what I was doing there in a later entry!), which I’ve decided to call PrettyTable. It contains a single class of that name whose job is to make it easy to print nice-looking ASCII tables like this:

+-----------+------+------------+-----------------+
| City name | Area | Population | Annual Rainfall |
+-----------+------+------------+-----------------+
| Adelaide  | 1295 |  1158259   |      600.5      |
| Brisbane  | 5905 |  1857594   |      1146.4     |
| Darwin    | 112  |   120900   |      1714.7     |
| Hobart    | 1357 |   205556   |      619.5      |
| Sydney    | 2058 |  4336374   |      1214.8     |
| Melbourne | 1566 |  3806092   |      646.9      |
| Perth     | 5386 |  1554769   |      869.4      |
+-----------+------+------------+-----------------+

Some of you may recognise the style of table from the PostgreSQL shell psql, which was the inspiration for PrettyTable.

It’s quite a simple little piece of code (you can read about the various options at the page linked to above, and you can even see the Pydoc API – this is actually the first time I’ve used Pydoc on my own software!) but it’s also the kind of thing that I suspect will actually find use in a wide range of future projects, both of my own and hopefully of others.
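To give a feel for what drawing a table like the one above involves, here is a minimal standard-library sketch. It is not PrettyTable's code or API: the function name format_table is invented for the example, and it left-aligns every cell, whereas the table above centres some columns.

```python
def format_table(headers, rows):
    """Render headers and rows as a psql-style ASCII table with left-aligned cells."""
    # Convert every cell to text so that column widths can be measured.
    text_rows = [[str(cell) for cell in row] for row in rows]

    # Each column is as wide as its widest cell (header included).
    widths = [len(header) for header in headers]
    for row in text_rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Horizontal rule: one run of dashes per column, padded by a space each side.
    rule = "+" + "+".join("-" * (width + 2) for width in widths) + "+"

    def line(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    out = [rule, line(headers), rule]
    out.extend(line(row) for row in text_rows)
    out.append(rule)
    return "\n".join(out)


table = format_table(
    ["City name", "Area"],
    [["Adelaide", 1295], ["Darwin", 112]],
)
print(table)
```

The only real work is measuring the widest cell in each column before anything is printed; the rest is padding and joining.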

Cherryblosxom 0.1 released

Back on Thursday I made the first public release of Cherryblosxom, calling it 0.1. It is probably of minimal use to anybody because it comes with no documentation and is missing crucial features (e.g. RSS/Atom feeds), but I think even a mostly-useless 0.1 release can be beneficial for a project in that it maintains a sense of momentum.

As you’ll be able to see if your browser does RSS/Atom auto-discovery, the development version of Cherryblosxom that powers this blog has basic RSS/Atom functionality – provided by a development version of feedformatter. I hope to have both of those development versions released sometime in the next week, but I have substantially less free time for that sort of thing for the next fortnight (and have had for the last week, hence the relative lack of blogging). If I do get the Cherryblosxom version out as 0.2, perhaps I’ll make some minor concessions in the direction of user friendliness. Once the product itself is a little more complete I will use distutils to make it easy to install (in the usual way, python setup.py install) and eventually I’ll put it on PyPI.

I’ve not forgotten about my Prime Time series, but when I next have time to put out a “proper” blog entry I hope to talk briefly about the use of Bayesian inference in phylogenetics. This is admittedly not something I know a lot about, but I feel like I know enough to find it justifiably fascinating.

Goodbye PyBlosxom, Hello Cherryblosxom!

As I said I would, I had a poke around the relevant PyBlosxom plugins in an attempt to figure out why the anti-spam question feature was playing havoc. To cut a long story short, I did not succeed. In the process of failing, I was driven near insane by what seemed like inexplicably inconsistent behaviour by PyBlosxom. This might be due to PyBlosxom being a low quality product or it might be due to me not understanding what I was doing.

Regardless of the cause, I decided that a change was due. I wanted to use a blogging platform whose ability to “just work” I could have some degree of trust in. I turned, first, to WordPress, because, well, everybody else does (mass popularity of software, in my experience, is a terrible metric for software quality, but generally is a good metric for “ease of just getting the damn thing to do what you basically want it to do”). One minute into their “famous five minute install” I discovered that WordPress only works with MySQL as a backend database. Seriously lame. MySQL is one of the aforementioned cases of software popularity being a terrible metric for software quality. My server uses PostgreSQL and I really didn’t feel like installing a second RDBMS on an already over-strained machine just for a blog. Heck, I really didn’t want my blog to rely on a database at all.

So I bit the bullet and did something similar to what I threatened to do back when I first set up commenting in PyBlosxom, and wrote my own blogging platform, in Python, using CherryPy to handle the HTTP request routing stuff and Cheetah to handle the templating. This new platform is entirely filesystem based, like PyBlosxom and Blosxom before it – no databases are necessary. I’m tentatively naming it “CherryBlosxom”, acknowledging the fact that CherryPy does all the hard stuff and that it follows in the ideological footsteps of Blosxom and PyBlosxom, in that it tries to be as small and simple as possible.

If you’re reading this entry, it means that I’ve made the switch and this blog is now powered by CherryBlosxom – an extremely alpha version of CherryBlosxom, which has not been thoroughly debugged and is missing some features (like RSS and Atom feeds, which I hope to eventually provide with feedformatter). It’s entirely possible that features like archive links and tag links and commenting will not work entirely properly just yet. I am hoping to have the most severe bugs taken care of pretty quickly, in which case I’ll call what’s there 0.1 and formally release it. In the meantime, please try to use the blog like you usually would and if anything breaks drop me a line. I’m looking forward to making CherryBlosxom a polished product!

Another New Feedformatter

Well, true to my word, Feedformatter 0.3 is out tonight. I think I will make this the last of the “Release early, release often” rush releases. There is really very little sense to it. That said, I am enjoying this project and am pleased with the direction it is heading. All of the releases so far have been kind of ugly because they’ve been one-day improvements upon the previous version. Because the original was a quick-and-dirty solution that I didn’t so much design as just beat around until it worked, none of the subsequent versions have looked much better. I think I’ll leave 0.4 until this weekend sometime and make sure it is a substantial improvement. I know my way around the problem space much better by now and should be able to produce something that is half-way decent. Please look forward to it (as they say in Japan)!

In unrelated news, I have been reading the docs for CherryPy these past few days and have been thinking of giving it a shot with my new lighttpd setup. I have an idea for a first project (that leverages some of my existing free software) that I’ll write about when it looks closer to actually happening.

Lighttpd and new feedformatter

Last night I replaced the Apache 1.3.x webserver which had been hosting this site with lighttpd (pronounced “lighty”), a very small, light and fast webserver which emphasises the use of FastCGI to overcome the limitations of traditional CGI, instead of embedding language interpreters in the server. This is a viewpoint that I approve of, for reasons of security and freedom of choice in server/language pairings. I’ve actually tried switching to lighty before, but ended up not doing so because I couldn’t get PHP working with FastCGI (a requirement for my TombSaver page). It turns out that if I’d read the MESSAGE that pkgsrc shows after you install php I almost certainly would have, but oh well. It’s done now and I’m happy with the change.

I’ve also released a new version of feedformatter – already! I am taking the “release early, release often” idea to quite an extreme with this latest project (normally I wouldn’t release anything in the state that feedformatter 0.1 and 0.2 have been). Realistically, this is no big problem – in all probability nobody has even used 0.1 yet anyway. The new version includes “pretty printing” of feeds (with newlines and indentation), a first stab at some compatibility with the Universal Feed Parser, better feed validation (though there is still a long way to go on this front) and slightly tidier code.

Yes, there probably will be a 0.3 release in the next day or two.
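As an aside on the “pretty printing” mentioned above, here is roughly what that step looks like using only the standard library. This is a sketch of the general technique, not feedformatter's own code: the function name pretty_print and the tiny sample feed are made up.

```python
from xml.dom import minidom


def pretty_print(xml_text, indent="  "):
    """Re-serialise an XML string with newlines and indentation."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


# A deliberately tiny, incomplete feed used only to show the layout change.
compact = '<rss version="2.0"><channel><title>Example feed</title></channel></rss>'
pretty = pretty_print(compact)
print(pretty)
```

The output puts each element on its own line, indented by nesting depth, which makes a generated feed far easier to inspect by eye.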

Now with feeds!

Travels

I’m back from honeymoon! Some of you may have noticed the nifty new travel maps that are up on my homepage. I expect these will change fairly slowly over time, due to the costs of international travel. I’ll try to get a working photo gallery of honeymoon shots up soon.

Web framework progress

Some of you may have noticed that there are now links from my homepage to valid RSS 1.0, RSS 2.0 and Atom 1.0 feeds for articles published on this site. These are generated by a Python module I wrote specifically for the task, which I have released on my software page as the Universal Feed Formatter (in reference to the well known, used and loved Universal Feed Parser). I was actually surprised I had to write my own module to achieve this. There is a lot of Python code for parsing various feed formats on the internet, but surprisingly little for producing the feeds themselves. I certainly couldn’t find anything on the net that could take a single dictionary structure and produce files in various formats like feedformatter can. Hopefully someone else can take advantage of this convenience.
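The “one dictionary in, several formats out” idea can be illustrated with a short standard-library sketch. This is not how feedformatter is implemented: the function names to_rss2 and to_atom and the sample data are invented, and several elements that a validating feed requires (for example Atom's updated and feed-level id) are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def to_rss2(feed, items):
    """Serialise the feed dictionary and item dictionaries as a bare-bones RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for key in ("title", "link", "description"):
        ET.SubElement(channel, key).text = feed[key]
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "guid").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


def to_atom(feed, items):
    """Serialise the same dictionaries as a bare-bones Atom string."""
    root = ET.Element("feed", xmlns=ATOM_NS)
    ET.SubElement(root, "title").text = feed["title"]
    ET.SubElement(root, "link", href=feed["link"])
    ET.SubElement(root, "subtitle").text = feed["description"]
    for item in items:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link", href=item["link"])
        ET.SubElement(entry, "id").text = item["link"]
    return ET.tostring(root, encoding="unicode")


# Made-up sample data; the same structures feed both serialisers.
feed = {
    "title": "Example site",
    "link": "http://example.org/",
    "description": "Articles from an example site.",
}
items = [{"title": "First post", "link": "http://example.org/a"}]

rss = to_rss2(feed, items)
atom = to_atom(feed, items)
print(rss)
print(atom)
```

The point of the design is that the caller describes the feed once, and each format only differs in how the same fields are named and nested.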

feedformatter is now integrated with the simple web framework that I mentioned in my last entry. You’ll also notice that I have a working (though imperfect) sitemap up as well, again generated by the framework. With these things done, I think I’ve now accomplished all of my original goals for this project. The code is by no means clean or reliable, so I won’t be releasing it at the moment, but it works and can be progressively polished over time. I will probably do this before I begin work on implementing some sort of commenting system for my articles.

I have been thinking, vaguely, about extending the framework to include blogging, and replacing pyblosxom with it. My reason for this is not really a direct dissatisfaction with pyblosxom. It’s the fact that a lot of the plugins that people write for pyblosxom do not work well (or at all!) with pyblosxom’s static rendering mode (which is the only mode I will use because I refuse to dynamically render static content each time it is viewed). This deficiency is the reason that there is no pagination on this blog (yet). Eventually this will become a problem, at which point I’ll either need to hack someone else’s pyblosxom plugin or switch to a new blogging platform – that new platform may as well be an extension of my own framework, because that will mean one less set of templates I need to maintain to match the rest of my site.

Web log analysis

Several months ago I installed the /www/webalizer package from pkgsrc on my web server – it’s a web log analyser that I run from cron every hour. It compiles basic statistics on hits to my website (most popular pages, most popular entry and exit pages, viewer country statistics based on GeoIP, etc.) and then produces HTML reports. I kept a half-hearted eye on these statistics for the first few days, but then mostly forgot about them. I revisited my stats pages earlier this week, and was pleased to see how much traffic I was apparently getting.

Intrigued, I decided to step my analyses up a bit by configuring my web server to log user agents and referring URLs in addition to the basic information already logged. Now able to see user agents, it’s become clear that most of the traffic I thought I was getting was not actually from people but rather search engine crawlers. Oops. I’ve changed my webalizer settings now to ignore these hits, but it will be a while before I can collect meaningful statistics on the genuine human traffic.
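The crawler filtering described above can be sketched in a few lines of Python. This is not how webalizer does it: it assumes the server writes the common “combined” log format, the substring list in BOT_HINTS is a crude guess rather than a complete crawler list, and the sample log lines (addresses, paths and user agents) are invented.

```python
import re
from collections import Counter

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Crude, assumed list of substrings that suggest an automated client.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_crawler(agent):
    """Guess whether a user-agent string belongs to a search engine crawler."""
    agent = agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def count_human_hits(lines):
    """Count hits per path, skipping unparseable lines and likely crawlers."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or is_crawler(match.group("agent")):
            continue
        hits[match.group("path")] += 1
    return hits


# Invented log lines for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2008:13:55:36 +1000] "GET /netbsd.html HTTP/1.1" 200 2326 '
    '"http://example.org/" "Mozilla/5.0 (X11; U; NetBSD i386)"',
    '192.0.2.2 - - [10/Oct/2008:13:56:01 +1000] "GET /netbsd.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.3 - - [10/Oct/2008:13:57:12 +1000] "GET /index.html HTTP/1.1" 200 512 '
    '"-" "Opera/9.50 (X11; Linux i686)"',
]

hits = count_human_hits(sample)
print(hits.most_common())
```

Matching on user-agent substrings will miss crawlers that pretend to be browsers, so numbers produced this way are still only an upper bound on human traffic.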

The most interesting things the log analysis reveals at this point are

  1. My NetBSD survival guide is the most popular page on the site. In fact, with some googling I was even able to discover that the URL for that page was given out in an OpenBSD IRC channel earlier this year! The survival guide was actually in fairly poor shape all this time, so I’ve put some effort into expanding and polishing it lately, given the important role it seems to play for my site. I still have a bit more to write, though, so watch that page over the next week or two for some activity.
  2. More than one person has wound up at this blog page searching for information on Itojun’s cause of death toward the end of last year. I did a lot of searching trying to find this out myself, and have come to the conclusion that there is not currently, and probably is not likely to ever be, a definite answer to this on the web. The only real leads I’ve found so far are a claim on the OpenBSD news site undeadly.org that it was a car accident and a claim on Slashdot that it was suicide – neither of these is substantiated by any kind of hard evidence. It seems clear that Itojun’s family and close friends wish the cause to remain private, and I think the best thing would be for his well wishers to respect that.

Two New Software Releases

Just a quick entry to announce two new (well, almost new) software releases.

Firstly, I have finally brought my (X)HTML link testing script TestLinks up to a high enough standard to make it worth releasing. You can read all about it and download it from here. It’s just a simple script designed to be run from cron and let you know if any links on your sites go stale. Simple, but useful. It’s written in Python, of course.
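For readers wondering what a link tester boils down to, here is a rough standard-library sketch of the two steps involved: pulling links out of a page and checking whether each one still answers. It is not TestLinks itself. The names LinkCollector, extract_links and is_stale are made up, and using a HEAD request is an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-page anchors and mail links; neither can go stale over HTTP.
        if href and not href.startswith(("#", "mailto:")):
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every checkable link in the page, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def is_stale(url, timeout=10):
    """Return True if the URL cannot be fetched successfully (needs network access)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status >= 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return True


# Made-up page used to exercise the extraction step without touching the network.
page = (
    '<a href="/about.html">About</a> '
    '<a href="#top">Top</a> '
    '<a href="http://example.org/x">X</a>'
)
links = extract_links(page, "http://example.com/index.html")
print(links)
```

A cron job would call extract_links on each page of the site and report any URL for which is_stale returns True. Some servers reject HEAD requests, so a fallback to GET may be needed in practice.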

Secondly, I realised that although I wrote the GetAuWeather page and linked to it from my software page way back in July, I never actually put a link to the code in there. This has been fixed. GetAuWeather is a simple Python module for downloading Australian weather observations from the Bureau of Meteorology’s website and parsing it up into sensible Pythonic data structures. At the moment it’s just a function for doing that. One day I will use this to make something useful, probably by writing a daemon to constantly throw new weather data into an SQLite database (taking advantage of the SQLite support new to Python 2.5) and then making a fancy web interface to this database.
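The SQLite idea mentioned above might look something like the following sketch. Everything about the data here is hypothetical: the table layout, the field names station, observed_at and temperature_c, and the sample values are invented for illustration and are not taken from GetAuWeather or from real observations.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the observation database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS observations (
               station TEXT NOT NULL,
               observed_at TEXT NOT NULL,
               temperature_c REAL,
               PRIMARY KEY (station, observed_at)
           )"""
    )
    return conn


def store_observations(conn, observations):
    """Insert observation dictionaries, silently skipping ones already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO observations "
            "VALUES (:station, :observed_at, :temperature_c)",
            observations,
        )


conn = open_store()

# Invented sample values standing in for parsed weather data.
sample = [
    {"station": "Adelaide", "observed_at": "2008-11-01T09:00", "temperature_c": 18.4},
    {"station": "Adelaide", "observed_at": "2008-11-01T09:30", "temperature_c": 19.1},
]

store_observations(conn, sample)
store_observations(conn, sample)  # a repeated poll adds no duplicate rows

count = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
print(count)
```

The primary key on station and observation time is what lets a daemon poll repeatedly without worrying about storing the same reading twice.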

HTTPeek released, Tomb Saver live

Today I have released HTTPeek 0.1 on my software page under the usual BSD license. This release is not quite in the “everything works but the code is ugly” state I had hoped the first release would be, as dealing with HTTP/0.9 is still rather unreliable. I was unable to work on HTTPeek for long enough during my recent time without a computer that I can’t make fixes very quickly anymore, and I was worried that the project would just stall altogether if I waited until everything was perfect. Hopefully those few issues will be sorted out soon.

Also, for a little while now (perhaps a fortnight?), the website for Tomb Saver has been live. Tomb Saver is a project run by my fianc