Posts tagged ‘feedformatter’

Generating activity stream feeds using foaflib and feedformatter

Yesterday I threw together a quick and simple little script that combines the ActivityStream utility from foaflib with feedformatter to produce RSS1, RSS2 and Atom feeds containing events from the various webpages and accounts that are listed in my FOAF profile. These feeds should include all of my posts to this blog, all of my identi.ca notices and all of my del.icio.us bookmarks. The functionality is a little like that provided by FriendFeed, except it’s powered by FOAF files, which means that as long as you know where to get someone’s FOAF file you don’t have to manually add all their various presences yourself. That is especially handy when a friend creates a new account somewhere.

The code to generate the feeds is delightfully simple. It took about 5 minutes to get something working just fine. This was the first time I’d used either of these libraries in a long time (just because I wrote them doesn’t mean I remember the syntax a few months down the track!), but the examples on the respective project wikis made it really easy to figure out how to do this:


from foaflib.classes.person import Person
from foaflib.utils.activitystream import ActivityStream
from feedformatter import Feed

# Fetch the FOAF profile of the friend to watch
p = Person("http://www.luke.maurits.id.au/foaf.rdf")

# Create an ActivityStream object for that friend
stream = ActivityStream(p)

# Create the feed
feed = Feed()

# Set the feed/channel level properties
feed.feed["title"] = "Luke Maurits' Activity Stream Feed"
feed.feed["link"] = "http://www.luke.maurits.id.au"
feed.feed["author"] = "Luke Maurits"
feed.feed["description"] = "A feed of my activity at various places, built hourly using my FOAF profile."

# Iterate over the latest things they've done:
for event in stream.get_latest_events():
    # Create an item
    item = {}
    item["title"] = event.type + ": " + event.detail
    item["link"] = event.link
    item["pubDate"] = event.timestamp
    item["guid"] = event.link
    feed.items.append(item)

# Save the feed to a file in various formats
feed.format_rss1_file("/output/path/goes/here/activity_rss1.xml")
feed.format_rss2_file("/output/path/goes/here/activity_rss2.xml")
feed.format_atom_file("/output/path/goes/here/activity_atom.xml")

Just 20 lines of code, excluding comments and blanks! I’m quite pleased with how parsimonious both libraries have turned out to be in a typical use case. I run the above as an hourly cron job and that’s all there is to it.

Currently the ActivityStream object in foaflib only supports pulling in events from Twitter feeds, Identi.ca feeds, del.icio.us bookmarks and blog entries, the last of these only if (i) the blog is identified using foaf:weblog and (ii) it has an RSS or Atom feed specified in a <link rel="alternate" type="application/rss+xml" /> tag in the page pointed to. But this is only because those are the kinds of events that I wanted support for at the time. Any social service that either provides data in RSS or Atom form, or which has a Python API for extracting data, is a perfectly viable candidate for inclusion in future releases. Photo sharing sites like Flickr would be an obvious addition.
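To make condition (ii) concrete, here is a minimal sketch of that kind of feed discovery, written with only the Python 3 standard library. It is not foaflib’s actual code: the class name `FeedLinkFinder`, the helper `find_feed_links` and the sample page are all made up for illustration, and I am assuming that Atom feeds are advertised with the `application/atom+xml` type.

```python
from html.parser import HTMLParser

# MIME types assumed to mark a feed (illustrative, not taken from foaflib).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect the href of every <link rel="alternate"> tag that advertises a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here by default.
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attrs.get("href"):
            self.feeds.append(attrs["href"])


def find_feed_links(html):
    """Return the feed URLs advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page: one stylesheet link (ignored) and one RSS link (kept).
page = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css" />
<link rel="alternate" type="application/rss+xml" href="/blog/index.rss" />
</head><body></body></html>"""
print(find_feed_links(page))
```

A real version would also need to fetch the page named by foaf:weblog and resolve relative hrefs against its URL, which this sketch leaves out.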

It’s easy to see how projects like these could replace Facebook, MySpace, etc., making the social web a much more decentralised and accessible place, where nobody has to have an account at any special place to see what other people are doing. A lot would need to happen to make this model user-friendly enough for the average Facebook user, and I’m not even going to pretend I have what it takes to do work in that direction, but with some luck somebody will, one day.

The biggest missing piece of the puzzle in my little application here is any kind of privacy/security. Places like Facebook let you designate people as friends (and divide your friends into groups), and set who can see what aspects of your activity based on those credentials. Some equivalent of this would absolutely have to be present in any FOAF-based Facebook-killer. Fortunately, the absolutely ingenious idea that is FOAF+SSL makes this possible. There’s no support for it in foaflib yet, which is the library’s greatest current shortcoming. Hopefully I’ll have the time to implement it one day. Of course, making that technology easy enough for the average netizen is an even bigger hurdle. This, combined with the fact that the average person probably doesn’t understand the appeal of decentralisation, makes me wonder if we ever will see a trend in the direction of this sort of thing.

Here’s hoping.

Another New Feedformatter

Well, true to my word, Feedformatter 0.3 is out tonight. I think I will make this the last of the “Release early, release often” rush releases. There is really very little sense to it. That said, I am enjoying this project and am pleased with the direction it is heading. All of the releases so far have been kind of ugly because they’ve been one-day improvements upon the previous version. Because the original was a quick-and-dirty solution that I didn’t so much design as just beat around until it worked, none of the subsequent versions have looked much better. I think I’ll leave 0.4 until this weekend sometime and make sure it is a substantial improvement. I know my way around the problem space much better by now and should be able to produce something that is half-way decent. Please look forward to it (as they say in Japan)!

In unrelated news, I have been reading the docs for CherryPy these past few days and have been thinking of giving it a shot with my new lighttpd setup. I have an idea for a first project (that leverages some of my existing free software) that I’ll write about when it looks closer to actually happening.

Lighttpd and new feedformatter

Last night I replaced the Apache 1.3.x webserver which had been hosting this site with lighttpd (pronounced “lighty”), a very small, light and fast webserver which emphasises the use of FastCGI to overcome the limitations of traditional CGI, instead of embedding language interpreters in the server. This is a viewpoint that I approve of, for reasons of security and freedom of choice in server/language pairings. I’ve actually tried switching to lighty before, but gave up because I couldn’t get PHP working with FastCGI (a requirement for my TombSaver page). It turns out that if I’d read the MESSAGE that pkgsrc shows after you install php I almost certainly would have got it working, but oh well. It’s done now and I’m happy with the change.

I’ve also released a new version of feedformatter, already! I am taking the “release early, release often” idea to quite an extreme with this latest project (normally I wouldn’t release anything in the state that feedformatter 0.1 and 0.2 have been in). Realistically, this is no big problem: in all probability nobody has even used 0.1 yet anyway. The new version includes “pretty printing” of feeds (with newlines and indentation), a first stab at some compatibility with the Universal Feed Parser, better feed validation (though there is still a long way to go on this front) and slightly tidier code.
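For anyone wondering what “pretty printing” amounts to, here is a small sketch using `xml.dom.minidom` from the Python 3 standard library. This is only an illustration of the idea, not how feedformatter does it internally, and the tiny feed string is invented for the example.

```python
from xml.dom import minidom

# A compact, single-line RSS fragment (made up for this example).
compact = (
    '<rss version="2.0"><channel>'
    "<title>Example</title><link>http://example.org/</link>"
    "</channel></rss>"
)

# Re-serialise the same document with newlines and two-space indentation.
pretty = minidom.parseString(compact).toprettyxml(indent="  ")
print(pretty)
```

The content of the feed is unchanged; only the whitespace between elements differs, which makes the output much easier to read and to diff by eye.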

Yes, there probably will be a 0.3 release in the next day or two.

Now with feeds!

Travels

I’m back from honeymoon! Some of you may have noticed the nifty new travel maps that are up on my homepage. I expect these will change fairly slowly over time, due to the costs of international travel. I’ll try to get a working photo gallery of honeymoon shots up soon.

Web framework progress

Some of you may have noticed that there are now links from my homepage to valid RSS 1.0, RSS 2.0 and Atom 1.0 feeds for articles published on this site. These are generated by a Python module I wrote specifically for the task, which I have released on my software page as the Universal Feed Formatter (in reference to the well known, well used and well loved Universal Feed Parser). I was actually surprised I had to write my own module to achieve this. There is a lot of Python code on the internet for parsing various feed formats, but surprisingly little for producing the feeds themselves. I certainly couldn’t find anything on the net that could take a single dictionary structure and produce files in various formats like feedformatter can. Hopefully someone else can take advantage of this convenience.
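The “one dictionary in, several formats out” idea can be sketched roughly as follows with `xml.etree.ElementTree` in Python 3. This is not feedformatter’s code or API: the functions `to_rss2` and `to_atom`, the dictionary keys and the example.org URLs are assumptions made for illustration, and the Atom output leaves out required elements such as `id` and `updated`, so it would not validate as it stands.

```python
import xml.etree.ElementTree as ET

# One plain dictionary describing the feed (keys chosen for this sketch).
feed = {
    "title": "Example feed",
    "link": "http://example.org/",
    "description": "An illustrative feed.",
    "items": [
        {"title": "First post", "link": "http://example.org/1"},
    ],
}


def to_rss2(feed):
    """Serialise the dictionary as a bare-bones RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for key in ("title", "link", "description"):
        ET.SubElement(channel, key).text = feed[key]
    for entry in feed["items"]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
    return ET.tostring(rss, encoding="unicode")


def to_atom(feed):
    """Serialise the same dictionary as a simplified Atom document."""
    root = ET.Element("feed", xmlns="http://www.w3.org/2005/Atom")
    ET.SubElement(root, "title").text = feed["title"]
    ET.SubElement(root, "link", href=feed["link"])
    ET.SubElement(root, "subtitle").text = feed["description"]
    for entry in feed["items"]:
        node = ET.SubElement(root, "entry")
        ET.SubElement(node, "title").text = entry["title"]
        ET.SubElement(node, "link", href=entry["link"])
    return ET.tostring(root, encoding="unicode")


print(to_rss2(feed))
print(to_atom(feed))
```

The point of the design is that the caller only ever builds the dictionary once; the differences between formats (element names, links as text versus attributes) live entirely in the serialisers.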

feedformatter is now integrated with the simple web framework that I mentioned in my last entry. You’ll also notice that I have a working (though imperfect) sitemap up as well, again generated by the framework. With these things done, I think I’ve now accomplished all of my original goals for this project. The code is by no means clean or reliable, so I won’t be releasing it at the moment, but it works and can be progressively polished over time. I will probably do this before I begin work on implementing some sort of commenting system for my articles.

I have been thinking, vaguely, about extending the framework to include blogging, and replacing pyblosxom with it. My reason for this is not really a direct dissatisfaction with pyblosxom. It’s the fact that a lot of the plugins that people write for pyblosxom do not work well (or at all!) with pyblosxom’s static rendering mode (which is the only mode I will use because I refuse to dynamically render static content each time it is viewed). This deficiency is the reason that there is no pagination on this blog (yet). Eventually this will become a problem, at which point I’ll either need to hack someone else’s pyblosxom plugin or switch to a new blogging platform. That new platform may as well be an extension of my own framework, because that will mean one less set of templates I need to maintain to match the rest of my site.

Web log analysis

Several months ago I installed the www/webalizer package from pkgsrc on my web server. It’s a web log analyser that I run from cron every hour. It compiles basic statistics on hits to my website (most popular pages, most popular entry and exit pages, viewer country statistics based on GeoIP, etc.) and then produces HTML reports. I kept a half-hearted eye on these statistics for the first few days, but then mostly forgot about them. I revisited my stats pages earlier this week, and was pleased to see how much traffic I was apparently getting.

Intrigued, I decided to step my analyses up a bit by configuring my web server to log user agents and referring URLs in addition to the basic information already logged. Now able to see user agents, it’s become clear that most of the traffic I thought I was getting was not actually from people but rather search engine crawlers. Oops. I’ve changed my webalizer settings now to ignore these hits, but it will be a while before I can collect meaningful statistics on the genuine human traffic.
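The same filtering can be sketched outside of webalizer in a few lines of Python 3. This assumes the server writes the common “combined” log format (with referrer and user agent as the last two quoted fields); the list of crawler markers is a hypothetical and far from complete one, and the sample log lines, including the page path, are invented.

```python
import re
from collections import Counter

# "Combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical, incomplete list of substrings that mark crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_crawler(agent):
    """Return True if the user agent string looks like a search engine crawler."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def human_page_counts(lines):
    """Count hits per requested path, skipping crawlers and unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or is_crawler(match.group("agent")):
            continue
        parts = match.group("request").split()
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


# Two invented hits on the same page: one from a browser, one from a crawler.
sample = [
    '192.0.2.1 - - [10/Oct/2008:13:55:36 +1000] "GET /netbsd.html HTTP/1.1" '
    '200 2326 "http://example.org/" "Mozilla/5.0 (X11; U; NetBSD i386) Firefox/3.0"',
    '192.0.2.2 - - [10/Oct/2008:13:56:01 +1000] "GET /netbsd.html HTTP/1.1" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
print(human_page_counts(sample))
```

Substring matching on the user agent is crude: it misses crawlers that pose as browsers and could misclassify a browser whose agent string happens to contain one of the markers, so the counts should be treated as an estimate.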

The most interesting things the log analysis reveals at this point are

  1. My NetBSD survival guide is the most popular page on the site. In fact, with some googling I was even able to discover that the URL for that page was given out in an OpenBSD IRC channel earlier this year! The survival guide was actually in fairly poor shape all this time, so I’ve put some effort into expanding and polishing it lately, given the important role it seems to play for my site. I still have a bit more to write, though, so watch that page over the next week or two for some activity.
  2. More than one person has wound up at this blog page searching for information on Itojun’s cause of death toward the end of last year. I did a lot of searching trying to find this out myself, and have come to the conclusion that there is not currently, and probably is not likely to ever be, a definite answer to this on the web. The only real leads I’ve found so far are a claim on the OpenBSD news site undeadly.org that it was a car accident and a claim on Slashdot that it was suicide. Neither of these is substantiated by any kind of hard evidence. It seems clear that Itojun’s family and close friends wish the cause to remain private, and I think the best thing would be for his well wishers to respect that.