Posts tagged ‘pyblosxom’

Goodbye PyBlosxom, Hello CherryBlosxom!

As I said I would, I had a poke around the relevant PyBlosxom plugins in an attempt to figure out why the anti-spam question feature was playing havoc. To cut a long story short, I did not succeed. In the process of failing, I was driven near insane by what seemed like inexplicably inconsistent behaviour by PyBlosxom. This might be due to PyBlosxom being a low-quality product or it might be due to me not understanding what I was doing.

Regardless of the cause, I decided that a change was due. I wanted to use a blogging platform whose ability to “just work” I could have some degree of trust in. I turned, first, to WordPress, because, well, everybody else does (mass popularity of software, in my experience, is a terrible metric for software quality, but generally is a good metric for “ease of just getting the damn thing to do what you basically want it to do”). One minute through their “famous five minute install” I discovered that WordPress only works with MySQL as a backend database. Seriously lame. MySQL is one of the aforementioned cases of software popularity being a terrible metric for software quality. My server uses PostgreSQL and I really didn’t feel like installing a second RDBMS on an already over-strained machine just for a blog. Heck, I really didn’t want my blog to rely on a database at all.

So I bit the bullet and did something similar to what I threatened to do back when I first set up commenting in PyBlosxom, and wrote my own blogging platform, in Python, using CherryPy to handle the HTTP request routing stuff and Cheetah to handle the templating. This new platform is entirely filesystem based, like PyBlosxom and Blosxom before it – no databases are necessary. I’m tentatively naming it “CherryBlosxom”, acknowledging the fact that CherryPy does all the hard stuff and that it follows in the ideological footsteps of Blosxom and PyBlosxom, in that it tries to be as small and simple as possible.

If you’re reading this entry, it means that I’ve made the switch and this blog is now powered by CherryBlosxom – an extremely alpha version of CherryBlosxom, which has not been thoroughly debugged and is missing some features (like RSS and Atom feeds, which I hope to eventually provide with feedformatter). It’s entirely possible that features like archive links and tag links and commenting will not work entirely properly just yet. I am hoping to have the most severe bugs taken care of pretty quickly, in which case I’ll call what’s there 0.1 and formally release it. In the meantime, please try to use the blog like you usually would and if anything breaks drop me a line. I’m looking forward to making CherryBlosxom a polished product!

Commenting feature broken

I’ve just noticed this morning that the anti-spam questions on this blog’s comment form are playing up, refusing to accept what are in fact correct answers and hence making it effectively impossible to comment on anything. I have no idea what is causing this or how long this has been the situation. I’m working on fixing it as quickly as I can spare time for it at the moment, and hopefully I’ll have something figured out soon. Naturally, if the fix happens to be interesting I’ll post details here. Apologies to anybody who has had comments lost on account of this. If you desperately want to comment on something in the meantime, email your comment to me and I’ll happily add it manually for you.

Pyblosxom now running as FastCGI via flup

As I suggested I might a few months back, I have recently got around to running this Pyblosxom blog as a FastCGI process. The increase in speed is really impressive. Unless you happen to visit during a particularly busy period, the first time you hit this site a persistent Pyblosxom process will be started and will then hang around to service your subsequent requests, making them nice and speedy. This is made possible with Allan Saddi’s flup library, which lets Python WSGI applications talk to webservers via FastCGI or SCGI. I’ll go through the details of my setup (which uses lighttpd as its webserver) for the sake of other people who want to do the same.

There are basically two steps involved: writing a FastCGI script which wraps the Pyblosxom WSGI app up in flup, and then configuring lighttpd to map a given URL to this script.

Here’s my FastCGI script (with modified paths):

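#!/usr/pkg/bin/python2.4
import sys

# Make the config file and (if it is not installed in site-packages) the
# Pyblosxom codebase importable before importing Pyblosxom itself.
sys.path.insert(0, "/path/to/pyblosxom_config_file")
sys.path.insert(0, "/path/to/pyblosxom/codebase")

import Pyblosxom.pyblosxom
from flup.server.fcgi import WSGIServer

# Wrap the Pyblosxom WSGI app in flup's FastCGI server and start serving.
app = Pyblosxom.pyblosxom.PyBlosxomWSGIApp()
WSGIServer(app).run()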

Pay attention to the shebang line: /usr/pkg/bin/python2.4 will only work on NetBSD systems. /usr/bin/env python should work anywhere and is better form in general. The second call to sys.path.insert is only necessary if you have Pyblosxom itself sitting around in a directory somewhere rather than installed as a Python library in the site-packages directory. All of this is discussed on the Pyblosxom page here, where they write a more or less identical script, minus the use of flup.server.fcgi.WSGIServer.
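For anyone who hasn’t met WSGI before, the thing flup wraps is nothing more than a callable with a fixed signature. Here is a minimal sketch, written for Python 3 rather than the Python 2.4 of the script above. The names hello_app and serve_over_fastcgi are made up for illustration; only the WSGIServer import mirrors the real script.

```python
def hello_app(environ, start_response):
    """A tiny WSGI application: takes the request environment and a
    start_response callback, returns an iterable of byte strings."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve_over_fastcgi(app):
    """Hand any WSGI app to flup, exactly as the script above does with
    the Pyblosxom app. Needs the third-party flup package installed."""
    from flup.server.fcgi import WSGIServer
    WSGIServer(app).run()


# In a real FastCGI script the last line would be:
# serve_over_fastcgi(hello_app)
```

Because the app is an ordinary callable, it can be exercised directly with a fake environ and start_response before any webserver is involved.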

Here’s the relevant part of my lighttpd.conf:

$HTTP["host"] == "www.luke.maurits.id.au" {
        fastcgi.server = (
                "/blog" => (
                        "script.fcgi" => (
                                "bin-path"              => "/path/to/fastcgiscript.py",
                                "socket"                => "/tmp/pyblosxom.sock",
                                "check-local"           => "disable",
                                "disable-time"          => 1,
                                "min-procs"             => 1,
                                "max-procs"             => 32,
                                "max-load-per-proc"     => 4,
                                "idle-timeout"          => 60
                        ),
                ),
        )
}

This maps the URI /blog on the www.luke.maurits.id.au virtual host to the script we wrote above. When a request for that URI hits the server, lighttpd will run the script and talk to it through the Unix domain socket specified (/tmp/pyblosxom.sock in this case). Make sure that, wherever you choose to put your socket, your webserver has read and write permissions. Note also that the path should be relative to any directory that you have lighttpd set to chroot into. This instance of the script will die after 60 seconds (“idle-timeout” above) pass without any requests coming in. Extra instances will be run, up to 32 (“max-procs”) in total, to accommodate heavy load. A new process is created whenever there are more than 4 (“max-load-per-proc”) queued requests for each existing process.
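To make the spawning rule concrete, here is a toy model of it in Python. It only encodes the rule as described in the paragraph above and is not lighttpd’s actual implementation; the function name should_spawn is made up, and the defaults are the values from the config.

```python
def should_spawn(queued_requests, running_procs,
                 max_load_per_proc=4, max_procs=32):
    """Return True if, under the rule described above, another FastCGI
    process would be started. A toy model, not lighttpd's real code."""
    if running_procs >= max_procs:
        # Already at the ceiling: never spawn more.
        return False
    # Spawn once the queue exceeds the allowed load per existing process.
    return queued_requests > max_load_per_proc * running_procs
```

With one process running, four queued requests are tolerated and the fifth would trigger a second process. Once 32 processes exist, nothing more is spawned no matter how long the queue gets.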

As I mentioned, this setup is very noticeably faster than running Pyblosxom as a plain CGI process – as one would fully expect. So far I have had a few problems where a huge number of FastCGI script processes are spawned for no apparent reason, slowing the server to a crawl and resulting in timeouts. I’m not sure what has caused these, though logic dictates it must be a problem with lighttpd. These seem to be over, but if they persist I will have to look into getting something other than lighttpd to spawn the FastCGI processes. Hopefully this won’t happen. Until it does, enjoy the faster browsing!

Spam, spam, spam, spam…

As you might have noticed, as of a couple of days ago this blog started getting hit pretty heavily by comment spam, composed mostly of links to Russian pornography sites. As of this afternoon, I think I have deleted all of the offending comments. There is a small possibility that I nuked a legitimate comment or two in doing so, but given the currently low frequency of real comments I’m getting, I doubt it. Still, if you left a comment in the last three days you may want to check that it’s still there. If it’s not, email me and I should be able to resurrect it from the notification email.

In an attempt to stop this from happening again, I’ve installed Menno Smits’ “spamquestion” plugin, which relies on Steven Armstrong’s “session” plugin. You now have to give a simple, one word answer to a question like “What is the opposite of hot?” to leave a comment here. The question is randomly selected from a set of about 10. This sort of spam protection isn’t as strong as captchas, because it’s a fairly trivial matter for a spambot to collect all 10 of the questions, have the answers provided by a person, and then spam as usual. However, it’s perfectly adequate to protect against spam which isn’t being individually targeted against your one site (the spam I was getting came from a range of IP addresses, so I’m going to assume it was the work of a botnet) and has the advantage of working in text-based browsers and not disadvantaging visually impaired people. Let’s hope it works here.
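The idea behind this kind of protection is simple enough to sketch in a few lines of Python. This is not the spamquestion plugin’s code, just an illustration of the technique. Only the first question comes from this post; the other two, and the function names, are invented for the example.

```python
import random

# Question -> expected one-word answer (stored lowercase).
QUESTIONS = {
    "What is the opposite of hot?": "cold",
    "What colour is the sky on a clear day?": "blue",  # hypothetical
    "How many legs does a cat have?": "four",          # hypothetical
}


def pick_question(rng=random):
    """Choose one of the known questions at random."""
    return rng.choice(sorted(QUESTIONS))


def is_correct(question, answer):
    """Compare the answer case-insensitively, ignoring stray whitespace.
    Unknown questions are always rejected."""
    expected = QUESTIONS.get(question)
    return expected is not None and answer.strip().lower() == expected
```

The weakness mentioned above is visible here: with a small fixed set of questions, a person only has to answer each one once for a bot to get through every time.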

Based on my preliminary fiddlings with these plugins, it looks like there is little in the way of graceful handling of incorrect answers to the spam question – the form just gets reloaded with none of your input preserved and no explanatory message. This is obviously unacceptable and I might get around to fixing it myself sometime soon. For now, just be careful!

Pyblosxom Hack Number 1

Here’s my first “pyBlosxom hack”. It’s not really a hack on the
pyBlosxom system itself, it’s more of a “usage hack”, but I think
it’s a relatively neat one.

Back when this blog was statically rendered, I used to write the
entries in Markdown (http://pyblosxom.sourceforge.net/), and they
stayed that way on the file system. The statically rendered pages
were in proper HTML, however, because I used the Markdown parser
(http://pyblosxom.sourceforge.net/registry/text/PyMarkdown.html)
for PyBlosxom. This worked just fine for static rendering, but when
I went dynamic I immediately realised a huge problem with this
setup. For some reason the Markdown parser is unbelievably slow. It
took literally whole minutes for pyBlosxom to render the latest 10
entries, which is obviously completely unacceptable.

I found this quite odd at first, because I write my articles in
Markdown too, and use Markdown in Python
(http://www.freewisdom.org/projects/python-markdown/) to translate
them to HTML. I had always assumed that PyBlosxom used the same
Markdown translation code – after all, why would someone code a
Python Markdown library if there was already one out there? But it
turns out that this is in fact what has happened. The PyBlosxom
renderer uses completely different – and obviously much less
efficient – code from Markdown in Python.

The obvious solution to this problem would have been to wrap
Markdown in Python up in whatever interface pyBlosxom uses for
parsers, but I’ve solved it by doing something quite different
which gives me a fairly powerful interface to using pyBlosxom.

I’ve written a Python script called makeentry which
does the following:

  • Starts up vi, my editor of choice, editing a temporary
    file in /tmp. I use this editing session to write an entry
    in Markdown. Note that I write just the entry, without the
    metadata that pyBlosxom would usually want at the start, like a
    title or tags.
  • Upon the vi process terminating after I finish writing
    the entry, it starts up aspell (http://aspell.net/) to spell
    check that file.
  • After spell checking the file, it (quickly!) translates the
    Markdown to HTML using Markdown in Python.
  • I then get prompted to enter a title and list of tags.
  • The title, tags and translated HTML entry are then all
    concatenated in the expected way into a file in my pyBlosxom entry
    directory (the filename is automatically generated from the title
    by converting to lowercase and replacing spaces with
    underscores).
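The steps above can be sketched roughly as follows. This is a reconstruction in Python 3 from the description, not the actual makeentry script. The function names, the “.txt” extension and the “#tags” metadata line are assumptions about how the entries are laid out, and markdown.markdown() is the entry point of the third-party Markdown in Python package.

```python
import os
import subprocess
import tempfile


def title_to_filename(title, extension=".txt"):
    """Lowercase the title and replace spaces with underscores.
    The ".txt" extension is an assumption."""
    return title.strip().lower().replace(" ", "_") + extension


def build_entry(title, tags, html_body):
    """Concatenate title, tags and body. The layout (title line, then
    a "#tags" metadata line, then the HTML) is an assumption."""
    return "%s\n#tags %s\n%s\n" % (
        title.strip(), ",".join(tags), html_body.strip())


def make_entry(entry_dir):
    """Interactive pipeline: edit, spell check, translate, prompt, save."""
    import markdown  # third-party "Markdown in Python" package

    fd, path = tempfile.mkstemp(suffix=".md", dir="/tmp")
    os.close(fd)
    try:
        subprocess.check_call(["vi", path])               # write the entry
        subprocess.check_call(["aspell", "check", path])  # spell check it
        with open(path, encoding="utf-8") as handle:
            html_body = markdown.markdown(handle.read())  # Markdown -> HTML
    finally:
        os.remove(path)

    title = input("Title: ")
    raw_tags = input("Tags (comma separated): ")
    tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]

    target = os.path.join(entry_dir, title_to_filename(title))
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(build_entry(title, tags, html_body))
    return target
```

Keeping the filename and entry assembly in their own small functions means those parts can be checked without launching an editor.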

This way I still get to write in Markdown, but with the
following benefits over wrapping Markdown in Python up with
pyBlosxom’s parser interface:

  • I get to do spell checking (indeed, arbitrary
    pre-processing) before publishing my entry.
  • pyBlosxom reads the entry off the disk in HTML, so no time at
    all is consumed doing a translation (which is faster than even the
    fastest Markdown translator possible).

I quite like this usage paradigm. I’m hoping that sometime not
too far off I get the chance to add another level of
pre-processing: Pygments is a
code colouring system (written in Python, of course), which
translates code in just about any modern programming language into
HTML with appropriate span tags to perform syntactic code
colouration. I’d really like it if I could have my
makeentry script search the HTML entry for code
tags nested in pre tags (using HTMLParser,
http://docs.python.org/lib/module-HTMLParser.html, from the Python
standard library) and automatically replace the
contents with colourised code using Pygments. This would be pretty
cool and shouldn’t be too hard. Keep an eye out for it in the
nearish future.
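A rough sketch of how that extra pre-processing step might look, written for Python 3 (where HTMLParser lives in html.parser). This is an illustration rather than finished code. The names CodeColouriser and colourise are invented, the default highlighter guesses the language with Pygments’ guess_lexer, and markup nested inside a code tag is not handled properly.

```python
from html import escape
from html.parser import HTMLParser


def pygments_highlight(code):
    """Default highlighter; needs the third-party Pygments package."""
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import guess_lexer
    return highlight(code, guess_lexer(code), HtmlFormatter(nowrap=True))


class CodeColouriser(HTMLParser):
    """Rebuilds the HTML it is fed, replacing the text inside
    <pre><code>...</code></pre> with highlighted markup."""

    def __init__(self, highlighter=pygments_highlight):
        super().__init__(convert_charrefs=True)
        self.highlighter = highlighter
        self.out = []
        self.in_pre = False
        self.code = None  # list of text chunks while inside <pre><code>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # copy tag verbatim
        if tag == "pre":
            self.in_pre = True
        elif tag == "code" and self.in_pre:
            self.code = []

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "code" and self.code is not None:
            self.out.append(self.highlighter("".join(self.code)))
            self.code = None
        elif tag == "pre":
            self.in_pre = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.code is not None:
            self.code.append(data)  # unescaped source code
        else:
            self.out.append(escape(data, quote=False))


def colourise(html_text, highlighter=pygments_highlight):
    parser = CodeColouriser(highlighter)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The highlighter is passed in as a plain function so that the tag handling can be tried out with a stand-in before Pygments is wired up.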

Commenting feature added, pyBlosxom headaches

As some of you may have noticed, this blog now features comments. I set this up over last Thursday and Friday. It wasn’t a straightforward procedure, and during these few days you may have encountered various problems with my website – even for parts of it that have nothing to do with the blog, because at one stage the internal URL rewrites that I asked lighttpd to do were stupid ones, owing to my inexperience with regular expressions. I apologise if this caused you trouble; everything should be working fine now.

The entire experience has rather substantially dented my confidence in PyBlosxom as a blogging platform. Certainly, I enjoy its flat-file simplicity and naturally far prefer to be using something written in Python rather than PHP, but the fact that the majority of its functionality is provided by third party plugins of an apparently mediocre quality – and certainly of ephemeral availability – with documentation that varies from non-existent to outright inaccurate does not give me warm fuzzy feelings. I’ll probably write an entry or two about the problems I’ve had in the coming weeks.

I have been entertaining grandiose plans of writing my own Pythonic blogging platform, based on CherryPy (which looks so genuinely fantastic that I can’t wait to use it for something) and Cheetah, which has served me well as part of my current home-brew system for generating this site. I would assuredly stick with flat text files over a database, although I might use an SQLite database for some things if I thought it would afford a significant gain in performance or code simplicity without too much of an increase in overall complexity. This is probably a pipe dream anyway, and certainly not something I could throw together in a hurry.

Until such a time comes as I write this imaginary CherryBlog, I think I will slowly devote time to hacking on PyBlosxom in an attempt to make it more usable. I’ll blog about anything halfway decent that I come up with.

On a performance note, you may have noticed that the blog pages of this site are now substantially slower to load than they have been in the past. At the moment, PyBlosxom is running as a plain old CGI process, which of course means that the whole thing is as slow as it possibly could be. But that’s not to say it’s necessarily slow, of course. Under the super light load that this blog is currently getting things are bearable. I do intend to migrate away from CGI at some stage, if I don’t write my own system first – there is a WSGI version of PyBlosxom which I should be able to hook up to lighttpd using Allan Saddi’s flup library and either FastCGI or SCGI. Barry Pederson has provided a starting point for this in his own PyBlosxom blog. I’ve had quite enough of tinkering with this blog for a while, though, so this may not happen for a month or so.