Posts tagged ‘phd’

AMPC 09 and a renovated research page

As mentioned in a previous entry, I recently spent some time in Newcastle. The reason for this was to attend the 2009 Australasian Mathematical Psychology Conference – my first conference since beginning my PhD! I had a really good time there. A lot of the material that people spoke about was somewhat opaque to me with my complete lack of background in psychology, but at the same time it was quite encouraging to get a sense for the extent to which people in Australia are doing good quantitative work on psychology. I came back with a lot of techniques and buzzwords underlined in my notes that I need to get up to speed with! I gave a talk myself which went relatively well. In hindsight I probably tried to pack more material into 20 minutes than is readily manageable, but it was hardly a disaster and at least now a lot more people know that I and my work exist.

It’s really been a very long time since I blogged anything regarding my research. In fact, this is the first I’ve written about it that is anything more than preliminary mental wanderings that I had before I had actually accumulated any sort of knowledge about the field. This has not been due to a lack of work or ideas worth writing about. Far from it! It has been mostly because I have wanted to avoid alienating non-specialist readers by talking about concepts which I have not properly introduced, but at the same time have struggled to find the time and motivation to write a good series of grounding posts introducing the relevant concepts and paradigms.

Upon getting back from AMPC09 I reasoned that for the next week or two there would be a slight increase above the baseline probability of fellow academics working in language acquisition (or computational cognitive science in general) visiting my website, and so I made some quick effort to tidy up my research page somewhat. As a consequence of this, I can now point interested readers to my informal research overview to get a rough idea of just what kind of work I do. You’ll also find on my research page copies of a paper I currently have under review for the 31st Annual Meeting of the Cognitive Science Society (which is happening in Amsterdam in July) and of the presentation slides I used for my talk at AMPC09.

For better or worse, I am going to consider the availability of all of this new material sufficient for me to start blogging about my work again. Hopefully I can manage to keep my entries non-technical enough to be accessible but still interesting!

Some Thoughts on Language Processing Algorithms

My approach to understanding natural language is what I imagine is the approach taken by most materialist scientists – that the brain is a computer made of meat and in trying to understand things like acquisition of language we are really searching for the algorithms implemented in this meat which achieve this task.

The problem of confirming that a given algorithm actually bears some resemblance to that running in our brain is an interesting one – not strictly necessary if we’re only interested in cool applications like talking computers (in which case the performance of the algorithm is about all we’re interested in), but probably deserving of attention if we’re operating under some pretense of being psychologists, which I suppose I am now (though I don’t like to think of it that way because I still haven’t yet had complete success in cleansing the word “psychologist” of the stigma of pseudoscience that it carries in my mind).

An obvious approach is to implement the algorithm in silicon rather than meat and get it to perform various tasks, making as many observations as possible about its performance and comparing these to similar measurements made on humans using the meaty algorithm. There’s a wide range of observations that could be used here (for example, some measure of susceptibility to linguistic “slip ups”, like spoonerisms) and I expect a lot of thought could be devoted to determining which tests are the most appropriate and reliable along these lines – a kind of “psycholinguistic Voight-Kampff test” which, rather than aiming to determine whether or not a machine can understand and converse in a way which is similar to humans on the surface, like a Turing test, aims to determine whether or not that machine is understanding and conversing in a way similar to humans “under the hood”.

But before we can even get to the stage where we could perform such testing, we need an algorithm to test, and I wonder if a lot of effort might not be saved by designing our algorithms from the outset to have a better chance of resembling the brain’s natural algorithms. The motivating question here is “What can we deduce about the brain’s language processing algorithms from the knowledge that they have been hard-coded into an organic organ by evolutionary forces?”. I’m a little bit out of my league here, having no real background in evolutionary biology or neurophysiology (which may well not even be a word), but studying maths gives you a fantastic arrogance when it comes to feeling qualified to talk about other people’s disciplines (after all, biology is just applied chemistry, which is just applied physics, which is just applied maths. Right?).

I have three somewhat solid thoughts on this front at the moment, all stemming from the idea that the brain, like most (all?) organic organs, probably displays a high level of self-similarity, i.e. has the property, or is composed of sub-parts which have the property, of containing lots of copies of a similar sub-structure. This tendency is a pretty obvious and natural consequence of organs growing via a process of repeated cell division. So what does this self-similarity suggest?

Parallelism. Some algorithms are highly susceptible to being made to run in parallel, with linear or sometimes even super-linear speed up achievable, whereas some algorithms really are inherently very serial. It seems natural to expect that the brain is much more likely to be running parallel algorithms (with similar activity happening in several similarly structured parts of the brain), so perhaps we ought to cast some doubt over any language processing algorithm which seems hard to parallelise.

Recursion and iteration. The more recursion and iteration involved in an algorithm, the less need there is in a meat implementation for different pieces of meat which do different things. If we are supposing that evolution will tend to produce a lot of similar brain parts rather than a wide range of unique brain parts, then perhaps we ought to cast some doubt over any language processing algorithm which does not contain a lot of recursion or iteration. This particular “restriction” (really more of an intuitive guide, I guess) puts the apparent current trend toward using Bayesian statistics in cognitive modelling in a good light, because the Bayesian paradigm is really all about iteration, in the sense of constantly updating our prior probability distribution in response to observations.
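To make the "Bayesian updating is iteration" point concrete, here is a minimal sketch. It assumes the simplest possible setting, a Beta-Bernoulli model in which the belief is a pair of pseudo-counts; the function names `update`, `learn` and `posterior_mean` are made up for illustration and do not come from any particular model in the literature.

```python
from fractions import Fraction


def update(belief, observation):
    """One Bayesian update of a Beta(a, b) belief given a True/False observation."""
    a, b = belief
    return (a + 1, b) if observation else (a, b + 1)


def posterior_mean(belief):
    """Expected probability of observing True under a Beta(a, b) belief."""
    a, b = belief
    return Fraction(a, a + b)


def learn(prior, observations):
    """Apply the same update step over and over: each posterior becomes the next prior."""
    belief = prior
    for obs in observations:
        belief = update(belief, obs)
    return belief


belief = learn((1, 1), [True, True, False, True])
print(belief, posterior_mean(belief))  # (4, 2) 2/3
```

The whole learner is one small piece of machinery applied repeatedly, which is the property the argument above cares about.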

Sharing of data structures. There is more than one computational task in language processing. Sometimes we’re trying to translate a string of words into a logical relation between concepts and sometimes we’re trying to translate in the other direction. Obviously there are some data storage and searching issues related here – we need to store words, concepts and some sorts of mappings between them. Thus there are data structures involved here – not necessarily perfect analogues of the data structures one meets in a CS course (I doubt our brain uses literal hash tables, for instance), but data structures nevertheless. Presumably this data is stored in our brain only once, in one particular fashion. Thus, if you have one algorithm for translating in one direction and another for translating in the other, but they both use different data structures to represent concepts, words or the links between them, then regardless of how well the algorithms perform, perhaps we ought to suspect that at least one of them is not an accurate model of how the human brain actually works.
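As a rough illustration of the shared-data-structure idea, the sketch below stores each word-concept link exactly once and lets both directions of translation query the same store. The `Lexicon` class and the example words and concept labels are hypothetical, and nothing here is meant as a claim about how the brain actually stores anything.

```python
class Lexicon:
    """A single store of word-concept links, queried in both directions."""

    def __init__(self):
        # Each link is a (word, concept) pair and is stored only once.
        self._links = set()

    def add(self, word, concept):
        self._links.add((word, concept))

    def concepts_for(self, word):
        """Comprehension direction: which concepts can this word express?"""
        return sorted(c for w, c in self._links if w == word)

    def words_for(self, concept):
        """Production direction: which words can express this concept?"""
        return sorted(w for w, c in self._links if c == concept)


lexicon = Lexicon()
lexicon.add("bank", "RIVER_EDGE")
lexicon.add("bank", "FINANCIAL_INSTITUTION")
lexicon.add("shore", "RIVER_EDGE")

print(lexicon.concepts_for("bank"))      # ['FINANCIAL_INSTITUTION', 'RIVER_EDGE']
print(lexicon.words_for("RIVER_EDGE"))   # ['bank', 'shore']
```

A pair of models that kept a separate word-to-concept table for comprehension and an unrelated concept-to-word table for production would fail the test suggested above, however well each performed on its own.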

Of course, it would be very foolish to interpret these as hard and fast guidelines, and I don’t mean to suggest that I will constrain my own studies only to algorithms fitting these criteria. But the very act of coming up with such a list is an interesting and, in my opinion, worthwhile exercise. I would be surprised if all three of these ideas were substantially wrong, and would advise that they at least be kept in mind while designing language processing algorithms that are supposed to mimic actual human language processing.

Why study language?

I’m more or less settled in at the university now and working four days a week on my PhD. The room that houses my office has only just recently had some renovations finished and it’s not exactly completely set up yet. It’s also so far below ground level that there is absolutely zero mobile phone reception, which might be something of a pain, but which is also pretty hard to do anything about. Anyway, expect the entries in this blog to start revolving around what I’ve decided to tentatively declare “computational psycholinguistics” in the near future. And in that vein…

Why study human language? Three reasons stand out in particular for me.

  1. Language is something that is reasonably tractable by mathematical and scientific methodology. A lot of what goes on in psychology verges, in my opinion, on being pseudo-scientific rubbish. Any study, for instance, which revolves around things like one’s perception of oneself, or feelings or anything like that is immediately confronted with the fairly insurmountable problem that we can’t even precisely define these things, let alone measure them or model them. We don’t even properly understand conceptually simpler things, like memory, on which these grandiose ideas must surely depend. These psychologists are, metaphorically speaking, trying to fly to the moon before they’ve fully learned Newton’s laws of motion.

    I think language is in a different situation. It’s fairly easy to define what language, at its heart, is all about. We have two finite sets – one of words and one of concepts – and language is about mapping back and forward between finite sequences of words (more commonly known as “sentences” in the written case and, apparently, “utterances” in the spoken case) and logical relations between these concepts (which we might well call “ideas”). That’s what it is. Learning a language is nothing more than learning this mapping. This is perhaps an oversimplification – from a language perspective, we’ve side-stepped the issue of building words up from heard phonemes or seen morphemes, and from a mathematical perspective it’s true that we’re not really concerned with a mapping, but rather a relation because one sentence can conceivably have more than one possible interpretation – but it certainly captures the essence of the problem and puts it in an entirely tractable form: finite sequences of elements from finite sets are not mysterious, ephemeral, intuitive things – they’re rigorously defined and well studied entities. We can do statistical analysis on them, we can define equivalence classes on them and we can generate them using stochastic or deterministic processes. Logical relations between concepts are nothing new or “squishy” either, and we can use things like predicate calculus to model them.

    In short, the study of language is firmly grounded in objective reality, thus letting one investigate the human mind – certainly an appealing area of study – without sacrificing one’s scientific integrity.

  2. Language is surprisingly fundamental to human cognition. Although it’s not initially clear under casual consideration, I think that, when you think about it, it becomes an inescapable conclusion that language is inherently tied up – and very deeply so – with how humans form and internally represent arbitrary and often quite abstract concepts and categories. After all, we’re mapping back and forward between sentences or utterances and relations between concepts. The nature of these concepts and their initial formation, internal representation and long term storage can hardly be irrelevant. Sometimes when we map from a linguistic input into the conceptual “idea space”, the resulting idea has the long term effect of modifying the way we perform these mappings in future – i.e. when we are explicitly taught a new word.
  3. Language has some really cool applications in areas that I’m interested in. Better understanding of how humans understand and generate natural language can lead directly (thanks largely to point 1, i.e. that it’s understanding of something tangible) to giving computers better ability to do things like:
    • translate between human languages,
    • search the web,
    • automatically generate RDF triples for the semantic web,
    • intelligently aggregate related items from the overwhelming forest of online news sources and/or blogs,
    • communicate with users in a more natural manner using speech and/or language recognition and synthesis.

    These sorts of applications are, I think, likely to fairly strongly influence the direction of my research, especially those related to the web.
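The description of language in point 1 above, as a relation between finite sequences of words and logical relations between concepts, can be written down directly. This is only a toy sketch: the vocabulary, the concept labels and the nested-tuple encoding of "ideas" are all assumptions chosen for illustration, not a proposed model.

```python
# Hypothetical finite sets of words and concepts, for illustration only.
WORDS = {"dogs", "bark", "visiting", "relatives", "can", "be", "boring"}
CONCEPTS = {"DOG", "BARK", "RELATIVES", "BORING", "ACT_OF_VISITING", "WHO_VISIT"}

# A sentence is a tuple of words; an idea is a predicate applied to arguments,
# encoded as a nested tuple. The language is a set of (sentence, idea) pairs.
AMBIGUOUS = ("visiting", "relatives", "can", "be", "boring")
LANGUAGE = {
    (("dogs", "bark"), ("BARK", "DOG")),
    (AMBIGUOUS, ("BORING", ("ACT_OF_VISITING", "RELATIVES"))),
    (AMBIGUOUS, ("BORING", ("WHO_VISIT", "RELATIVES"))),
}


def interpretations(sentence):
    """All ideas related to a sentence (possibly none, possibly several)."""
    return {idea for s, idea in LANGUAGE if s == tuple(sentence)}


def is_mapping(relation):
    """True only if no sentence is paired with more than one idea."""
    sentences = [s for s, _ in relation]
    return len(sentences) == len(set(sentences))


print(len(interpretations(AMBIGUOUS)))  # 2
print(is_mapping(LANGUAGE))             # False
```

The ambiguous sentence is what makes this a relation rather than a mapping, which is the caveat raised in point 1. Everything involved is a finite, well-defined object that can be counted, compared and generated.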

So that’s it! Some of my reasoning behind devoting the next 3 years of my life largely to the study of human natural language processing.

Research explained

In my last entry I said I’d explain the research page which mysteriously appeared during my site redesign. Here’s the story.

My first job at m.Net Corporation was basically to refine and extend some work done as part of a joint research project between m.Net and a research psychologist from my alma mater. This psychologist was Daniel Navarro, an insanely smart guy who, despite being a psychologist, actually understands things like maths and statistics and can even write code (though to be fair his code sometimes sucks).

Working together we had moderate success in adapting latent Dirichlet allocation, a mathematical model originally developed for natural language processing, to a collaborative filtering problem as part of m.Net’s customer analytics research. It was pretty cool stuff, and I learned a lot. I was genuinely surprised and excited to realise that some psychologists actually do things like heavy Bayesian statistics and intense number crunching, instead of just blindly assuming that all the world’s data is normally distributed and interpreting simple linear regression as the Word of God (which is what mathematicians generally assume psychologists spend all of their time doing – it’s a reputation they deserve for teaching their students from a book called Statistics Without Maths for Psychology. I mean, really). Check out MIT’s Computational Cognitive Science Group for a better idea of the cool kind of stuff some people do. Anyway, about a month ago Dan mentioned to me in passing that an internal PhD scholarship in the School of Psychology may be about to become available, and suggested that if I were interested he could try to convince them to let me apply for it, on the grounds that teaching a mathematician the basics of psychology is about 100 times easier than teaching a psychologist the basics of mathematics and hence recruiting mathematicians is actually a smarter way to produce good research in mathematical psychology. I said I was interested, because I really did find the LDA work I did interesting, and he said he’d try but that I shouldn’t get my hopes up because it was a long shot. So I didn’t. Fast forward to earlier this week and the last bit of paperwork has gone through and the scholarship is mine. Sometime before the end of the month I’ll be starting a PhD, with Dan as my supervisor. 
The topic of study has not yet been finally decided, but it will revolve in some way around how humans firstly learn and subsequently understand language (and these are obviously related problems, considering the way that advanced language is typically learned via explanation using simpler language) and involve as much mathematical modelling and number crunching as I can possibly squeeze into it. I’m very excited about possible applications of these models, to things like improving the “intelligence” of tools like search engines and news aggregators and, perhaps more ambitiously, using software to “bootstrap” the semantic web by auto-generating RDF files en masse.

So you can expect any papers or the like that I write in the course of this PhD to appear on my research page, any software I write as part of it to appear on my software page (under a BSD license, of course), and occasional thoughts to appear in this blog.

This doesn’t explain the fact that all of this is happening with probability 0.5. I’ll leave that for another entry.

Oh, and I am trying to arrange to stay on at m.Net for one day a week during the PhD, because it’s an awesome place to work and I’d be genuinely sad to leave for good.