I’m back from my first-ever scientific conference, SFCM 2011 in Zurich. My top two favourite talks were Lauri Karttunen’s keynote, “Beyond Morphology – Pattern Matching with FST”, and “Non-canonical inflection” by Benoît Sagot and Géraldine Walther. An honourable mention goes to “Morphology to the Rescue Redux: Resolving Borrowings and Code-mixing in Machine Translation” by Esmé Manandise and Claudia Gdaniec. I demoed stuff for our HFST3 paper.
Karttunen presented some obvious-in-retrospect extensions to FST matching, rewriting and tagging and an implementation thereof in an algorithm/utility called pmatch. It’s mostly a combination of recursive transition networks and the insight that with some algorithmic trickery, it’s sufficient to match the end of a subpattern when you want to do left-to-right longest-match matching/tagging. The extensions he described most were
- EndTag(), a command that gets compiled into special instructions telling pmatch to wrap a pattern or subpattern in tags. This removes the need for a transducer that is always trying to output the start tag and then entering transitions of the subpattern network that end up failing, and
- Ins(), which, RTN-style, refers to a separate network to be pseudo-inserted at the current location.
These are achieved with flag-diacritic-style special symbols, although pmatch itself doesn’t support flag diacritics. Hopefully we’ll have all this functionality in HFST one day, alongside flag-induced hyperminimization, an interesting topic I should write about sometime. Put together, these techniques should significantly remedy the problem of networks becoming combinatorially huge in certain situations.
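To make the left-to-right longest-match idea concrete, here’s a toy sketch in Python. It is not pmatch and doesn’t use transducers: the trie, the function names and the Loc tag are all made up for illustration. The point it tries to show is that the tagger never emits a start tag speculatively. It only remembers where the latest successful match ended, and wraps the span once that end is known.

```python
# Toy left-to-right longest-match tagger. An illustrative assumption,
# not pmatch: patterns are plain strings stored in a dict-based trie.

def build_trie(patterns):
    """Build a nested-dict trie; the key None marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def tag_longest(text, trie, tag):
    """Wrap the longest match starting at each position in <tag>...</tag>."""
    out = []
    i = 0
    while i < len(text):
        node = trie
        end = None  # end position of the longest match found so far
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                end = j  # a pattern ends here; keep going in case a longer one does too
        if end is None:
            # No pattern matched at i: copy one character and move on.
            out.append(text[i])
            i += 1
        else:
            # Only now, with the end known, are both tags written.
            out.append(f"<{tag}>{text[i:end]}</{tag}>")
            i = end
    return "".join(out)


trie = build_trie(["New York", "New York City"])
print(tag_longest("New York City via New York", trie, "Loc"))
```

When both “New York” and “New York City” match at the same position, the longer one wins, and a partial match that fails (say, “New Yolk”) leaves no stray start tag behind.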
Intermission
For the benefit of people who aren’t interested in computational morphology, here’s some travel stuff.
I’m not a big fan of travel, and was reminded why by almost everything going wrong. My flight was cancelled, and I had to queue for ages to be rerouted via Brussels, and almost missed that flight as well. All told, it took me over 10 hours to get from my house to the hotel in Zurich, leaving less time than I’d hoped to prepare for the demonstration session. And everything was sucky and expensive and my feet hurt and it’s just not worth it to ever leave home :(
Also, Blue1 is a terrible airline company and Swiss is nice (you get free chocolate).
Switzerland is about as orderly, clean and organized as you might imagine. A while ago a Japanese post-doc at the math department was leaving Helsinki to go to do math at an American university, and he sent a nice going-away email to people he’d met in Finland. He wrote “Finland is the 2nd most well-organized country among the places I have ever been (unfortunately you could not beat Japan, sorry!)” – I think he must have missed out on Switzerland.
Famous Swiss hospitality
(That said, there were definitely more representatives of ethnic minorities than in, say, Helsinki.)
The Swiss don’t mess around; each and every lamppost had a sticker like this:
Does it work?
I never saw a single extraneous piece of paper on these things.
Also, a little-known fact: Swiss people are in fact made out of polished steel.
I like the place. These guys know how to live.
End of intermission
Benoît and Géraldine had done work on a system for compactly describing certain irregular (“non-canonical”) phenomena in inflection:
- suppletion (where some forms have an alternate stem or affixes)
- heteroclisis (where some words have a mixed paradigm from several regular forms)
- defectiveness (where certain forms are missing from the paradigm)
- overabundance (where some forms have more than one realisation)
- deponency (where certain words inherit part of another’s paradigm in the “wrong” context, e.g. singular suffixation for expressing plural in some Croatian nouns)
They had used their approach to describe French irregular verbs, and also implemented several other well-known descriptions by French linguists. They wanted to show that their approach was best or most natural (at least most compact), and did so by estimating the Kolmogorov complexity of these schemes. This is something I’ve often thought about doing (examining linguistic theories by implementing them), so I’m happy that work is happening in this area.
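Kolmogorov complexity itself is uncomputable, so in practice it can only be estimated. I’m not reproducing the measure Benoît and Géraldine used here. The sketch below uses a common stand-in, the size of a description after general-purpose compression, which bounds the complexity from above up to a constant. The stems, endings and function name are invented toy data, not anything from their paper.

```python
import zlib


def compressed_size(description: str) -> int:
    """Bytes needed for the zlib-compressed description (a rough upper-bound proxy)."""
    return len(zlib.compress(description.encode("utf-8"), 9))


# Hypothetical toy lexicon: 200 made-up stems sharing one set of six endings.
stems = [f"stem{i}" for i in range(200)]
endings = ["e", "es", "e", "ons", "ez", "ent"]

# Description 1: list every inflected form explicitly.
full_listing = "\n".join(stem + ending for stem in stems for ending in endings)

# Description 2: list the stems once and the endings once.
factored = "\n".join(stems) + "\n--\n" + "\n".join(endings)

print("full listing:", compressed_size(full_listing), "bytes")
print("factored:    ", compressed_size(factored), "bytes")
```

Comparing two descriptions of the same data this way gives a crude, compressor-dependent number for “more compact”, which is the spirit of the comparison, if not the actual method.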
Overall, SFCM was damn well organized, interesting, motivating and fun to attend – many thanks to the organizers, speakers and attendees!