Languages

Learning a language over Jabber/XMPP

Oliver Brown

The language learning app of mine is still under development, but I doubt I’ll be spending too much time on it until after my trip to America (and by extension, until after Christmas).

In the meantime I’ve been reading about XMPP, the protocol used by Google Talk and several others.

One of the extensions that Google is largely responsible for is Jingle, which essentially allows voice communications. There are also a couple of XMPP clients written in PHP, so it may be possible to deliver lessons over XMPP through a chat bot.

How to learn a language

Oliver Brown

I found an interesting post through Technorati tagged Pimsleur about how to learn a language. And for once it actually seems quite sensible and plausible. It’s also made me think about grammar and how it should be handled in my language learning app.

At the moment it plays the audio at you without anything on the screen. Perhaps the screen could display explanations of interesting or important points about what you hear? I’m worried about distracting people from listening and limiting the offline usability of precompiled lessons though…

Collocation

Oliver Brown

Collocation refers to a phrase or small group of words used together in normal speech with restrictions not explicitly imposed by grammar.

Quite an odd concept, but vitally important to language learning. Correct use of collocation is probably the best way to identify a native speaker from a near native speaker. I bring this up now because Julia made a wonderful example of an incorrect collocation for English.

She came up with the phrase “two and a half hundred”. In English you can say “two and a half thousand” and “two and a half million” but for some reason it doesn’t work with hundreds (it does in Finnish, incidentally).

Language learning pricing and making it pay for itself

Oliver Brown

If I’m considering paying for voice talent for the language app (which, let’s face it, I’m going to have to do) I have to think about getting the money back somehow. I have an interesting idea that essentially equates to everyone helping each other learn languages.

Basically when learning, you pay per phrase (a phrase in this context means any named element: usually a phrase, but it could also be a specific term). The cost would be something really low (say $0.01 each). You only pay for a phrase once regardless of how many times it gets repeated or how many conversations it appears in. Just to provide some sort of concrete example, the material I’m testing with (which covers the first two Pimsleur lessons) has 82* different phrases/terms. Quite a few simple conversations can be put together with that material. As the number of phrases increases, the number of conversations increases exponentially (the mathematician within me has to point out that strictly speaking it’s probably not exponential).
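The pay-once-per-phrase rule can be sketched roughly as below. This is only an illustration in Python (the app itself is not written in it); the function name, the phrase names and the use of a set to remember purchases are all assumptions, and the $0.01 price is just the example figure from above.

```python
from decimal import Decimal

PRICE_PER_PHRASE = Decimal("0.01")  # example price from the post, not a decided value


def charge_for_conversation(phrase_names, already_paid):
    """Return the cost of a conversation and record newly bought phrases.

    phrase_names: names of the phrases/terms used in the conversation
    already_paid: set of names the learner has already paid for (updated in place)
    """
    # Repeats inside the conversation and phrases bought earlier cost nothing.
    new_phrases = set(phrase_names) - already_paid
    already_paid |= new_phrases
    return PRICE_PER_PHRASE * len(new_phrases)


# Hypothetical phrase names, purely for illustration.
paid = set()
first = charge_for_conversation(["excuse_me", "do_you_understand", "excuse_me"], paid)
second = charge_for_conversation(["excuse_me", "i_understand"], paid)
```

Here the first conversation is charged for two phrases and the second only for the one phrase that is new, so 82 distinct phrases would come to $0.82 in total under this pricing.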

The clever part would be to allow people to upload their own audio. Although this has a few issues with regards to quality it might work. If you upload audio, you get a percentage of the money spent on listening to your audio. What the exact percentage should be is complicated though. As well as audio, the other big part of the system is the scripts. The traditional part of me feels that the scripts should have some sort of professional input from someone with experience teaching the language. Another part of me realises that hundreds of books exist for teaching languages that are written by professionals and are totally useless (and therefore professional input may not be all it’s cracked up to be). With that in mind, someone fluent in the language may be all that is required. Either way, the script writers need money too and should probably get a percentage as well. That $0.01 is being spread quite thin…

The system obviously needs an infrastructure in place to sort this. At a basic level it should list phrases that are needed in scripts but are missing. Another part would be to highlight underutilised phrases that need more conversations writing for them.
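As a rough sketch of those two reports, assuming each script is reduced to a list of the phrase names it uses (the data layout, function names and the "at most one conversation" threshold are all made up for illustration):

```python
from collections import Counter


def missing_phrases(scripts, recorded):
    """Phrase names used by some script but with no audio recorded yet."""
    needed = {name for phrases in scripts.values() for name in phrases}
    return sorted(needed - set(recorded))


def underused_phrases(scripts, max_conversations=1):
    """Phrase names that appear in at most max_conversations scripts."""
    # set(phrases) so a phrase repeated within one script is counted once.
    usage = Counter(name for phrases in scripts.values() for name in set(phrases))
    return sorted(name for name, count in usage.items() if count <= max_conversations)


# Hypothetical scripts and recordings.
scripts = {
    "greeting": ["excuse_me", "do_you_understand"],
    "reply": ["excuse_me", "i_understand"],
}
recorded = ["excuse_me", "i_understand"]

print(missing_phrases(scripts, recorded))
print(underused_phrases(scripts))
```

With this sample data the first report contains only "do_you_understand", while the second lists both phrases that appear in a single script.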

As a final note I’d like to explain another bit of cleverness in the system (and an associated problem, perhaps). Before a conversation is played it is checked for completeness - i.e. do sound files exist for all the required audio. Shortly I’ll be adding another layer to this - checking for sound files by the right people. Phrases in a script are marked by “person”, which is simply a way of identifying who is doing the talking when a script involves more than one person (which most of them will). It’s important that all elements in a script marked “Person1” are by the same person and all the parts by “Person2” are by the same person (and that “Person1” and “Person2” are different from each other). This means that there will be some duplication of audio going on. If someone records all the initial audio, but then never records any more, someone else will have to re-record most of it since it will be used in later scripts. This also implies there’s nothing to stop people re-recording the initial material in an effort to get a percentage of the money (assuming enough is offered to make it worthwhile). I’m not sure if that really is so bad though - it offers more variety for listeners…
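One way the "right people" check could work is sketched below, as a guess rather than a description of the real code. It assumes a script is a list of (person, phrase name) pairs and that we know which phrase names each speaker has recorded; the function and speaker names are hypothetical. It simply tries every way of giving each person a different speaker.

```python
from itertools import permutations


def cast_script(script, recordings):
    """Try to assign a distinct speaker to each person in a script.

    script: list of (person, phrase_name) pairs, in playback order
    recordings: dict mapping speaker -> set of phrase names they have recorded
    Returns a dict person -> speaker, or None if the script is incomplete.
    """
    # Collect everything each person has to say.
    needed = {}
    for person, phrase in script:
        needed.setdefault(person, set()).add(phrase)
    people = sorted(needed)
    # permutations() never repeats a speaker, so Person1 and Person2 differ.
    for speakers in permutations(sorted(recordings), len(people)):
        if all(needed[p] <= recordings[s] for p, s in zip(people, speakers)):
            return dict(zip(people, speakers))
    return None


# Hypothetical script and speakers.
script = [
    ("Person1", "excuse_me"),
    ("Person2", "i_understand"),
    ("Person1", "do_you_understand"),
]
recordings = {
    "anna": {"excuse_me", "do_you_understand", "i_understand"},
    "ben": {"i_understand"},
}
casting = cast_script(script, recordings)
```

Note that even though "anna" has recorded every phrase here, the script would still count as incomplete without "ben", because one speaker cannot play both parts. Trying every permutation is fine for two or three people per script but would need rethinking for larger casts.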

* Or perhaps 164. Whether you should pay for the native and foreign versions is a tough subject. I’d like the application to be independent of a specific language so my preference would be “yes”. They take just as much effort to record after all…

How much fluff is needed?

Oliver Brown

I’ve been sorting out exactly what needs recording for the language app (which I finally have an idea for a name for) and I was trying to decide how much extra instructor speech is needed. Situations aren’t described for instance (no “Imagine an English man sitting next to a French woman”) and you aren’t asked to say things explicitly (“How do you ask someone if they speak English?”). Will this harm the process at all?

The best thing to do perhaps would be to avoid trying to be Pimsleur quite so exactly.

Any voice talent out there?

Oliver Brown

Foreign language voice talent needed for the still-unnamed language learning application.

I have a series of phrases I need recording with a total audio time of about five minutes. I need them in as many languages as possible (although if it isn’t English, German or Finnish then I’ll also need them translating - they are really simple by the way). My main requirements are that the recording is good quality and that you are a native or fluent speaker of the language.

After looking around the Internet for a bit I discovered I could technically afford the hourly rate of most voice actors marketing themselves on the Internet - except they all had rather high minimums which made my five minutes very expensive (although I realise five minutes of audio takes more than five minutes of work). As well as money I can also offer a link and a review which has to be worth something (after all people are paying me for links that have nothing to do with the content of the site and presumably think it’s worth it).

If you’re interested, email me with the language(s) you could do and a quote (and preferably a sample but I acknowledge that this approach is hardly targeting professionals).

Multilingual pretty URLs

Oliver Brown

There is more and more emphasis on pretty URLs these days. With things like Ruby on Rails around to easily support them, and better knowledge and use of things like mod_rewrite, the days of horrible query strings are going away (excluding of course the most used websites - search engines). But how do you make your multilingual website have pretty URLs?

My language learning app uses the Zend Framework and so uses pretty URLs by default. I need the interface available in many languages, but then the URLs should be pretty in a localized way.

For example, starting a new Finnish lesson uses the following:

/lesson/new/fi

That would be the new action of the lesson controller with an extra language code parameter of fi.

In German this should be something like:

/lektion/neu/fi

By default this would access the neu action of the lektion controller.

The “simple” solution would be to write lots of controllers that just delegate to the real one, which is silly. Instead an extra layer has to be added to the routing process: some sort of look-up table mapping localized URL fragments to “real” canonical ones. This should be fairly simple with Zend Framework (although I haven’t actually tried yet).
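The look-up table idea might look something like the sketch below. It is written in plain Python purely to show the mapping and does not use any Zend Framework API; the table contents, the function name and the rule that only the controller and action segments get translated are all assumptions.

```python
# Hypothetical look-up table: interface language -> localized fragment -> canonical name.
ROUTE_TRANSLATIONS = {
    "de": {"lektion": "lesson", "neu": "new"},
}


def canonical_path(path, ui_language):
    """Map a localized pretty URL onto the canonical controller/action path.

    Only the first two segments (controller and action) are translated;
    later segments, such as the lesson's language code, pass through unchanged.
    Fragments with no entry in the table are left as they are.
    """
    table = ROUTE_TRANSLATIONS.get(ui_language, {})
    segments = path.strip("/").split("/")
    translated = [table.get(seg, seg) for seg in segments[:2]] + segments[2:]
    return "/" + "/".join(translated)


print(canonical_path("/lektion/neu/fi", "de"))  # prints /lesson/new/fi
```

The canonical path can then be handed to the normal router, so only one lesson controller ever has to exist. Generating links needs the reverse mapping, which this sketch leaves out.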

Just an important issue no-one seems to have brought up yet…

Almost ready for a public viewing

Oliver Brown

The still unnamed language learning app is almost ready for a first public viewing. I’m just trying to get some audio of someone other than myself. Firstly because I don’t really like hearing my own voice (and for this purpose my less than perfect pronunciation is too obvious) and secondly because I need at least two people just for it not to be confusing.

In the meantime I thought I’d share an example of the script file I’m using: EntschuldigenSie.xml. It primarily contains English translations although one phrase is done in a few more languages. It does highlight one possible issue: I had to change the German ß to ss. Although Windows seems perfectly fine with Unicode file names (internally it uses Unicode for storage, either UCS-2 or UTF-16, I’m not sure which), PHP refuses to open them (fopen, file and file_exists for instance just don’t work) and Apache 2 seems to have issues as well. For German there are workarounds but for other languages it will get fiddly. This might not even be a problem on Linux, where it will ultimately reside, and it only affects file names, which only have to give you a rough idea of what’s inside. But still, it’s annoying…

Best bits of the language app are done

Oliver Brown

The most important bits of my cool language learning web app are done. Here’s a quick overview of how it works.

Everything is split into modules, which are XML script files and accompanying audio files. Currently one type of script is supported, a “conversation”. This contains a short (less than 10 sentences) conversation with sub elements all marked up in XML. Sub elements are phrases, terms and notes. At the moment phrases and terms are handled almost identically. Notes are little explanations of possible stumbling points (for example the test script I have alerts the listener to the difference in the ending between “Ich verstehe” and “Sie verstehe_n_” in German). Any element of a conversation that is to be repeated is named (literally - the XML tag is given a name attribute). The system keeps track of the number of times a named phrase/term is played to the user and when it was last played so the automatic repetition system can work.

A lesson is currently very simple. A module is loaded and the conversation is played straight through. Then the named phrases/terms are played* with translations. Then any phrases/terms scheduled for repetition are played*. The repetitions are actually determined before the conversation is played however so that if too many are required then no new conversation is played.

* Played in this case means a specific format. First the native version is played, then a pause, then the translation is played twice.
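The lesson order described above can be sketched like this. It is only an illustration: the step tuples, function names and especially the MAX_REPETITIONS threshold are assumptions, since no actual limit is given.

```python
MAX_REPETITIONS = 10  # assumed threshold for "too many" repetitions


def play_item(name):
    """The 'played' format: native version, a pause, then the translation twice."""
    return [("native", name), ("pause", None), ("translation", name), ("translation", name)]


def build_lesson(conversation, named_items, due_for_repetition):
    """Return the ordered playback steps for one lesson.

    conversation: audio names making up the conversation, in order
    named_items: named phrases/terms introduced by the conversation
    due_for_repetition: names scheduled for repetition, decided beforehand
    """
    steps = []
    # The repetitions are known up front; with too many, skip the new conversation.
    if len(due_for_repetition) <= MAX_REPETITIONS:
        steps += [("conversation", name) for name in conversation]
        for name in named_items:
            steps += play_item(name)
    for name in due_for_repetition:
        steps += play_item(name)
    return steps


# Hypothetical lesson content.
lesson = build_lesson(["line1", "line2"], ["excuse_me"], ["hello"])
```

Building the whole list of steps before playing anything would also make it easy to precompile a lesson into a single audio file for offline use.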

EVE Online and Pimsleur

Oliver Brown

Pimsleur makes the perfect companion to EVE Online, especially when you’re doing cargo runs or other things that don’t need much concentration. Why not learn Spanish while nipping out to buy that new Battleship? :P

I’m not sure if you can alter the in game music that EVE uses (and changing it to Pimsleur would probably be a hassle since you’re only going to play each track once) but you can run EVE in a window and therefore use whatever media player you like in the background.

EVE, EVE Online, Spanish, Pimsleur