Learn a language with synthetic speech

I’m currently working out how feasible computer-based language-learning software using synthesised speech would be. Speech synthesis has improved a great deal recently, and although it’s still not as good as a real person (though I believe it could be soon, at least for non-emotive situations), it could be just good enough.

The traditional way to generate speech with a computer is algorithmically - essentially someone works out how to overlay tones with different pitches and wave-shapes to form each sound (so-called formant synthesis). The newer way is to record each sound from a real speaker and essentially play the recordings back one after the other (concatenative synthesis).
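To make the algorithmic approach concrete, here’s a toy sketch in Python (my own illustration, not any real engine’s code): it overlays a few sine waves at rough formant frequencies for an “ah” vowel and writes the result to a WAV file. A real formant synthesiser shapes and filters the waves far more carefully, but the basic idea is the same.

```python
import numpy as np
import wave

SAMPLE_RATE = 16000

def formant_tone(freqs, amps, duration=0.5):
    """Overlay sine waves at the given frequencies/amplitudes --
    a very crude stand-in for formant synthesis."""
    t = np.linspace(0, duration, int(SAMPLE_RATE * duration), endpoint=False)
    signal = sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs, amps))
    signal /= max(abs(signal).max(), 1e-9)        # normalise to [-1, 1]
    return (signal * 32767).astype(np.int16)      # 16-bit PCM

# Approximate formant frequencies for an "ah" vowel (illustrative values)
samples = formant_tone([730, 1090, 2440], [1.0, 0.5, 0.25])

with wave.open("ah.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(SAMPLE_RATE)
    f.writeframes(samples.tobytes())
```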

There are more stages to it than that - assuming you don’t want to write the speech phonetically (in IPA, for instance) there also needs to be a way of turning text into phonetic information. This is usually half dictionary-based, for common words and syllables, and half rule-based (to avoid needing an enormous dictionary, and to cope with languages constantly expanding and evolving).
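As a rough illustration of that dictionary-plus-rules split (the words, phoneme names and rules below are all made up for the example; real systems use far larger, context-sensitive rule sets):

```python
# Toy grapheme-to-phoneme converter: dictionary lookup first,
# crude letter-to-sound rules as a fallback.
DICTIONARY = {
    "one": "W AH N",   # irregular words have to be listed explicitly
    "two": "T UW",
}

# Spelling-to-sound rules, tried in order (longest matches first).
RULES = [
    ("th", "TH"), ("ee", "IY"), ("oo", "UW"),
    ("a", "AE"), ("e", "EH"), ("i", "IH"), ("o", "AA"), ("u", "AH"),
    ("b", "B"), ("d", "D"), ("k", "K"), ("l", "L"), ("m", "M"),
    ("n", "N"), ("p", "P"), ("r", "R"), ("s", "S"), ("t", "T"),
]

def to_phonemes(word: str) -> str:
    word = word.lower()
    if word in DICTIONARY:               # common/irregular words
        return DICTIONARY[word]
    phones, i = [], 0
    while i < len(word):                 # everything else: apply rules
        for spelling, phone in RULES:
            if word.startswith(spelling, i):
                phones.append(phone)
                i += len(spelling)
                break
        else:
            i += 1                       # no rule matched: skip the letter
    return " ".join(phones)

print(to_phonemes("two"))    # dictionary hit -> "T UW"
print(to_phonemes("beet"))   # rules -> "B IY T"
```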

So we now have technology (almost freely available) that can produce speech that is good enough, given the correct phonetic information - it’s the actual language processing that is problematic. Most of the work is done by American companies, and therefore most of it goes into processing English (American English at that).

This is not an insurmountable problem. The engine I’ve been playing with (available as an add-in to Internet Explorer and as standard on Windows Vista) works fairly well with foreign words transcribed in dodgy-phonetic English. For example, to get it to pronounce “Entschuldigung” (German) correctly you need to type “Enshooldicken”. This is workable for a semi-automated system - it could include a dictionary of sorts, replacing words with their English-phonetic versions before the text reaches the engine.
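A minimal sketch of that substitution dictionary in Python (the table holds only the one respelling from this post; a usable version would need thousands of entries, and the output would be fed to the speech engine rather than printed):

```python
# Hypothetical respelling dictionary: map foreign words to the
# "dodgy-phonetic" English that an English-only engine pronounces
# roughly correctly.
RESPELLINGS = {
    "Entschuldigung": "Enshooldicken",
}

def respell(text: str) -> str:
    """Swap known foreign words for their English-phonetic versions
    before handing the text to the TTS engine."""
    return " ".join(RESPELLINGS.get(word, word) for word in text.split())

print(respell("Entschuldigung"))   # -> "Enshooldicken"
```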

I know the whole of this article is rather rambling - I’ll post something more readable later :P