Handling regional variations in language learning

Warning, this post is long and rambling. You have been warned! :P

Part of the design philosophy of my language learning app is to reuse as much as possible. This brings up an interesting issue regarding regional variations of languages (I’m talking mainly about somewhat standardised variations) and how much should be shared between them.

For example in Belgium, French is an official language. This is almost the same as French as spoken in France but with a few important differences. Firstly there are minor vocabulary variations (Belgian French has specific words for 70 and 90 for instance). There is also a lot of Flemish and Walloon vocabulary used in addition to the French vocabulary. Finally there are pronunciation differences but these seem no greater than differences in accent.

So, a course on Belgian French should be almost identical to a course on Standard French. The question is how to notate that in the script files the language app uses.

There are basically three ways I’ve come up with to cope with the situation, and I think I’ll support all of them since they have different advantages in different situations.

The first is to allow in line region specific phrases. So for the numbers in Belgian French, the standard French files would be used but any Belgian French sections would take priority.

The second is to have whole region specific files. Extra Belgian phrases not appearing in standard French would be in these and be loaded in addition to the standard French files. This is really an extension of the first.

The final case is no link at all. This would be needed for Chinese. The language code for Mandarin is “zh-guoyu” and the code for Cantonese “zh-yue”. In this case however there is no such spoken language with the code “zh” and therefore nothing to inherit from. This is an specific case of the first two where no parent language exists.

So far this has just been considering audio. The app already supports text and will eventually support text only lessons of some sort. The first method above could be using for spelling variations (when learning English “color” and “colour” could use the same audio while appearing differently on the screen). As more dramatic example Serbian could be taught using either the Cyrillic alphabet or the Latin alphabet with the codes “sr-cyrl” and “sr-latn” respectively. Or perhaps even both…

The final point I want to make regards the actual audio files themselves. Although it is true than most of spoken French is almost the same in Belgium and France, the accents are different and generally identifiable to French speakers. Therefore regional specific audio is desirable where possible. Since the script files and the audio are kept separate this is is possible with the language app. If the Belgian French audio exists that will be used, if not the standard French is used. That means that if a standard French course is created, an adequate Belgian French course can then be created with little effort but with the possibility of improving it later