Computers, Languages, Programming, Technology, XML

Handling regional variations in language learning

Warning, this post is long and rambling. You have been warned! 😛

Part of the design philosophy of my language learning app is to reuse as much as possible. This brings up an interesting issue regarding regional variations of languages (I’m talking mainly about somewhat standardised variations) and how much should be shared between them.

For example, French is an official language in Belgium. Belgian French is almost the same as French as spoken in France, but with a few important differences. Firstly, there are minor vocabulary variations (Belgian French has its own words for 70 and 90, septante and nonante, for instance). There is also a lot of Flemish and Walloon vocabulary used in addition to the French vocabulary. Finally, there are pronunciation differences, but these seem no greater than differences in accent.

So, a course on Belgian French should be almost identical to a course on Standard French. The question is how to notate that in the script files the language app uses.

There are basically three ways I’ve come up with to cope with the situation, and I think I’ll support all of them since they have different advantages in different situations.

The first is to allow inline region-specific phrases. So for the numbers in Belgian French, the standard French files would be used, but any Belgian French sections in them would take priority.

The second is to have whole region-specific files. Extra Belgian phrases that don’t appear in standard French would go in these and be loaded in addition to the standard French files. This is really an extension of the first.

The final case is no link at all. This would be needed for Chinese. The language code for Mandarin is “zh-guoyu” and the code for Cantonese is “zh-yue”. In this case, however, there is no spoken language with the code “zh” and therefore nothing to inherit from. This is a specific case of the first two where no parent language exists.
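To make the inheritance idea concrete, here is a rough sketch in Python (not the language the app itself is written in). The function names `fallback_chain` and `resolve_phrases`, the phrase ids and the dictionary layout are all made up for illustration; the real script files are XML and may be organised quite differently. The idea is simply that a tag like “fr-be” inherits from “fr”, with the more specific entries winning.

```python
def fallback_chain(tag):
    """Return the tag followed by its parents, e.g. 'fr-be' -> ['fr-be', 'fr']."""
    parts = tag.lower().split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def resolve_phrases(tag, scripts):
    """Merge phrase tables from the most general tag to the most specific.

    `scripts` is a hypothetical mapping of language tag -> {phrase id: text}.
    Tags with no entry (such as a bare 'zh') are skipped, which covers the
    "no parent language" case.
    """
    merged = {}
    for candidate in reversed(fallback_chain(tag)):
        # Later (more specific) tables overwrite earlier (more general) ones.
        merged.update(scripts.get(candidate, {}))
    return merged


# Illustrative data only.
scripts = {
    "fr": {"seventy": "soixante-dix", "hello": "bonjour"},
    "fr-be": {"seventy": "septante"},
    "zh-yue": {"hello": "你好"},
}

belgian = resolve_phrases("fr-be", scripts)
cantonese = resolve_phrases("zh-yue", scripts)
print(belgian)
print(cantonese)
```

Belgian French picks up “bonjour” from the standard French table but keeps its own word for seventy, while Cantonese gets only its own entries because nothing is registered under “zh”.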

So far this has just been considering audio. The app already supports text and will eventually support text-only lessons of some sort. The first method above could be used for spelling variations (when learning English, “color” and “colour” could use the same audio while appearing differently on the screen). As a more dramatic example, Serbian could be taught using either the Cyrillic alphabet or the Latin alphabet, with the codes “sr-cyrl” and “sr-latn” respectively. Or perhaps even both…

The final point I want to make regards the actual audio files themselves. Although it is true that most of spoken French is almost the same in Belgium and France, the accents are different and generally identifiable to French speakers. Therefore region-specific audio is desirable where possible. Since the script files and the audio are kept separate, this is possible with the language app. If the Belgian French audio exists, that will be used; if not, the standard French is used. That means that once a standard French course is created, an adequate Belgian French course can be created with little effort, with the possibility of improving it later.
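The audio fallback could look something like the following Python sketch. It assumes a directory per language code and one file per phrase, with an “.ogg” extension; that layout, the function name `find_audio` and the phrase ids are all assumptions for the example rather than how the app actually stores things.

```python
from pathlib import Path
import tempfile


def find_audio(audio_root, tag, phrase_id, extension=".ogg"):
    """Return the most region-specific audio file for a phrase, or None.

    Looks in <audio_root>/fr-be/ first, then <audio_root>/fr/, and so on
    (a hypothetical layout chosen for this sketch).
    """
    parts = tag.lower().split("-")
    for i in range(len(parts), 0, -1):
        candidate = Path(audio_root) / "-".join(parts[:i]) / (phrase_id + extension)
        if candidate.is_file():
            return candidate
    return None


# Build a throwaway audio tree: standard French has both phrases,
# Belgian French only has its own recording of "seventy".
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "fr").mkdir()
    (Path(root) / "fr-be").mkdir()
    (Path(root) / "fr" / "hello.ogg").touch()
    (Path(root) / "fr" / "seventy.ogg").touch()
    (Path(root) / "fr-be" / "seventy.ogg").touch()

    hello_dir = find_audio(root, "fr-be", "hello").parent.name
    seventy_dir = find_audio(root, "fr-be", "seventy").parent.name
    missing = find_audio(root, "fr-be", "goodbye")

print(hello_dir, seventy_dir, missing)
```

A Belgian French course would start out playing almost entirely standard French recordings and gradually switch over as Belgian ones are added, without any change to the scripts.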

Computers, Languages, Programming, Technology, XML

So much for Gtk#…

Well I’ve abandoned my plans to use Gtk# in the language app (which actually secretly has a name now).

The main reason for changing is simplicity. I had a look at the TreeView control in Gtk and decided it was too much work. Although good MVC separation is sound in theory, the user interface is such a small, simple part of my app that it wasn’t worth it. The stuff I need from System.Windows.Forms should work in Mono (and .NET 1.1 and hopefully even the Compact Framework).

I still prefer the way Gtk handles layout of controls in general, but I console myself with the Windows form designer in Visual C# Express…

Computers, Programming, Technology, Web Programming, XML

Using XHTML, XSLT and XForms for Xemplorary performance

Alliteration and bad pun. Good start 🙂

One of the features the language app will need is some sort of module editor. Although the XML format of the scripts is straightforward to anyone used to hand editing HTML, a lot of other people will not have a clue. Therefore a WYSIWYG editor would be a cool addition. And lots of X’s may be the way to go.

Although XForms support in browsers isn’t exactly stellar, the fact that only script editors will need it means that requiring a plug-in or extension isn’t such a big thing. And I get brownie points for being Web 2.0 as well.

I’m going to assume you know what XForms and XSLT are. If you don’t, then go find out. I’ll probably explain in a future post, but for now just accept them as “cool” 😛

Basically a module is included directly into the XHTML source of the page. The only change is the addition of a namespace declaration (which is normally absent from the modules). XSLT is then used to add some nice formatting to the conversation along with XForms stuff for editing (including adding/removing elements). This makes the server side code really easy since the whole XML of the module gets posted back to the server.

In theory the XSLT shouldn’t be needed since XForms can do repeating and stuff. The only problem is I don’t think it can handle recursion, which is a bit of a limitation.

There is one bit of the XSL that I’m stuck on. I have the XML fragment in the head of the XHTML document. I need to be able to transform a copy of it and place it in the document body, but keep the original intact in the head. Does anyone have an XSL snippet to do that?

XForms, XSL, XSLT, Web2, Web 2.0, AJAX, HTML, XHTML, CSS

Computers, Languages, PHP, Programming, Technology, Web Programming, XML

Almost ready for a public viewing

The still unnamed language learning app is almost ready for a first public viewing. I’m just trying to get some audio of someone other than myself. Firstly because I don’t really like hearing my own voice (and for this purpose my less than perfect pronunciation is too obvious) and secondly because I need at least two people just for it not to be confusing.

In the meantime I thought I’d share an example of the script file I’m using:


It primarily contains English translations although one phrase is done in a few more languages.

It does highlight one possible issue. I had to change the German ß to ss. Although Windows seems perfectly fine with Unicode file names (internally it uses Unicode for storage, either UCS-2 or UTF-16, not sure which), PHP refuses to open them (fopen, file and file_exists, for instance, just don’t work) and Apache 2 seems to have issues as well. For German there are workarounds, but for other languages it will get fiddly. This might not even be a problem on Linux, where it will ultimately reside, and it only affects file names, which only have to give you a rough idea of what’s inside. But still, it’s annoying…
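For what it’s worth, the ß to ss workaround can be generalised a little. The following is only a sketch in Python (not PHP, and not what the app does): the helper name `ascii_file_name` and the tiny replacement table are invented for the example. It swaps in hand-picked replacements first, then strips accents by decomposing characters and dropping anything that isn’t ASCII.

```python
import unicodedata

# Hand-picked replacements for characters that have no accent-stripped form.
# This table is illustrative and far from complete.
SPECIAL = {"ß": "ss", "ẞ": "SS"}


def ascii_file_name(name):
    """Return a rough ASCII-only version of a file name."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in name)
    # NFKD splits 'ü' into 'u' plus a combining diaeresis...
    decomposed = unicodedata.normalize("NFKD", replaced)
    # ...and encoding with errors="ignore" then drops every non-ASCII character.
    return decomposed.encode("ascii", "ignore").decode("ascii")


street = ascii_file_name("Straße.ogg")
greetings = ascii_file_name("Grüße.ogg")
russian = ascii_file_name("Привет.ogg")
print(street, greetings, russian)
```

This is where the “fiddly” part shows up: German comes out readable, but a Cyrillic name loses everything except the extension, so non-Latin scripts would need a proper transliteration table or some other naming scheme.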

Pimsleur, German, Windows, Apache, Unicode, UTF-16

Computers, PHP, Programming, Ruby on Rails, Technology, Web Programming, XML

Zend Framework

Zend, the commercial endeavour of the people who brought you PHP, have produced a framework, cleverly called the “Zend Framework”. It’s basically a lightweight MVC framework for PHP. Lightweight in this case is good. It doesn’t do as much as Rails does for Ruby (although it is significantly younger); the most notable hole is an object-relational mapping system. But it does provide URL rewriting for Rails-esque view/controller access.

I started writing my clever language thingy in it.

The biggest problem I had was getting it to work with IIS. Which I couldn’t. I decided since I had IIS installed I’d give it a go. Unfortunately you require mod_rewrite, which IIS doesn’t have. So I installed ISAPI_rewrite, an equivalent for IIS. After an hour of trying to get it to work I went and downloaded Apache 2.2. Which was my second mistake. You see, it seems PHP doesn’t work with Apache 2.2. Not sure why, but I found a vague mention of it on a forum after trying for another hour to get it to work. So I got Apache 2.0 and everything worked.

Of course there are reasons not to use PHP 5 with Apache 2, but meh.

There is one little problem with the Zend Framework, I think. It seems to be printing a space somewhere before any other output. It wouldn’t be a problem except I need it to output XML, and a space at the beginning makes Firefox (and probably Internet Explorer) explode.

Apache, IIS, Zend, Zend Framework, MVC

Computers, PHP, Programming, Technology, Web Programming, XML

Back to language learning

After spending a couple of weeks dealing with foreign text and Unicode at work, my interest in foreign language learning with the aid of a computer has returned.

My main goal is a Pimsleur style system but with the repetition handled by computer, i.e. with just the individual phrases (and words and syllables for earlier lessons) as audio files, the program should generate complete conversations with sensible parts repeated and useful instructor comments in between.

That sounds like it requires some sort of script in some sort of markup language. Since it needs to be highly structured, I guess that only leaves XML as a sensible possibility. So I marked up a conversation from Pimsleur’s German I. There was an unexpected result: it’s fairly straightforward to have multiple source languages in one script file. Although there are certain things that would not work best this way, a lot of things in German (for instance) would be taught the same regardless of what language you are learning from. Ultimately source-language-specific scripts would have to be supported though.

Pimsleur, foreign language learning, German