XML

Google Docs rule - if you use them right

Oliver Brown

I’ve been vaguely using Google Docs (specifically Spreadsheets) since it came out but never to do anything actually important. Most of the time I just had a list I needed sorting, or if I was feeling sophisticated I’d use it to decide on what was best value for money (how much £/GB a range of hard drives were, for instance). Recently I started using it to plan lessons for the language learning app. The ability to use it from work (or any other computer I might be on - including viewing it on my Nokia 770) was useful, but in the end I was only really writing a list with it.

Until now. I now have a nifty little C# app that generates modules directly from a Google Spreadsheet, which is definitely a Good Thing. I’ve been thinking of writing an app for module editing for a while, since writing them by hand is tiresome and error prone. Google Spreadsheets does half the work for me by providing the user interface for generating a table and then providing access as simple XML.

Which brings me to the matter of actually accessing the data. Google provide a client library in C# for accessing quite a lot of their API. I tried using it but found it a little confusing. Luckily, since I just wanted to query data, I discovered that raw access was actually easier. You simply make a GET request to http://spreadsheets.google.com/feeds/worksheets/_key_/public/values (where _key_ is provided to you when you “publish” a spreadsheet; access to unpublished spreadsheets requires authorization, which is more complicated). This gives you an Atom feed of URLs to the individual worksheets, which then contain Atom feeds of either rows or columns (your choice).

The query power of LINQ (along with XElement, XAttribute etc.) makes transforming the feeds into modules really easy. In fact the code that does the hard work (takes a spreadsheet key and generates the XML) is only 102 lines long, and that’s including unnecessary spacing to make the LINQ more readable (the main LINQ query is 35 lines).
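For anyone wanting to try the raw approach, here is a minimal sketch in Python (not the C# and LINQ the app itself uses). It builds the worksheets feed URL for a published key and pulls the per-worksheet links out of an Atom document. The function names, the sample feed and its rel values are made up for illustration; a real feed would come from the GET request and may look different in detail.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, in ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def worksheets_url(key):
    """Build the public worksheets feed URL quoted in the post."""
    return f"http://spreadsheets.google.com/feeds/worksheets/{key}/public/values"


def worksheet_links(atom_xml):
    """Return a (title, {rel: href}) pair for each entry in an Atom feed."""
    root = ET.fromstring(atom_xml)
    result = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        links = {link.get("rel"): link.get("href")
                 for link in entry.findall(ATOM + "link")}
        result.append((title, links))
    return result


# Hypothetical sample feed: the rel values and URLs are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Lesson 1</title>
    <link rel="rows" href="http://example.com/lesson1/rows"/>
    <link rel="cells" href="http://example.com/lesson1/cells"/>
  </entry>
</feed>"""
```

Calling worksheet_links(SAMPLE_FEED) gives one tuple per worksheet, which is roughly the starting point for following each link and turning the rows into a module.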

Handling regional variations in language learning

Oliver Brown

Warning, this post is long and rambling. You have been warned! :P

Part of the design philosophy of my language learning app is to reuse as much as possible. This brings up an interesting issue regarding regional variations of languages (I’m talking mainly about somewhat standardised variations) and how much should be shared between them.

For example in Belgium, French is an official language. This is almost the same as French as spoken in France but with a few important differences. Firstly there are minor vocabulary variations (Belgian French has specific words for 70 and 90 for instance). There is also a lot of Flemish and Walloon vocabulary used in addition to the French vocabulary. Finally there are pronunciation differences but these seem no greater than differences in accent.

So, a course on Belgian French should be almost identical to a course on Standard French. The question is how to notate that in the script files the language app uses.

There are basically three ways I’ve come up with to cope with the situation, and I think I’ll support all of them since they have different advantages in different situations.

The first is to allow inline region-specific phrases. So for the numbers in Belgian French, the standard French files would be used but any Belgian French sections would take priority.

The second is to have whole region specific files. Extra Belgian phrases not appearing in standard French would be in these and be loaded in addition to the standard French files. This is really an extension of the first.

The final case is no link at all. This would be needed for Chinese. The language code for Mandarin is “zh-guoyu” and the code for Cantonese “zh-yue”. In this case however there is no such spoken language with the code “zh” and therefore nothing to inherit from. This is a specific case of the first two where no parent language exists.
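The three cases can be sketched as one merge, here in Python. This is only an illustration of the idea, not the app's actual code: the “fr-be” code, the dictionary layout and the function names are assumptions. Phrase tables are merged from the least specific code to the most specific, so regional entries win, and a code with no parent simply loads on its own.

```python
def parent_codes(code):
    """Return the code and its parents, most specific first.

    For example "fr-be" gives ["fr-be", "fr"].
    """
    parts = code.lower().split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def load_phrases(code, scripts):
    """Merge phrase tables so that more specific regions override parents."""
    merged = {}
    # Walk from the parent language down to the region.
    for candidate in reversed(parent_codes(code)):
        merged.update(scripts.get(candidate, {}))
    return merged


# Hypothetical script data keyed by language code.
SCRIPTS = {
    "fr": {"70": "soixante-dix", "90": "quatre-vingt-dix", "hello": "bonjour"},
    "fr-be": {"70": "septante", "90": "nonante"},
    "zh-yue": {"hello": "你好"},  # no "zh" parent exists
}
```

With this data, load_phrases("fr-be", SCRIPTS) keeps “bonjour” from standard French but takes the Belgian numbers, while load_phrases("zh-yue", SCRIPTS) just returns the Cantonese table.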

So far this has just been considering audio. The app already supports text and will eventually support text only lessons of some sort. The first method above could be used for spelling variations (when learning English, “color” and “colour” could use the same audio while appearing differently on the screen). As a more dramatic example, Serbian could be taught using either the Cyrillic alphabet or the Latin alphabet with the codes “sr-cyrl” and “sr-latn” respectively. Or perhaps even both…

The final point I want to make regards the actual audio files themselves. Although it is true that most of spoken French is almost the same in Belgium and France, the accents are different and generally identifiable to French speakers. Therefore region-specific audio is desirable where possible. Since the script files and the audio are kept separate, this is possible with the language app. If the Belgian French audio exists that will be used; if not, the standard French is used. That means that if a standard French course is created, an adequate Belgian French course can then be created with little effort but with the possibility of improving it later.
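The audio fallback is simple enough to sketch in a few lines of Python. The directory layout (one folder per language code), the .ogg extension and the function name are assumptions made for the example, not how the app necessarily stores its files.

```python
def pick_audio(phrase_id, code, available):
    """Return the most region-specific audio path that exists, else None.

    `available` is a set of relative paths such as "fr/bonjour.ogg".
    """
    parts = code.lower().split("-")
    while parts:
        candidate = f"{'-'.join(parts)}/{phrase_id}.ogg"
        if candidate in available:
            return candidate
        parts.pop()  # drop the region and try the parent language
    return None


# Hypothetical set of recorded files.
AVAILABLE = {"fr/bonjour.ogg", "fr-be/septante.ogg"}
```

Here pick_audio("bonjour", "fr-be", AVAILABLE) falls back to the standard French recording, and recording a Belgian “bonjour” later would switch it over without touching the scripts.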

So much for Gtk#…

Oliver Brown

Well I’ve abandoned my plans to use Gtk# in the language app (which actually secretly has a name now).

The main reason for changing is simplicity. I had a look at the TreeView control in Gtk and decided it was too much work. Although the theory of good MVC separation is good, the user interface is such a small, simple part of my app it wasn’t worth it. The stuff I need from System.Windows.Forms should work in Mono (and .NET 1.1 and hopefully even the Compact Framework).

I still prefer the way Gtk handles layout of controls in general, but I console myself with the Windows form designer in Visual C# Express.

Using XHTML, XSLT and XForms for Xemplorary performance

Oliver Brown

Alliteration and bad pun. Good start :)

One of the features the language app will need is some sort of module editor. Although the XML format of the scripts is straightforward to anyone used to hand editing HTML, a lot of other people will not have a clue. Therefore a WYSIWYG editor would be a cool addition. And lots of X’s may be the way to go.

Although XForms support in browsers isn’t exactly stellar, the fact that only script editors will require it means that needing a plug-in or extension isn’t such a big thing. And I get brownie points for being Web 2.0 as well.

I’m going to assume you know what XForms and XSLT are. If you don’t, then go find out. I’ll probably explain in a future post, but for now just accept them as “cool” :P

Basically a module is included directly into the XHTML source of the page. The only change is the addition of a namespace declaration (which is normally absent from the modules). XSLT is then used to add some nice formatting to the conversation along with XForms stuff for editing (including adding/removing elements). This makes the server side code really easy since the whole XML of the module gets posted back to the server.

In theory the XSLT shouldn’t be needed since XForms can do repeating and stuff. The only problem is I don’t think it can handle recursion which is a bit of a limitation.

There is one bit of the XSL that I’m stuck on, though. I have the XML fragment in the head of the XHTML document. I need to be able to transform a copy of it and place it in the document body, but keep the original intact in the head. Does anyone have an XSL snippet to do that?

Almost ready for a public viewing

Oliver Brown

The still unnamed language learning app is almost ready for a first public viewing. I’m just trying to get some audio of someone other than myself. Firstly because I don’t really like hearing my own voice (and for this purpose my less than perfect pronunciation is too obvious) and secondly because I need at least two people just for it not to be confusing.

In the meantime I thought I’d share an example of the script file I’m using: EntschuldigenSie.xml. It primarily contains English translations although one phrase is done in a few more languages. It does highlight one possible issue. I had to change the German ß to ss. Although Windows seems perfectly fine with Unicode file names (internally it uses Unicode for storage (either UCS2 or UTF-16 - not sure which)) PHP refuses to open them (fopen, file and file_exists for instance just don’t work) and Apache 2 seems to have issues as well. For German there are workarounds but for other languages it will get fiddly. This might not even be a problem on Linux where it will ultimately reside and it only affects file names which only have to give you a rough idea of what’s inside. But still, it’s annoying…
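For what it’s worth, the German workaround can be automated. The Python sketch below is one possible approach, not what the app does: the mapping table and the function name are my own. It replaces the German special characters explicitly (ß has no Unicode decomposition, so it needs a rule of its own), then strips any remaining accents by decomposing and discarding non-ASCII characters.

```python
import unicodedata

# Conventional German transliterations; an assumption, extend as needed.
GERMAN = {"ß": "ss", "ä": "ae", "ö": "oe", "ü": "ue",
          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def ascii_filename(name):
    """Transliterate German specials, then drop remaining non-ASCII marks."""
    for char, replacement in GERMAN.items():
        name = name.replace(char, replacement)
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

This also shows why other languages get fiddly: a Cyrillic or Chinese file name would come out empty, so those would need a real transliteration scheme rather than accent stripping.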

Zend Framework

Oliver Brown

Zend, the commercial endeavour of the people who brought you PHP, have produced a framework, cleverly called the “Zend Framework”.

It’s basically a lightweight MVC framework for PHP. Lightweight in this case is good. It doesn’t do as much as Rails does for Ruby (although it is significantly younger) - the most notable hole is an object-relational mapping system. But it does provide URL rewriting for Rails-esque view/controller access. I started writing my clever language thingy in it.

The biggest problem I had was getting it to work with IIS, which I couldn’t. I decided that since I had IIS installed I’d give it a go. Unfortunately you require mod_rewrite, which IIS doesn’t have, so I installed ISAPI_rewrite, an equivalent for IIS. After an hour of trying to get it to work I went and downloaded Apache 2.2, which was my second mistake. You see, it seems PHP doesn’t work with Apache 2.2. I’m not sure why, but I found a vague mention of it on a forum after trying for another hour to get it to work. So I got Apache 2.0 and everything worked. Of course there are reasons not to use PHP 5 with Apache 2, but meh.

There is one little problem with the Zend Framework, I think. It seems to be printing a space somewhere before any other output. It wouldn’t be a problem except I need it to output XML and a space at the beginning makes Firefox (and probably Internet Explorer) explode.
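To show why a single stray space matters, here is a small Python check. It assumes the output starts with an XML declaration (I haven’t said above whether mine does, so treat that as an assumption); the XML spec only allows the declaration at the very start of the document, so a strict parser rejects anything before it. The helper name and sample document are made up.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical output with an XML declaration at the start.
DOC = '<?xml version="1.0"?><lesson/>'

clean_ok = parses(DOC)                          # declaration at offset 0
leading_space_ok = parses(" " + DOC)            # declaration pushed to offset 1
stripped_ok = parses((" " + DOC).lstrip())      # whitespace removed again
```

Stripping the output is only a workaround, of course; the real fix is finding whatever prints the space in the first place.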

Back to language learning

Oliver Brown

After spending a couple of weeks dealing with foreign text and Unicode at work, my interest in foreign language learning with the aid of a computer has returned.

My main goal is a Pimsleur style system but with the repetition handled by computer - i.e. with just the individual phrases (and words and syllables for earlier lessons) as audio files, the program should generate complete conversations with sensible parts repeated and useful instructor comments in between. That sounds like it requires some sort of script in some sort of markup language. Since it needs to be highly structured I guess that only leaves XML as a sensible possibility. So I marked up a conversation from Pimsleur’s German I.
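To give a flavour of the idea, here is a toy Python sketch that expands a marked-up conversation into a list of audio files to play. The element names, the repeat attribute, the phrase ids and the .ogg extension are all invented for the example; this is not the script format the app uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical script: every element and attribute name here is made up.
SCRIPT = """<conversation>
  <say speaker="instructor" phrase="listen-to-this"/>
  <say speaker="a" phrase="entschuldigen-sie" repeat="2"/>
  <say speaker="b" phrase="ja-bitte"/>
</conversation>"""


def playlist(script_xml):
    """Expand a script into the ordered list of audio files to play."""
    root = ET.fromstring(script_xml)
    files = []
    for say in root.findall("say"):
        repeat = int(say.get("repeat", "1"))
        # Each phrase is a separate audio file, queued `repeat` times.
        files.extend([f"{say.get('phrase')}.ogg"] * repeat)
    return files
```

The real thing would need smarter repetition than a fixed count, but the principle is the same: the script describes the conversation and the program decides what to play and when.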

There was an unexpected result. It’s fairly straightforward to have multiple source languages in one script file. Although there are certain things that would not work best this way, a lot of things in German (for instance) would be taught the same regardless of what language you are learning from. Ultimately source-language-specific scripts would have to be supported though.

Yet another XML based AJAX toolkit

Oliver Brown

Jitsu is another AJAX toolkit.

Like Backbase and Atlas it supports an XML based declarative format that is parsed by JavaScript and converted into real HTML.

This one is open source and free.

ASP.NET Atlas really is like Backbase

Oliver Brown

It turns out that ASP.NET might not suck after all. Atlas for ASP.NET is a toolkit for doing AJAXy stuff.

Well in fact it is quite a bit more than that. It has many features of the Google Web Toolkit (except in ASP.NET instead of Java) including serializing server side objects for client side use. Interestingly, it also has a lot in common with Backbase. It allows you to embed some nifty XML to define a user interface which is then interpreted by the JavaScript to render real (X)HTML.

The final irony is that it’s pretty much free. Since it’s .NET, to really use it you need Visual Studio, but the Atlas part itself is free and should be perfectly usable with the Express versions of the Visual Studio products.

PSP browser support

Oliver Brown

With my broadband connection came a wireless network. So I tried browsing with my PSP. And it is a lot better than I expected. Except when browsing my own blog :(

I figured the easiest way to make it work was to send it the XHTML Basic version. So you should now be able to browse my site with a PSP without any hassle :D

Detecting the PSP browser

Detecting a PSP is really easy. It sends a custom HTTP header (exposed to PHP as HTTP_X_PSP_BROWSER) which contains the firmware version. Just check if that header is set. In PHP you just need to do:

// True when the request comes from a PSP, false otherwise.
$psp = isset($_SERVER['HTTP_X_PSP_BROWSER']);
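The same check carries over to other CGI-style environments, since they expose request headers under HTTP_-prefixed keys too. A minimal Python sketch for a WSGI environ (the function names are my own, and the firmware string in the usage below is just an example value):

```python
def psp_firmware(environ):
    """Return the PSP firmware string from a CGI/WSGI environ, or None."""
    return environ.get("HTTP_X_PSP_BROWSER")


def is_psp(environ):
    """True when the request carries the PSP browser header."""
    return psp_firmware(environ) is not None
```

So is_psp({"HTTP_X_PSP_BROWSER": "2.70"}) is true, and any request without the header comes back false.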