Status of Kannada Computing on the Internet


This is an attempt to survey the state of the art in computing with the Kannada language. To produce content in Kannada, the following software is available:

The vendors of these packages have created their own fonts, but the fonts are usually neither interoperable nor standards compliant. The bigger issue is that they are very display oriented, which makes common operations like sorting and searching hard.

To explain the issue: a display-oriented Kannada encoding stores "kannada" as 3 bytes, one for each of the three displayed letters. However, if you search for "na" using the representation of a standalone "na", the search will return nothing, because the conjunct "nna" has a different code from "na". Things are further complicated by "arkavattu" etc.

For the purpose of machine processing, it makes sense to store "kannada" as ka, na, virama (to strip the inherent vowel and make the na short), na, da.

[Image: sahyadri]

This makes machine processing easy, but reading difficult, as the machine shows 5 letters instead of 3. This is exactly what standards like ISCII do.
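A minimal sketch of why the logical storage order helps, written in Python and using the Unicode values for these letters rather than ISCII bytes (the principle is the same). The variable names are illustrative only.

```python
# "kannada" stored in logical order: ka, na, virama, na, da
KA, NA, VIRAMA, DA = "\u0c95", "\u0ca8", "\u0ccd", "\u0ca1"
word = KA + NA + VIRAMA + NA + DA

# A plain character search finds both occurrences of "na",
# including the one that is displayed as part of the conjunct.
positions = [i for i, ch in enumerate(word) if ch == NA]

print(len(word))    # number of stored letters
print(positions)    # indices at which "na" was found
```

With a display-oriented encoding the same search would miss the "na" inside the conjunct, because the conjunct has a code of its own.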

ISCII is an 8-bit code, which can represent 2^8 = 256 characters. The lower 128 are basically US-ASCII. The Indian languages occupy the higher 128 characters. ISCII is a superset of all Indic scripts, which (almost) covers all the vagaries of Indic scripts (but it's hard to satisfy everyone, so you'll see occasional complaints on this matter).

The problem with ISCII is that its use is not widespread and no good fonts exist. It requires special purpose hardware and software from a small number of vendors. Further, support for ISCII in free software like Linux and BSD using the X Window System is weak, making cheap Kannada computing hard.

However, after ISCII-91 was standardized in 1991, a new standard called Unicode came into the picture. Unicode is a multibyte standard, which means each character may be represented by multiple (not necessarily two) bytes. The advantage of Unicode is that many scripts of the world can be represented in a single page of text, because each script gets its own range of numbers in the Unicode code space. For example, Devanagari is 0x900-0x97F and Kannada is 0xc80-0xcff. To read Unicode text, you need a font that has glyphs for all the characters used. However, no universal font covering all of Unicode exists today. Microsoft's Arial Unicode MS is popularly used and is distributed with Office 2000. It's also available for free download (13 MB). Since the Indic blocks of Unicode are based on ISCII, it's easy to convert between the two.
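To make the idea of per-script ranges concrete, here is a small Python sketch that classifies a character by the two ranges quoted above. The table and the function name `script_of` are made up for this illustration; a real program would cover many more blocks.

```python
# Code point ranges quoted in the text above.
BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Kannada": (0x0C80, 0x0CFF),
}

def script_of(ch):
    """Return the name of the block containing ch, or "other"."""
    cp = ord(ch)
    for name, (low, high) in BLOCKS.items():
        if low <= cp <= high:
            return name
    return "other"

print(script_of("\u0c95"))  # Kannada letter ka
print(script_of("\u0915"))  # Devanagari letter ka
print(script_of("k"))       # plain ASCII
```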

However, both Unicode and ISCII suffer from the same limitation: the word "kannada" is 5 letters and, when rendered literally, looks pretty ugly. To address this, Microsoft and Adobe have come up with a new font technology called OpenType, which has the necessary smarts to do the right thing. For example, a file containing the word "kannada" holds the following 5 Unicode characters:

0xc95 0xca8 0xccd 0xca8 0xca1

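A hex dump of the UTF-8 encoded file shows the following (the dump prints 16-bit words on a little-endian machine, so each pair of bytes appears swapped; the trailing 0a is a newline):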

b2e0 e095 a8b2 b3e0 e08d a8b2 b2e0 0aa1
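The dump can be reproduced with a few lines of Python. This is only a sketch: it assumes the file held the five characters followed by a newline, and that the dump tool printed little-endian 16-bit words, which is what the byte order above suggests.

```python
import struct

# The five code points listed above, plus the trailing newline.
text = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1\n"
data = text.encode("utf-8")

# Each Kannada character takes 3 bytes in UTF-8.
byte_view = " ".join("%02x" % b for b in data)

# Group the bytes into little-endian 16-bit words, as the dump tool did.
words = struct.unpack("<%dH" % (len(data) // 2), data)
word_view = " ".join("%04x" % w for w in words)

print(byte_view)
print(word_view)
```

Read in file order, the bytes are e0 b2 95, e0 b2 a8, e0 b3 8d, e0 b2 a8, e0 b2 a1 and 0a, that is, three bytes per character.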

But when viewed with the Tunga OpenType font, the display looks like this:

[Image: "kannada" rendered with the Tunga font]

Contrast this with the rendering with a simple TrueType font. Much nicer, isn't it? But as you can see, the file is significantly larger than it would be in ISCII or Baraha. That's the cost of being able to view Kannada and Hindi simultaneously. However, if you compress the file, its size should come down to a level similar to ASCII.

Creating a font using this technology is significantly more complex, because the font author has to specify all the special cases. For example, "ka, virama, sha" is one glyph (ksha, as in kshatriya) and not three.
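The kind of special case a font author has to list can be pictured as a lookup table from character sequences to glyphs. The Python sketch below is a toy model only, not how OpenType tables are actually stored: the glyph name "ksha", the table `LIGATURES` and the function `shape` are all invented for illustration, and the "sha" in the conjunct is assumed to be U+0CB7.

```python
# Toy ligature table: a sequence of characters maps to one glyph name.
LIGATURES = {
    ("\u0c95", "\u0ccd", "\u0cb7"): "ksha",  # ka + virama + sha
}

def shape(text):
    """Turn characters into glyph names, preferring the longest ligature."""
    glyphs = []
    longest = max(len(seq) for seq in LIGATURES)
    i = 0
    while i < len(text):
        for n in range(longest, 1, -1):
            seq = tuple(text[i:i + n])
            if seq in LIGATURES:
                glyphs.append(LIGATURES[seq])
                i += n
                break
        else:
            # No ligature starts here: emit a one-to-one glyph.
            glyphs.append("U+%04X" % ord(text[i]))
            i += 1
    return glyphs

print(shape("\u0c95\u0ccd\u0cb7"))  # three characters, one glyph
print(shape("\u0c95\u0ca8"))        # no special case applies
```

A real font needs an entry like this for every conjunct and vowel sign combination, which is where the extra complexity comes from.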

The good news is that Microsoft has shipped the technology with Win2k (without fonts), and a font and keyboard mapping with Windows XP. Here is a screenshot of the kaguNita page in Win2k with the Arial Unicode MS font, without OpenType, and in Windows XP with OpenType (thanks to Balaji Murthy for this shot).

Free Software support

For free software fans, things are progressing slowly but surely. The IndiX project has ported OpenType technology to the X Window System. However, no plans to integrate this into XFree86 seem to exist yet. The Pango effort is trying to deal with the problem using special purpose libraries (libIndic). But to me, OpenType sounds like a more elegant solution, because no special purpose code is necessary in the application. One primary obstacle is that the network transparent X protocol doesn't have any provision for dealing with font technology like OpenType.

Yudit is a great piece of software which can handle OpenType fonts even if the X Window System itself doesn't support them. You can use it today to create Devanagari Unicode text. It's simple to convert Devanagari to Kannada, since they're very similar.
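One way to do the Devanagari to Kannada conversion is to shift each character by the distance between the two Unicode blocks (0xc80 - 0x900 = 0x380). The Python sketch below assumes the letters in question sit at the same relative position in both blocks, which holds for the common letters but not for every code point, so treat it as an approximation. The function name is made up for this example.

```python
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
KANNADA_START = 0x0C80
OFFSET = KANNADA_START - DEVANAGARI_START  # 0x380

def devanagari_to_kannada(text):
    """Shift Devanagari characters into the Kannada block.

    Characters outside the Devanagari block are left untouched.
    Some shifted positions may be unassigned in the Kannada block.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            out.append(chr(cp + OFFSET))
        else:
            out.append(ch)
    return "".join(out)

# "kannada" typed in Devanagari: ka, na, virama, na, da
hindi = "\u0915\u0928\u094d\u0928\u0921"
print(devanagari_to_kannada(hindi))
```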

Keyboard Input

There are two ways:
  1. Transliteration (typing in English and having special purpose software convert it)
  2. Devanagari/Kannada Inscript
While the former has a shorter learning curve for people comfortable with English keyboards, not all software in the world is going to be rewritten for Kannada. For such unmodified software, the second method holds more promise. I'm beginning to get more comfortable with Inscript and am doing most of my Kannada typing in Inscript.
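To give a feel for what transliteration software has to do, here is a deliberately tiny Python sketch. The romanization scheme (lowercase "k" and "n", capital "D" for the da in "kannada", "a" for the inherent vowel) is an assumption chosen for this example and does not describe any particular product. It handles only these three consonants and the vowel "a".

```python
# Hypothetical, minimal romanization table.
CONSONANTS = {"k": "\u0c95", "n": "\u0ca8", "D": "\u0ca1"}
VIRAMA = "\u0ccd"

def transliterate(latin):
    """Convert a romanized word into Kannada code points."""
    out = []
    i = 0
    while i < len(latin):
        ch = latin[i]
        if ch not in CONSONANTS:
            raise ValueError("unsupported input: %r" % ch)
        out.append(CONSONANTS[ch])
        if i + 1 < len(latin) and latin[i + 1] == "a":
            # The inherent vowel is implicit, so "a" adds nothing.
            i += 2
        else:
            # No vowel follows: suppress the inherent vowel.
            out.append(VIRAMA)
            i += 1
    return "".join(out)

print(transliterate("kannaDa"))
```

For the input "kannaDa" this yields ka, na, virama, na, da, the same five characters discussed earlier.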

Dr Pavanja is giving a talk at Linux Bangalore 2001, according to the published schedule. There may be more information available on this topic there.

Common Browser tricks

Once you have the right font installed, the only thing you need to do is to choose the right encoding. In the pages hosted on this server, the correct encoding is specified in the headers, so you don't need to do anything. However, for other Kannada Unicode pages, you may have to select one of the following, depending on your browser:
  View Menu -> Encoding -> Unicode (UTF-8)
  View Menu -> Character Encoding -> Unicode (UTF-8)


The above instructions should work on both Windows and UNIX.
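For readers wondering what "specified in the headers" means in practice, the Python sketch below parses a Content-Type header and uses the charset it names to decode the page body. The header value and the page bytes are made-up examples, not taken from this server.

```python
from email.message import Message

# Example header of the kind a server can send with a Kannada page.
msg = Message()
msg["Content-Type"] = "text/html; charset=UTF-8"

# The charset parameter tells the browser how to decode the bytes.
charset = msg.get_content_charset()

# Example body: the word "kannada" as UTF-8 bytes.
body = b"\xe0\xb2\x95\xe0\xb2\xa8\xe0\xb3\x8d\xe0\xb2\xa8\xe0\xb2\xa1"
page = body.decode(charset)

print(charset)
print(len(page))  # characters after decoding
```

When a page lacks this header, the browser has to guess, which is why the manual menu selection is sometimes needed.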


Arun Sharma <arun@sharma-home.net.nospam>
Last Modified  12/08/2001