Status of Kannada Computing on the Internet
This is an attempt to survey the state of the art in computing with the
Kannada language.
In order to produce content in Kannada, the following software
is available:
- Baraha - excellent and easy to use. Microsoft Windows only, though the
fonts work on other platforms. Based on transliteration.
- C-DAC sells a technology called GIST. They sell both software and hardware
in many Indian languages including Kannada.
- A number of Kannada newspapers and magazines
have created their own fonts, but they're usually not interoperable
and not standards compliant. The bigger issue is that they're very display
oriented, which makes common operations like sorting and searching hard.
To explain the issue: a display-oriented Kannada encoding stores "kannada"
as 3 bytes, one for each of the three displayed letters. However, if you
search for "na" using the representation of "na", the search will return
nothing, because the doubled "nna" is stored under a code of its own. Things
are further complicated by "arkavattu" etc.
For the purpose of machine processing, it makes sense to store "kannada"
as ka, na, virama (to strip the vowel from the first na), na, da.
This makes machine processing easy, but reading difficult, as the machine
shows 5 letters instead of 3 unless something reassembles them for display.
This is exactly what standards like ISCII do.
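To make the search problem concrete, here is a minimal sketch in Python. The
logical form uses the Unicode code points quoted later on this page. The
display-oriented glyph codes (0x41 to 0x44) are made up purely for
illustration and do not correspond to any real newspaper font.

```python
# Logical storage: ka, na, virama, na, da (5 units).
KA, NA, VIRAMA, DA = "\u0c95", "\u0ca8", "\u0ccd", "\u0ca1"
logical = KA + NA + VIRAMA + NA + DA

# Hypothetical display-oriented font encoding: one byte per drawn glyph,
# with the conjunct "nna" given a code of its own (values are invented).
GLYPH_KA, GLYPH_NA, GLYPH_NNA, GLYPH_DA = 0x41, 0x42, 0x43, 0x44
display = bytes([GLYPH_KA, GLYPH_NNA, GLYPH_DA])


def count_in_logical(text, letter):
    """Count occurrences of a letter in logically stored text."""
    return text.count(letter)


def count_in_display(data, glyph_code):
    """Count occurrences of a glyph code in display-oriented bytes."""
    return data.count(bytes([glyph_code]))


if __name__ == "__main__":
    print(len(logical), count_in_logical(logical, NA))
    print(len(display), count_in_display(display, GLYPH_NA))
```

The logical form finds both occurrences of "na"; the display form finds
none, because the only "na" in the word is hidden inside the conjunct glyph.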
ISCII is an 8-bit code, which can represent 2^8 = 256 characters. The
lower 128 are basically US-ASCII. The Indian languages occupy the higher
128 characters. ISCII defines a superset of the Indic scripts, which (almost)
covers all the vagaries of Indic scripts (but it's hard to satisfy everyone,
so you'll see occasional complaints on this matter).
The problem with ISCII is that its use is not widespread and no good
fonts exist. It requires special purpose hardware and software from a small
number of vendors. Further, support for ISCII in free software like
Linux and BSD using the
X Window System is weak, making cheap Kannada computing hard.
However, after ISCII-91 was standardized in 1991, a new standard called
Unicode came into the picture. Unicode
is a multibyte standard, which means each character may be represented
by multiple (not necessarily two) bytes. The advantage of Unicode is that
many scripts of the world can be represented in a single page of text,
because each script gets its own range of numbers in the Unicode code space.
For example, Devanagari is 0x900-0x97F, Kannada is 0xC80-0xCFF etc. In order
to read Unicode, you need a font that has glyphs for the characters in the
text. Ideally one font would cover all of Unicode, but no such universal
font exists today. Microsoft's Arial Unicode MS is popularly
used and distributed with Office 2000. It's also available for
free download (13 MB). Since the Indic ranges of Unicode are based on ISCII,
it's easy to convert between the two.
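Because each script has its own range, a program can tell scripts apart just
by looking at code point values. The following is a small sketch that uses
only the two ranges quoted above; the function names are my own and anything
outside those two ranges is simply reported as "other".

```python
# Code point ranges quoted in the text above.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Kannada": (0x0C80, 0x0CFF),
}


def script_of(ch):
    """Return the script name for a single character, or 'other'."""
    cp = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= cp <= high:
            return name
    return "other"


def scripts_in(text):
    """Return the set of scripts used by the non-space characters in text."""
    return {script_of(ch) for ch in text if not ch.isspace()}


if __name__ == "__main__":
    # One Devanagari ka and one Kannada ka on the same line of text.
    print(scripts_in("\u0915 \u0c95"))
```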
However, both Unicode and ISCII suffer from the same limitation, i.e.
the word "kannada" is 5 letters and, when rendered literally, looks pretty
ugly. However, Microsoft and Adobe have come up with a new font technology
called OpenType. This technology has the necessary smarts to do the right
things. For example, take a file containing the word "kannada". The file
represents the following 5 Unicode characters:
0xc95 0xca8 0xccd 0xca8 0xca1
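A hex dump of the UTF-8 encoded file shows the following. The dump prints
16-bit words with the two bytes of each word swapped, and the final 0a is a
trailing newline: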
b2e0 e095 a8b2 b3e0 e08d a8b2 b2e0 0aa1
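The dump can be reproduced from the five code points. This sketch encodes
them as UTF-8, appends the trailing newline, and prints the bytes both in
file order and in the swapped 16-bit word layout used above. The helper names
are mine.

```python
code_points = [0x0C95, 0x0CA8, 0x0CCD, 0x0CA8, 0x0CA1]
text = "".join(chr(cp) for cp in code_points)
data = (text + "\n").encode("utf-8")  # 5 characters * 3 bytes + newline


def byte_dump(raw):
    """Hex dump with one byte per group, in file order."""
    return " ".join(f"{b:02x}" for b in raw)


def word_dump(raw):
    """Hex dump in 2-byte groups with the bytes of each group swapped."""
    words = []
    for i in range(0, len(raw), 2):
        pair = raw[i:i + 2]
        words.append("".join(f"{b:02x}" for b in reversed(pair)))
    return " ".join(words)


if __name__ == "__main__":
    print(byte_dump(data))
    print(word_dump(data))
```

In file order each Kannada character is three bytes starting with e0, which
is easier to read than the swapped form.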
But when viewed with the Tunga OpenType font, the word displays as the
expected three letters. Contrast this with the rendering with a simple
TrueType font, which shows all five. Much nicer, isn't it? But the size of
the file is significantly larger than with ISCII or Baraha, since UTF-8
spends 3 bytes on each Kannada character. That's the cost of viewing Kannada
and Hindi simultaneously.
However, if you compress the file it should come down to similar levels as
ASCII.
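A rough way to check the size claim is to measure it. The sketch below
compares the character count with the UTF-8 byte count, then compresses a
longer sample with zlib. The sample is just the same word repeated, so it
compresses far better than real text would; treat the ratio as an
illustration, not a measurement.

```python
import zlib

word = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"  # "kannada", 5 code points
utf8 = word.encode("utf-8")

# An artificial sample: the same word repeated 1000 times.
sample = (word + " ") * 1000
raw = sample.encode("utf-8")
packed = zlib.compress(raw)

if __name__ == "__main__":
    print("characters:", len(word), "UTF-8 bytes:", len(utf8))
    print("sample bytes:", len(raw), "compressed bytes:", len(packed))
```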
Creating a font using this technology is significantly more complex, because
the font author has to specify all the special cases. For example, "ka,
virama, sha" is one glyph (ksha for kshatriya) and not three.
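The idea of such special cases can be sketched with a lookup table. This is
a toy model only: a real OpenType font stores its substitution rules in
binary tables inside the font file, not as a Python dictionary, and the glyph
names and the two rules below are chosen by me for illustration. Note that
Unicode names the second consonant of ksha "SSA" (U+0CB7).

```python
KA, NA, DA = "\u0c95", "\u0ca8", "\u0ca1"
SSA, VIRAMA = "\u0cb7", "\u0ccd"

# Hypothetical rules: a sequence of code points maps to one glyph name.
LIGATURES = {
    (KA, VIRAMA, SSA): "ksha",
    (NA, VIRAMA, NA): "nna",
}

# Fallback glyph names for single code points.
BASE_GLYPHS = {KA: "ka", NA: "na", DA: "da", SSA: "ssa", VIRAMA: "virama"}


def shape(text):
    """Turn code points into glyph names, applying 3-unit rules first."""
    glyphs = []
    i = 0
    while i < len(text):
        seq = tuple(text[i:i + 3])
        if seq in LIGATURES:
            glyphs.append(LIGATURES[seq])
            i += 3
        else:
            glyphs.append(BASE_GLYPHS.get(text[i], "missing"))
            i += 1
    return glyphs


if __name__ == "__main__":
    print(shape(KA + VIRAMA + SSA))
    print(shape(KA + NA + VIRAMA + NA + DA))
```

With the rules in place the five stored units of "kannada" come out as three
glyphs, which is the behaviour described above.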
The good news is that Microsoft has shipped the technology with Win2k
(no fonts) and a font and keyboard mapping with Windows XP. Here are
screenshots of the kaguNita page, in
Win2k with the Arial Unicode MS font, without OpenType, and in
Windows XP with OpenType (thanks to Balaji Murthy for this shot).
For the free software fans, things are progressing slowly, but surely.
The IndiX project has ported OpenType technology to the X Window System.
However, no plans to integrate this into XFree86 seem to exist yet. The
Pango effort is trying to deal with
the problem using special purpose libraries (libIndic). But to me, OpenType
sounds like a more elegant solution, because no special purpose code
is necessary in the application. One primary obstacle is that the network
transparent X protocol doesn't have any provision for dealing with font
technology like OpenType.
Yudit is a great piece of software,
which can handle OpenType fonts even if the X Window System itself doesn't
support them. You can use it today to create Devanagari Unicode text. It's
simple to
convert Devanagari to Kannada, since they're very similar.
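One way to do that conversion, assuming the two Unicode ranges line up
letter for letter (they largely do, since both follow the ISCII layout), is
to shift each Devanagari code point by the distance between the ranges. This
is a sketch, not a complete converter: some positions exist in one range and
not the other, so the code leaves a character alone when the shifted position
is unassigned.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
KANNADA_START = 0x0C80
OFFSET = KANNADA_START - DEVANAGARI_START  # 0x380


def devanagari_to_kannada(text):
    """Shift Devanagari code points into the Kannada range where possible."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            target = chr(cp + OFFSET)
            # "Cn" means the shifted position is not an assigned character.
            if unicodedata.category(target) != "Cn":
                ch = target
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # ka, na, virama, na, dda typed in Devanagari.
    print(devanagari_to_kannada("\u0915\u0928\u094d\u0928\u0921"))
```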
Keyboard Input
There are two ways:
- Transliteration (typing in English and having special purpose
software convert it)
- Devanagari/Kannada Inscript
While the former has a shorter learning curve for people comfortable
with English keyboards, not all software in the world is going to be rewritten
for Kannada. For such unmodified software, the second method holds more
promise. I'm beginning to get more comfortable with Inscript and am doing
most of my Kannada typing in Inscript.
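To show what the transliteration approach involves, here is a tiny sketch.
The Latin-to-Kannada mapping is a made-up subset for illustration and is not
the scheme used by Baraha or any other product. It handles three consonants
and three vowels, and inserts a virama whenever a consonant is not followed
by a vowel.

```python
# Hypothetical mapping, covering just enough letters for the example.
CONSONANTS = {"k": "\u0c95", "n": "\u0ca8", "D": "\u0ca1"}
# Each vowel maps to (independent letter, dependent vowel sign).
# The inherent "a" needs no sign after a consonant.
VOWELS = {
    "a": ("\u0c85", ""),
    "i": ("\u0c87", "\u0cbf"),
    "u": ("\u0c89", "\u0cc1"),
}
VIRAMA = "\u0ccd"


def transliterate(latin):
    """Convert Latin input to Kannada code points using the toy mapping."""
    out = []
    pending = False  # True while the last consonant has no vowel yet
    for ch in latin:
        if ch in CONSONANTS:
            if pending:
                out.append(VIRAMA)
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)
            out.append(ch)  # pass unknown characters through unchanged
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("kannaDa"))
```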
Dr Pavanja is giving a talk at Linux Bangalore 2001, according to the
published schedule. There may be more information available on this topic
there.
Common Browser tricks
Once you have the right font installed, the only thing you need to do
is to choose the right encoding. In the pages hosted on this server, the
correct encoding is specified in the headers, so you don't need to do anything.
However, for other Kannada Unicode pages, you may have to pick the encoding
by hand using one of the following menu paths, depending on your browser:
View Menu -> Encoding -> Unicode (UTF-8)
View Menu -> Character Encoding -> Unicode (UTF-8)
The above instructions should work on both Windows and UNIX.
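Why the encoding choice matters can be seen by decoding the same bytes two
ways. In this sketch the Content-Type header value is an example I made up,
not a copy of what this server sends. The charset is read from the header
with Python's standard email package, then the bytes are decoded correctly as
UTF-8 and incorrectly as Latin-1.

```python
from email.message import Message

original = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"
page_bytes = original.encode("utf-8")

# Hypothetical header, as a server might send it.
header = Message()
header["Content-Type"] = "text/html; charset=utf-8"
charset = header.get_content_charset()

right = page_bytes.decode(charset)    # what the browser should do
wrong = page_bytes.decode("latin-1")  # what a wrong guess produces

if __name__ == "__main__":
    print(charset, len(right), len(wrong))
```

The wrong guess turns 5 Kannada characters into 15 unrelated Latin-1
characters, which is the garbage you see when the browser picks the wrong
encoding.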
Other Resources
Arun Sharma <arun@sharma-home.net.nospam>
Last Modified 12/08/2001