Wednesday, July 19, 2006

Sunday, July 16, 2006

Yesterday we tried something new, courtesy of a rumour I heard about a unique cocktail they serve at a restaurant here in Kanpur. Some ingredients gathered from the Hall canteen, along with some freely acquired booze; the result was a cocktail that was surprisingly delightful! I'm sure the recipe and the description can be greatly improved, but here's the exciting first draft, delivered by a man who's still, well, alive and happy.

How to make a Patiala Lassi:

(an abbreviation of its complete name: the Kanpuriya Patiala Scottish Lassi)


60 ml Whiskey.
6 tablespoons Curd (Yoghurt).
4 teaspoons Sugar (or to taste).
30-50 ml Water.


Pour the whiskey and curd into a mixer. Mix on low speed till the curd forms a smooth paste with the whiskey. Add the sugar, then add water to thin the drink down to about half its thickness. A generous amount of sugar is necessary to counter the bitterness of the curd and whiskey. Mix well and serve with 2 ice cubes. Enjoy the hangover the next day.

Friday, July 07, 2006

The geeky joke for the day:

"My point, your honour, is a degenerate line."

Saturday, June 10, 2006

This post is a supplement to an earlier post on this blog on Unicode and multilingual support in operating systems. It highlights one way to take advantage of Unicode support in most modern OSes.

Most popular filesystems today are quite flexible when it comes to the rules they place on filenames (remember MS-DOS and its 8.3 filename format?). Almost all of them allow names of up to 255 characters (on ext2 and ext3 the limit is 255 bytes). FAT32 (through long filenames) and NTFS accept almost any Unicode character apart from a handful of reserved ones such as \ / : * ? " < > |, while ext2 and ext3 accept any byte except NUL and the '/' path separator.

Why am I going on about it? Well, quite obviously, this means that my files can be named in any language I choose! Quite obvious, indeed... but loads and bucketloads of fun! I've already transliterated the names of many of my मराठी mp3s to their देवनागरी equivalents. Instead of a very non-Indian 'gaara vaara ha bharaaraa.mp3', I now have a much more satisfying 'गार वारा हा भरारा.mp3'. Pretty neat, huh?
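One thing worth checking before a mass rename is how long such a name really is. The sketch below is only an illustration: it assumes the name is stored as UTF-8 on ext2/ext3 (where the 255 limit is counted in bytes) and as 16-bit units on NTFS, and the helper name `fits_ext_limit` is made up for this example.

```python
# Sketch: compare character count with encoded size for a Devanagari filename.
name = "गार वारा हा भरारा.mp3"

chars = len(name)                                  # Unicode code points
utf8_bytes = len(name.encode("utf-8"))             # size if stored as UTF-8 (assumed for ext2/ext3)
utf16_units = len(name.encode("utf-16-le")) // 2   # 16-bit units, as NTFS counts them

def fits_ext_limit(filename: str, limit: int = 255) -> bool:
    """Hypothetical helper: does the UTF-8 form fit in `limit` bytes?"""
    return len(filename.encode("utf-8")) <= limit

print(chars, utf8_bytes, utf16_units, fits_ext_limit(name))
```

Each Devanagari character takes three bytes in UTF-8, so a name hits the byte limit with far fewer letters than a plain ASCII name would.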

In fact, I'm so thrilled about the whole thing that I even have a snapshot ready to show you (the same way most fathers have snaps of their kids in their wallets). This is a snapshot of a few files displayed in Windows Explorer:

The cherry on the cake is that most graphical terminal emulators on Linux already support Unicode, so you can actually work with Unicode filenames through the omnipotent console!
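Scripts can handle these names just as well as the console can. Here is a minimal sketch, assuming a filesystem and locale that use UTF-8 (the usual case on a modern Linux box); it creates an empty file with a Devanagari name in a throwaway directory and lists it back.

```python
import os
import tempfile

name = "गार वारा हा भरारा.mp3"

# Work in a temporary directory so nothing real gets touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, name)
    with open(path, "wb") as handle:
        handle.write(b"")                 # an empty placeholder file

    # os.listdir returns str names, decoded with the filesystem encoding.
    found = [entry for entry in os.listdir(folder) if entry.endswith(".mp3")]

print(found)
```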

There are definite plus points in taking advantage of Unicode support in this way, but for the benefit of a few of my readers, I'd like to mention that it doesn't come without its set of disadvantages. Don't get excited and attempt to rename all your files in one go if you don't have a keyboard to facilitate easy typing in your language. Not only will renaming a few thousand files initially take eons, but functions like file search will now require you to type keywords in your language! Besides, Unicode compatibility hasn't spread everywhere, and alas, you'll often come across software that garbles the filenames. But apart from these setbacks, this has got to be the coolest thing you can do today!
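The garbling usually comes from software reading the bytes of a name with the wrong encoding. The sketch below reproduces the effect on purpose, assuming the name is stored as UTF-8 and misread as Latin-1; real programs may misread it in other ways.

```python
name = "गार वारा हा भरारा.mp3"

raw = name.encode("utf-8")            # the bytes as stored (UTF-8 assumed)
garbled = raw.decode("latin-1")       # misread: every byte becomes one Latin-1 character

# The damage is reversible as long as the bytes themselves were not altered.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(name), len(garbled), recovered == name)
```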

Tuesday, June 06, 2006

This is possibly the worst technical joke I have come up with in a long, long time. The only reason I'm even putting it in here is that a friend of mine actually solved the contorted geeky logic behind the joke almost instantaneously. It kind of reflects the sort of geekdom that rules around here. Here goes:

Q. How many bits does 3:45 in हिंदी require in ANSI C?
A. 6 bits.

Go figure...

Saturday, May 27, 2006

A few years back, someone extremely special to me had a very nasty experience with a man who repeatedly tried to grope her while she was sleeping in the upper berth of a train. I listened to her with shock and anger. A few months back, I spent some time reading about experiences of Indian women and girls in posts like this, getting very disgusted with the attitude of Indian men. Many close friends have posted articles (for example, click here and here) condemning this sort of behaviour. I remember having discussed the issue at fair length with them.

And then it happened, right before my eyes. It was so fast that before I could do anything, the bastard sped away. I tried to note his licence plate, but it was too late. I wanted to scream at him, but the road was more or less empty but for the victim and me, and he was far away, so no good could come from it. I want to kill the bastard, I really do. I am ashamed and confused, a million thoughts swarming in my head. I feel impotent, being unable to do anything about the whole thing, either through law or with my own hands, while he might still be enjoying the thrills of his perverted acts, not once thinking of the scars he leaves behind. I won't go into a gyaan session this time. No way, not this time. He's gotten personal with me now.

Rest assured I will kill him if I meet him again.

Friday, April 28, 2006

As part of my work on Natural Language Processing, I was required to learn how to work comfortably with Indic languages. This meant being able to computationally process Indic scripts, either in standard Unicode or in proprietary encoding, and parse Indian sentences syntactically and semantically. I found the work terribly refreshing, and I've found that the language features that most OSes provide have been underutilized for way too long. This post is part of what I've discovered.

If you want to be able to seamlessly work with your own languages like हिंदी, বাংলা, ગુજરાતી, ਪੰਜਾਬੀ, ಕನ್ನಡ, తెలుగు or தமிழ் on your computer the way you naturally do in your life, or are more fluent in your own language and have always wondered why you were stuck with working exclusively with English, you need to read this.

In 1991, the Unicode standard attempted to standardize and bring some regularity to the chaos of innumerable independent character encodings that were popping up for scripts all over the world. These encodings offered some compatibility with the Roman script, but rarely worked with one another. Unicode supports almost all scripts in use today, from Arabic (العربية) to Zhuyin (ㄓㄨˋㄧㄣ). Every script has its own place in Unicode space, which means that you can seamlessly integrate several scripts into one document, like I've just done.
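To see that each script really does sit in its own part of Unicode space, here is a small sketch using Python's standard `unicodedata` module. Reading the script off the first word of the character name is a rough heuristic chosen for this example, not an official script lookup.

```python
import unicodedata

samples = ["हिंदी", "বাংলা", "ગુજરાતી", "ਪੰਜਾਬੀ", "ಕನ್ನಡ", "తెలుగు", "தமிழ்", "العربية"]

def script_of(word):
    """Heuristic: the first word of the first character's Unicode name."""
    return unicodedata.name(word[0]).split()[0]

for word in samples:
    first = word[0]
    # Show where the first character lives and which script its name mentions.
    print(f"{word}: U+{ord(first):04X} {script_of(word)}")
```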

Getting support for Indic and Arabic scripts in Windows XP is rather straightforward, and I'll explain it in brief here. Unfortunately, the rest of this post deals with Windows XP exclusively. Linux and *nix users are requested to click here instead, since getting Indic scripts to work in Linux is perhaps a bit more involved. For the rest, the "Regional and Language Options" icon under the Windows XP Control Panel is where you would want to go. Once it opens up, click on the "Languages" tab, and under the "Supplemental language support" group, tick the checkbox that says "Install files for complex scripts and right-to-left languages (including Thai)". Click "OK", wait for the installation to complete, and you're done with the preliminary support!

Well, the "complex scripts" part probably deserves some explanation. In Unicode, a complete syllable like हिं is made up of a sequence of its compositional units, like ह+ि+ं (not really surprising at all, eh!). However, in Roman script a sequence remains a sequence orthographically (c+a+t=cat), whereas in many languages like our own, a sequence could be mapped to a completely different shape (think of the previous example or, say, त्+र=त्र). So, Unicode fonts need to accommodate this, and shapes like त्र are stored in the font as well (even though the text itself is still stored as a sequence of the Unicode representations of its compositional units).
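You can look at the stored sequence yourself. This sketch uses Python's standard `unicodedata` module to list the code points behind the two examples; the helper name `describe` is just something made up for this post.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for every unit in `text`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for cluster in ("हिं", "त्र"):
    # One shape on screen, several code points in storage.
    print(cluster, "is stored as", len(cluster), "code points:")
    for code, label in describe(cluster):
        print("   ", code, label)
```

The font and the rendering engine turn each of these sequences into a single shape on screen; the stored text never changes.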

Secondly, to start typing, you need to install the languages you would like to work with in Windows XP. To do this, go back to the now familiar "Regional and Language Options" and under the "Languages" tab, click on the "Details..." button in the "Text services and input languages" group. Under "Installed services", click on "Add...". Add any language you wish, along with associated services like the corresponding keyboard. In case your language does not have a supported keyboard, choose "INSCRIPT". For हिंदी, Windows XP provides a "Hindi Traditional" keyboard. Now, under the "Preferences" group of the "Text services and input languages" window, click on "Language Bar...". Tick "Show the Language bar on the Desktop", and click on "Apply". You should now see a new bar floating around, and you can click on the icon that says "EN" to choose between the installed languages. You are now ready to start typing in your own language!

In Windows XP, you simply need to choose a language in the Language bar to start typing. However, learning which characters are mapped to which keys on the keyboard isn't easy. There are in fact many ways to type in non-English languages. They are:

  1. Use the keyboard to type. Get a देवनागरी keyboard or a keyboard for your language, or simply buy a keyboard skin. Learn the mapping of keys to characters yourself. In general, for हिंदी, the consonants are towards the right, and the matras are towards the left.
  2. Install software or use online editors to type. This is much slower than actually using the keyboard since you have to click on each character. A fairly simple editor for हिंदी can be found here.
  3. Use transliteration. UPenn has a very handy webpage that lets you type romanized Hindi and get equivalent transliterated Unicode. So, you can type "bhagawaana", and the webpage gives you भगवान! Find it here.
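To give a feel for how the transliteration option works, here is a toy sketch. The romanization table is hypothetical and tiny, inferred only from the "bhagawaana" example above; it is not the scheme the UPenn page actually uses, and a real tool covers far more letters and rules.

```python
# Hypothetical mini-tables: a handful of consonants and vowels only.
CONSONANTS = {"bh": "भ", "g": "ग", "w": "व", "n": "न", "m": "म", "r": "र", "s": "स", "l": "ल"}
VOWEL_SIGNS = {"aa": "ा", "a": "", "i": "ि", "u": "ु"}   # "a" is the inherent vowel, so no sign
INDEPENDENT_VOWELS = {"aa": "आ", "a": "अ", "i": "इ", "u": "उ"}
VIRAMA = "\u094d"   # suppresses the inherent vowel of a consonant

def match(table, text, pos):
    """Return the longest key of `table` found at text[pos:], or None."""
    for key in sorted(table, key=len, reverse=True):
        if text.startswith(key, pos):
            return key
    return None

def transliterate(roman):
    out, pos = [], 0
    while pos < len(roman):
        cons = match(CONSONANTS, roman, pos)
        if cons:
            out.append(CONSONANTS[cons])
            pos += len(cons)
            vowel = match(VOWEL_SIGNS, roman, pos)
            if vowel is not None:
                out.append(VOWEL_SIGNS[vowel])   # matra (empty for plain "a")
                pos += len(vowel)
            else:
                out.append(VIRAMA)               # no vowel follows the consonant
            continue
        vowel = match(INDEPENDENT_VOWELS, roman, pos)
        if vowel is None:
            raise ValueError(f"no mapping for {roman[pos:]!r}")
        out.append(INDEPENDENT_VOWELS[vowel])    # vowel not attached to a consonant
        pos += len(vowel)
    return "".join(out)

print(transliterate("bhagawaana"))
```

Matching the longest key first is what lets "bh" and "aa" win over "b" and "a"; anything outside the little table raises an error instead of guessing.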

I hope this really basic post will get you to do interesting things with Unicode. Being the samaritan that I am, I volunteer to give you pointers here as well. Some really fun things you can start off with in your own languages are:

  1. Start searching the web with keywords in your own scripts. This will introduce you to a part of the web you haven't seen before. A Google search for 'गरम मसाला' can be found here.
  2. Explore web pages and blogs that have long since adopted the Unicode standard. An example blog can be found here (I have no idea who the author is, it's just something I stumbled upon).
  3. Start contributing to the community. For starters, start blogging in your favourite language, or start adding pages, for example, to the বাংলা, ગુજરાતી, ಕನ್ನಡ or తెలుగు Wikipedias! It's about time we started making our presence felt on the World Wide Web, and asserted our identity!
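If you would rather build such search links from a script, the keywords have to be percent-encoded as UTF-8 first. The sketch below uses Python's standard `urllib.parse`; the `https://www.google.com/search?q=` pattern is assumed here as the usual form of a Google search link, and both helper names are made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def search_url(keywords):
    """Hypothetical helper: build a search link for the given keywords."""
    return "https://www.google.com/search?" + urlencode({"q": keywords})

def keywords_from(url):
    """Decode the keywords back out of a link built by search_url."""
    return parse_qs(urlsplit(url).query)["q"][0]

url = search_url("गरम मसाला")
print(url)                  # plain ASCII, safe to paste anywhere
print(keywords_from(url))   # the original keywords again
```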