January 21, 2004 3:47 AM by Glenn Slayden sing-song
I'm now re-entering a lengthy blog entry which I lost earlier today by pressing the wrong button :-(
When I was in Bangkok last week, Khun Asda mentioned that he had met with longtime site visitor and collaborator Charles, who is visiting Thailand for a few weeks. Charles had also met Bryan in Minneapolis last fall, but as for me, I have yet to meet him in person—we missed each other in Bangkok by a day or so. Anyway, Asda mentioned to me in passing that Charles had told him that he uses a system of "singing" to help study and remember Thai tones.
With this one word, an idea immediately struck me to add musical notation to Thai-language.com. Since I received Charles' story second-hand, I don't know if this is his exact conception, but I wish to credit Charles anyhow, for this meme.
The notion was to represent the correct tones for a word as a graphical or spatial representation of music, which might be easier for some people to study and remember. We've all had the experience of being unable to get a tune out of our head, or perhaps using a melody as some other kind of mnemonic. Being a professional musician myself, I thought that standard musical notation might be the most useful representation towards this end. Even people who don't read music might be able to benefit from seeing the two-dimensional contours of a musical staff.
So, in the few days since I returned, I whipped up the feature. You can now see an approximate musical notation for each of our 25000+ Thai entries near the top of its main entry web page. If you don't wish to use this feature, you can disable it in your site control panel.
The musical notation GIF images are generated on-the-fly, based on the transliteration result strings (which, in turn, are mostly generated on-the-fly also). The notation does not use a time signature or meter, but Thai long vowels are differentiated from short vowels by using an open note head or filled-in note head, respectively.
The first step in this project involved analyzing many of our site's audio clips to determine the musical intervals which would represent each of the five spoken tones in the Thai language. Of the several native Thai speakers who have recorded clips, I found that Professor Mak's recordings maintained the most consistent pitch, even across different recording sessions months apart. I found that his low tone was, on average, C#.
Next I carefully analyzed a few hundred audio clips for the pitch changes corresponding to the tones, and I selected the average results, which were as follows:
Let me again emphasize that these were approximate and averaged values I obtained by analyzing a few hundred audio clips. The next step was to select some notational conventions. By assigning the low tone value to be the root of a diatonic minor key, only one accidental would be needed (for the upper note of the high tone). Thus in the final display, a proper key signature is used, and the appropriate accidental is applied to the high tone. The default key and root is C# minor, but this can be changed to any of the 12 minor keys by using your site control panel.
The low and mid tones use a single note, since their pitch doesn't change much during their duration. The other tones require a sliding movement (melisma), for which I used a slur mark to connect the estimated starting and ending notes.
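The drawing conventions described so far can be summarized in a small Python sketch. This is only an illustration, not the site's actual code: the names `NoteShape` and `note_shape` are hypothetical, and pitches are deliberately left out, so it captures just the rules stated above (one note for low and mid tones, two slurred notes for the others, open note head for long vowels).

```python
from dataclasses import dataclass

# The five Thai tones. Only "low" and "mid" are drawn as a single note.
TONES = ("mid", "low", "falling", "high", "rising")
LEVEL_TONES = frozenset({"low", "mid"})


@dataclass(frozen=True)
class NoteShape:
    """How one syllable would be drawn (hypothetical structure, no pitches)."""
    tone: str
    note_count: int   # 1 for a level tone, 2 for a sliding tone
    slurred: bool     # True when a slur connects the start and end notes
    open_head: bool   # True for a long vowel, False for a short vowel


def note_shape(tone: str, long_vowel: bool) -> NoteShape:
    """Apply the notational rules to a single syllable."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone!r}")
    level = tone in LEVEL_TONES
    return NoteShape(
        tone=tone,
        note_count=1 if level else 2,
        slurred=not level,
        open_head=long_vowel,
    )
```

For example, `note_shape("falling", long_vowel=True)` describes two open note heads joined by a slur.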
The technical implementation of this feature turned out to be more of a challenge than I anticipated.
As mentioned above, I draw the images from scratch on the fly, as required by a particular word or sentence. This method was selected (as opposed to pre-rendering all of the images) for several reasons, notably: 1.) changes in the transliteration input will be automatically reflected in the musical notation; 2.) no disk-hit required to service the client; 3.) no proliferation of 25000 image files on the server.
However, since we use a Windows-based web server, the graphics subsystem (GDI) produces a raster image format, the so-called DIB, which is not acceptable for direct use on the web. This is for three reasons: 1.) it's a proprietary format which is not usually supported on non-Windows machines; 2.) I think it's not authorized by the Web standards documents; 3.) it's a non-compressed format, so transmission would be inefficient. After rendering the image to an off-screen memory drawing surface, we must therefore convert the drawing to a GIF. Of course this must also be done on-the-fly.
Encoding an uncompressed image to GIF is a fairly involved procedure. Fortunately, I had some old code hanging around. Interestingly enough, I originally used this code in 1997 for the first version of Thai-language.com. For the record, that historical code sample is credited to Spencer Thomas, Jim McKie, Steve Davies, Ken Turkowski, James Woods, Joe Oorst, Donald Knuth, G. Knott, David Rowley, Marcel Wijkstra, Jef Poskanzer, and now me. In those days, before the web browser had good Thai font support, I was toying with the idea of using an on-the-fly drawing/encoding-to-GIF process similar to that described above to draw all the Thai text on the site. At that time, Thai-language.com ran on a Unix server, and I did manage to get on-the-fly rendering of Thai text working as a CGI application compiled with Gnu (GCC). This was non-trivial considering that the Unix box had no Thai character support, so the code had to assemble a Thai word or sentence by copying each character, one at a time, from a special source bitmap I prepared which had one copy of every Thai character. This absurd CGI actually worked, but I never ended up deploying it into the actual site, probably because the Unix hardware seemed to handle the whole affair quite slowly.
Back to yesterday's project: the tricky part that I didn't anticipate is that the ancient GIF encoding code doesn't accept 24-bit (true color) input, but rather expects a palettized image, and copies that same palette for the GIF. I knew that it would be a lot easier to do my drawing to a true color surface, even if all my drawing was grayscale, since I wouldn't have to worry about palette management. And I also knew that I had another historical code sample which performs a Heckbert analysis (median-cut) to find an optimal palette for a true color image. This could have been used to add 24-bit input capability to the GIF encoding routine. That work would have been a generally handy improvement too.
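For readers unfamiliar with median-cut, here is a minimal Python sketch of the general idea. It is a simplified variant written for illustration, not the historical code sample mentioned above; the function name `median_cut` and the choice to split the box with the widest channel range are assumptions of this sketch.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]


def _channel_range(box: List[Color], channel: int) -> int:
    """Spread of one RGB channel within a box of pixels."""
    values = [p[channel] for p in box]
    return max(values) - min(values)


def _spread(box: List[Color]) -> int:
    """Largest channel range in the box."""
    return max(_channel_range(box, c) for c in range(3))


def median_cut(pixels: List[Color], max_colors: int) -> List[Color]:
    """Return at most max_colors representative colors for the pixels."""
    if not pixels or max_colors < 1:
        return []
    boxes = [list(pixels)]
    while len(boxes) < max_colors:
        # Only boxes that still contain more than one distinct color can split.
        candidates = [b for b in boxes if _spread(b) > 0]
        if not candidates:
            break
        box = max(candidates, key=_spread)
        boxes.remove(box)
        # Sort along the widest channel and cut at the median pixel.
        channel = max(range(3), key=lambda c: _channel_range(box, c))
        box.sort(key=lambda p: p[channel])
        mid = len(box) // 2
        boxes.append(box[:mid])
        boxes.append(box[mid:])
    # Each palette entry is the (integer) average color of one box.
    return [
        tuple(sum(p[c] for p in box) // len(box) for c in range(3))
        for box in boxes
    ]
```

Each pixel would then be mapped to the index of its nearest palette entry to produce the 8-bit image that a palette-based GIF encoder expects.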
In the end, however, I decided to try to stick to an 8-bit indexed (palettized) DIB as my off-screen drawing surface. I wouldn't be modifying the GIF encoder. So I had to screw around with the palette crap.
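For grayscale-only drawing, the palette management can in principle be quite small. The Python sketch below is an assumption about one way to set it up, not a description of the actual DIB code: it builds an evenly spaced gray palette so that a palette index maps directly to a gray level. The names `grayscale_palette` and `gray_to_index` are hypothetical.

```python
from typing import List, Tuple


def grayscale_palette(levels: int = 256) -> List[Tuple[int, int, int]]:
    """Evenly spaced grays from black (index 0) to white (last index)."""
    if not 2 <= levels <= 256:
        raise ValueError("an 8-bit palette holds between 2 and 256 entries")
    return [(v, v, v) for v in (round(i * 255 / (levels - 1)) for i in range(levels))]


def gray_to_index(gray: int, levels: int = 256) -> int:
    """Palette index whose gray level is closest to the requested value."""
    if not 0 <= gray <= 255:
        raise ValueError("gray must be in 0..255")
    return round(gray * (levels - 1) / 255)
```

With the full 256 levels the index and the gray value are the same number, so drawing "in gray" on the indexed surface amounts to writing the gray level as the pixel value.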
Hmm, from writing this blog entry, I just realized that the little-known optional Windows subsystem called GDIPlus can encode a GIF. I wonder if this would have been a simpler way to go? Geez.
In an attempt to eke text anti-aliasing out of Windows 2000, I was already using GDIPlus to (optionally) render the text in the musical notation image. ClearType antialiasing would have happened automatically if the web server was an XP or Windows Server 2003 machine, but alas, it's not, and it can't be easily updated. I found that the GDIPlus font antialiasing is quite bad-looking, so for now the small text is not anti-aliased.
For a nanosecond, I also considered deploying the rendering code mentioned above on a different machine running XP on my intranet and accessing it from the web server via DCOM. This overly complex option was easily rejected.
A final technical point to mention is that since the resulting GIF file has to be delivered to the web browser with the "image/gif" MIME type as a separate page item (i.e. separate connection stream), the entire image rendering and encoding process is packaged as an ISAPI extension DLL, completely separate from the ActiveX framework which runs the rest of the site content. The DLL takes its inputs, notably a special form of the human-readable Thai transliteration, and other options, as QueryString parameters in the URL, and quickly draws, encodes, and transmits the GIF.
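The request flow can be illustrated with a short Python sketch. This is not the ISAPI DLL itself, and the parameter names `t` (transliteration) and `key` (musical key) are hypothetical, since the real parameter names are not given here. The renderer is passed in as a callable so the sketch stays independent of any drawing code.

```python
from typing import Callable, List, Tuple
from urllib.parse import parse_qs

Header = Tuple[str, str]
Response = Tuple[str, List[Header], bytes]


def handle_request(query_string: str,
                   render_gif: Callable[[str, str], bytes]) -> Response:
    """Turn a query string into an HTTP-style (status, headers, body) triple."""
    params = parse_qs(query_string)
    # Hypothetical parameter names; the default key follows the C# minor default.
    translit = params.get("t", [""])[0]
    key = params.get("key", ["C#"])[0]

    if not translit:
        body = b"missing transliteration"
        return ("400 Bad Request",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
                body)

    # Draw and encode on the fly, then label the bytes as a GIF.
    body = render_gif(translit, key)
    return ("200 OK",
            [("Content-Type", "image/gif"),
             ("Content-Length", str(len(body)))],
            body)
```

Because the image is fetched as its own page item, the browser only needs an ordinary image URL carrying these parameters; the page that references it never touches the image bytes.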
Please feel free to let me know if you have any questions or comments about the new feature. Thanks,
January 6, 2004 2:52 PM by Glenn Slayden DBEdit handoff — Time off
I am pleased to announce the end of a 4-month development cycle which has seen a complete overhaul of the DBEdit database maintenance tool. This cycle has been particularly grueling since so many things were changing and at times I felt like I'd never be able to put Humpty-Dumpty back together again. My boyfriend will be especially pleased that it's over; he has seen little of my attention during my very long work hours during this period.
Most of the work has been chronicled in these pages over the past weeks. A last minute flurry of activity included:
With so many changes, I came to realize that 100% closure was an impossibility. So I set an arbitrary date (yesterday) to release the build to Bryan and managed to stick to it. In order to test the new features, I entered many new Thai words, which you've seen uploaded to the site recently.
Now I'm off to Thailand for a week of relaxation. Thanks everyone for your continued support of thai-language.com.
January 2, 2004 8:46 PM by Glenn Slayden Happy New Year
Uploaded new entries, now 25857.
December 30, 2003 2:21 PM by Glenn Slayden Posted Spacing Article, DBEdit Summary
Posted Bryan's article on spacing (from the message board) to the reference section.
Entered corrections and made many changes in the site editing tool (DBEdit). I am now optimistic that this work is converging so that the tool can once again be stable enough to be used by editors other than myself. The list of features that have been incorporated in this pass, which has been running over schedule by a couple of months, is too lengthy to completely detail, but some highlights are:
For a while there, I was worried that it had crossed a forbidden complexity boundary beyond which stability could never be recovered. But it is finally starting to settle down and perform very well, performing complex splitting and management operations without any problems. Yay! Results are starting to trickle into the site, as posted in other news entries.
December 27, 2003 9:36 PM by Glenn Slayden Phrase Transliteration
The first provision has been made for correcting the transliteration of phrases. Previously, a phrase's transliteration was irrevocably built from the simple transliteration of its component parts. Silly me.
Originally, a phrase was optionally able to draw upon a single "alternate spelling" for each of its components. This had recently been extended so that a phrase could specify which one of multiple alternate spellings was drawn upon. And now, transliteration is also drawn from the specified alternate spelling, effectively extending this feature to "alternate spellings or pronunciations."
Commemorative sample entry: ผลไม้
December 27, 2003 5:52 PM by Glenn Slayden Tracking changes
Some people have asked me how they can tell when new content has been loaded into the site. This is easy; there has been a content revision number available in site info for some time; you can keep track of this number to know if changes have been made. There's no way to know exactly what the changes are. Major changes are documented in this blog.
Testing of the new suffix feature has been underway for a few days; one place you can see the test entries is in the entry for ใจ.
December 25, 2003 11:21 PM by Glenn Slayden Partial Schema Diagram