thai-language.com: Internet resource for the Thai language

thai-language.com Site News



7 Most Recent Site News Articles

January 21, 2004 3:47 AM by Glenn Slayden   sing-song

I'm now re-entering a lengthy blog entry which I lost earlier today by pressing the wrong button :-(

When I was in Bangkok last week, Khun Asda mentioned that he had met with longtime site visitor and collaborator Charles, who is visiting Thailand for a few weeks. Charles had also met Bryan in Minneapolis last fall, but as for me, I have yet to meet him in person—we missed each other in Bangkok by a day or so. Anyway, Asda mentioned to me in passing that Charles had told him that he uses a system of "singing" to help study and remember Thai tones.

That one word immediately gave me the idea of adding musical notation to Thai-language.com. Since I received Charles' story second-hand, I don't know if this is exactly what he had in mind, but I wish to credit Charles for this meme anyhow.

The notion was to represent the correct tones for a word as a graphical or spatial representation of music, which might be easier for some people to study and remember. We've all had the experience of being unable to get a tune out of our head, or perhaps using a melody as some other kind of mnemonic. Being a professional musician myself, I thought that standard musical notation might be the most useful representation towards this end. Even people who don't read music might be able to benefit from seeing the two-dimensional contours of a musical staff.

So, in the few days since I returned, I whipped up the feature. You can now see an approximate musical notation for each of our 25000+ Thai entries near the top of its main entry web page. If you don't wish to use this feature, you can disable it in your site control panel.

The musical notation GIF images are generated on-the-fly, based on the transliteration result strings (which, in turn, are mostly generated on-the-fly also). The notation does not use a time signature or meter, but Thai long vowels are differentiated from short vowels by using an open note head or filled-in note head, respectively.

The first step in this project involved analyzing many of our site's audio clips to determine the musical intervals which would represent each of the five spoken tones in the Thai language. Of the several native Thai speakers who have recorded clips, I found that Professor Mak's recordings maintained the most consistent pitch, even across different recording sessions months apart. I found that his low tone was, on average, C#.
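The post doesn't say how the pitch of each clip was measured, so the following is only a minimal sketch of the last step: turning measured frequencies into a note name, assuming equal temperament with A4 = 440 Hz. The function names and the idea of averaging in the log domain are assumptions for illustration, not the procedure actually used.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return the name of the equal-tempered note closest to freq_hz."""
    # MIDI note 69 is A4; one semitone is a frequency ratio of 2**(1/12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[midi % 12]

def average_note(freqs_hz):
    """Average several pitch measurements (hypothetical inputs) and name the result.

    Averaging log-frequencies treats pitch the way the ear does: in semitones.
    """
    mean_log = sum(math.log2(f) for f in freqs_hz) / len(freqs_hz)
    return nearest_note(2 ** mean_log)
```

Sharps are used for every black key here, which is a simplification; a real notation renderer would spell notes according to the key.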

Next I carefully analyzed a few hundred audio clips for the pitch changes corresponding to the tones, and I selected the average results, which were as follows:

  Tone      Starting Note   Movement
  Low       Root            [none]
  Mid       Root + m3       [none]
  High      Root + m3       up m3
  Rising    Root            up p5
  Falling   Root + p4       down m3

  p = perfect; m = minor
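To make the table concrete, here is a small sketch that turns each tone into a starting and ending note for a given root. The names `TONES` and `tone_notes` are made up for this illustration, and notes are spelled with sharps only, which is a simplification.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Interval sizes in semitones: minor third, perfect fourth, perfect fifth.
M3, P4, P5 = 3, 5, 7

# tone -> (starting note as an offset above the root, movement in semitones)
TONES = {
    "low":     (0,  0),
    "mid":     (M3, 0),
    "high":    (M3, +M3),
    "rising":  (0,  +P5),
    "falling": (P4, -M3),
}

def tone_notes(tone, root="C#"):
    """Return (start, end) note names for a tone; both are equal if there is no movement."""
    start_offset, movement = TONES[tone]
    start = (NOTE_NAMES.index(root) + start_offset) % 12
    end = (start + movement) % 12
    return NOTE_NAMES[start], NOTE_NAMES[end]
```

With the default root of C#, the rising tone runs from C# up to G#, and the falling tone from F# down to D#.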

Let me again emphasize that these were approximate and averaged values I obtained by analyzing a few hundred audio clips. The next step was to select some notational conventions. By assigning the low tone value to be the root of a diatonic minor key, only one accidental would be needed (for the upper note of the high tone). Thus in the final display, a proper key signature is used, and the appropriate accidental is applied to the high tone. The default key and root is C# minor, but this can be changed to any of the 12 minor keys by using your site control panel.
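The "only one accidental" claim can be checked mechanically. The sketch below assumes the natural minor scale and uses the averaged intervals from the table, expressed as semitone offsets above the root; the names are invented for the example.

```python
# Semitone offsets of the natural minor scale above its root.
NATURAL_MINOR = {0, 2, 3, 5, 7, 8, 10}

# tone -> (start, end) offsets above the root, taken from the interval table
TONE_OFFSETS = {
    "low":     (0, 0),
    "mid":     (3, 3),
    "high":    (3, 6),   # m3, then up another m3
    "rising":  (0, 7),   # root, up a p5
    "falling": (5, 2),   # p4, down a m3
}

def notes_needing_accidental():
    """List the (tone, position) pairs whose note falls outside the key."""
    outside = []
    for tone, (start, end) in TONE_OFFSETS.items():
        for position, offset in (("start", start), ("end", end)):
            if offset % 12 not in NATURAL_MINOR:
                outside.append((tone, position))
    return outside
```

Because everything is expressed relative to the root, the result is the same in all 12 minor keys: only the upper note of the high tone (a diminished fifth above the root) lies outside the scale.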

The low and mid tones use a single note, since their pitch doesn't change much during their duration. The other tones require a sliding movement (melisma), for which I used a slur mark to connect the estimated starting and ending notes.

The technical implementation of this feature turned out to be more of a challenge than I anticipated.

As mentioned above, I draw the images from scratch on the fly, as required by a particular word or sentence. This method was selected (as opposed to pre-rendering all of the images) for several reasons, notably: 1.) changes in the transliteration input will be automatically reflected in the musical notation; 2.) no disk-hit required to service the client; 3.) no proliferation of 25000 image files on the server.

However, since we use a Windows-based web server, the graphics subsystem (GDI) produces a raster image format, the so-called DIB (device-independent bitmap), which is not acceptable for direct use on the web. This is for three reasons: 1.) it's a proprietary format which is not usually supported on non-Windows machines; 2.) I think it's not authorized by the Web standards documents; 3.) it's a non-compressed format, so transmission would be inefficient. After rendering the image to an off-screen memory drawing surface, we must therefore convert the drawing to a GIF. Of course this must also be done on-the-fly.

Encoding an uncompressed image to GIF is a fairly involved procedure. Fortunately, I had some old code hanging around. Interestingly enough, I originally used this code in 1997 for the first version of Thai-language.com. For the record, that historical code sample is credited to Spencer Thomas, Jim McKie, Steve Davies, Ken Turkowski, James Woods, Joe Oorst, Donald Knuth, G. Knott, David Rowley, Marcel Wijkstra, Jef Poskanzer, and now me. In those days, before the web browser had good Thai font support, I was toying with the idea of using an on-the-fly drawing/encoding-to-GIF process similar to that described above to draw all the Thai text on the site. At that time, Thai-language.com ran on a Unix server, and I did manage to get on-the-fly rendering of Thai text working as a CGI application compiled with Gnu (GCC). This was non-trivial considering that the Unix box had no Thai character support, so the code had to assemble a Thai word or sentence by copying each character, one at a time, from a special source bitmap I prepared which had one copy of every Thai character. This absurd CGI actually worked, but I never ended up deploying it into the actual site, probably because the Unix hardware seemed to handle the whole affair quite slowly.

Back to yesterday's project: the tricky part that I didn't anticipate is that the ancient GIF encoding code doesn't accept 24-bit (true color) input, but rather expects a palettized image, and it copies that same palette for the GIF. I knew that it would be a lot easier to do my drawing to a true color surface, even if all my drawing was grayscale, since I wouldn't have to worry about palette management. And I also knew that I had another historical code sample which performs a Heckbert analysis (median-cut) to find an optimal palette for a true color image. This could have been used to add 24-bit input capability to the GIF encoding routine. That work would have been a generally handy improvement too.
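For readers unfamiliar with median cut, here is a compact sketch of one common variant of the idea. It is not the historical code mentioned above; the function names are invented, pixels are assumed to be a non-empty list of (r, g, b) tuples, and the box to split is chosen by widest channel range, which is only one of several possible heuristics.

```python
def widest(box):
    """Return (spread, channel) for the color channel with the largest range in box."""
    return max(
        (max(p[c] for p in box) - min(p[c] for p in box), c) for c in range(3)
    )

def median_cut(pixels, n_colors):
    """Return up to n_colors representative (r, g, b) colors for pixels."""
    boxes = [list(pixels)]
    while len(boxes) < n_colors:
        # Split the box whose colors are spread most widely along one channel.
        i = max(range(len(boxes)), key=lambda k: widest(boxes[k])[0])
        spread, channel = widest(boxes[i])
        if spread == 0:
            break  # every remaining box holds a single color
        box = sorted(boxes.pop(i), key=lambda p: p[channel])
        mid = len(box) // 2  # cut at the median pixel along that channel
        boxes += [box[:mid], box[mid:]]
    # Each palette entry is the average color of one box.
    return [tuple(sum(p[c] for p in box) // len(box) for c in range(3)) for box in boxes]

def nearest_index(palette, pixel):
    """Map a true-color pixel to the index of the closest palette entry."""
    return min(
        range(len(palette)),
        key=lambda i: sum((palette[i][c] - pixel[c]) ** 2 for c in range(3)),
    )
```

Running `median_cut` once over the image and then `nearest_index` over every pixel yields the palette plus index data that a palette-based GIF encoder expects.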

In the end, however, I decided to try to stick to an 8-bit indexed (palettized) DIB as my off-screen drawing surface. I wouldn't be modifying the GIF encoder. So I had to screw around with the palette crap.
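When all drawing is grayscale, one simple way to keep palette management under control is a gray ramp in which the palette index tracks brightness. The post doesn't say which palette was actually used, so treat this as an assumed illustration with made-up function names.

```python
def gray_palette(levels=256):
    """Build a palette whose entries run evenly from black to white."""
    return [(round(i * 255 / (levels - 1)),) * 3 for i in range(levels)]

def gray_to_index(value, levels=256):
    """Map an 8-bit gray value (0-255) to the nearest entry of gray_palette(levels)."""
    return round(value * (levels - 1) / 255)
```

With the full 256 levels the index equals the gray value, so the drawing code can treat the 8-bit surface almost like a grayscale one, and an encoder that copies the palette verbatim gets a sensible result.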

Hmm, from writing this blog entry, I just realized that the little-known optional Windows subsystem called GDIPlus can encode a GIF. I wonder if this would have been a simpler way to go? Geez.

In an attempt to eke text anti-aliasing out of Windows 2000, I was already using GDIPlus to (optionally) render the text in the musical notation image. ClearType antialiasing would have happened automatically if the web server was an XP or Windows Server 2003 machine, but alas, it's not, and it can't be easily updated. I found that the GDIPlus font antialiasing is quite bad-looking, so for now the small text is not anti-aliased.

For a nanosecond, I also considered deploying the rendering code mentioned above on a different XP-running-machine on my intranet and accessing it from the web server via DCOM. This overly complex option was easily rejected.

A final technical point to mention is that since the resulting GIF file has to be delivered to the web browser with the "image/gif" MIME type as a separate page item (i.e. separate connection stream), the entire image rendering and encoding process is packaged as an ISAPI extension DLL, completely separate from the ActiveX framework which runs the rest of the site content. The DLL takes its inputs, notably a special form of the human-readable Thai transliteration, and other options, as QueryString parameters in the URL, and quickly draws, encodes, and transmits the GIF.
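The request/response shape described here can be sketched outside of ISAPI. The example below uses a Python WSGI callable as a stand-in, not the actual DLL. The parameter names `t` and `key` are hypothetical, and `render_notation` returns a fixed 1x1 placeholder GIF instead of drawing anything.

```python
from urllib.parse import parse_qs

# A 1x1 GIF used as a stand-in for the rendered notation image.
PLACEHOLDER_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

def render_notation(translit, key):
    """Hypothetical renderer: a real one would draw the staff and encode it as GIF."""
    return PLACEHOLDER_GIF

def app(environ, start_response):
    """WSGI entry point: read options from the query string, answer with image/gif."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    translit = params.get("t", [""])[0]    # hypothetical parameter name
    key = params.get("key", ["C#"])[0]     # hypothetical parameter name
    if not translit:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing transliteration"]
    body = render_notation(translit, key)
    start_response(
        "200 OK",
        [("Content-Type", "image/gif"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Note that a key such as C# has to be URL-encoded in the query string (`key=C%23`), since a bare `#` would start the URL fragment.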

Please feel free to let me know if you have any questions or comments about the new feature. Thanks,

Glenn


January 6, 2004 2:52 PM by Glenn Slayden   DBEdit handoff — Time off

I am pleased to announce the end of a 4-month development cycle which has seen a complete overhaul of the DBEdit database maintenance tool. This cycle has been particularly grueling since so many things were changing and at times I felt like I'd never be able to put Humpty-Dumpty back together again. My boyfriend will be especially pleased that it's over; he has seen little of my attention during my very long work hours during this period.

Most of the work has been chronicled in these pages over the past weeks. A last minute flurry of activity included:

  • Replacing the core control skeleton of the application, previously a classic Windows message handler, with a portable object-oriented framework that I had developed and have been using in most of my recent projects;
  • Rewrite of the selection dialog. Migration to a common source for most of the disparate selection dialogs;
  • Fixing a release-version-only bug related to my use of C++ exception handling (n.b. apparently if you don't call set_terminate in your try block, your release build will fail with "this application has requested the Runtime to terminate in an unusual way..." when you throw a C++ exception; this is using Visual Studio .NET. The terminate function can have no body if you're catching all your exceptions... I think);
  • More bug fixes to the balky, troublesome "on-the-fly maintenance of the all-index" code;
  • Keyboard navigation in my grid component;
  • Bug fixes in the new nesting, multi-prefix, and multi-suffix features;
  • Audio recording support for prefixed/suffixed defs;
  • Editing history;
  • Pull up site web page by clicking in DBEdit;
  • Playback site WMA files;
  • etc.

With so many changes, I came to realize that 100% closure was an impossibility. So I set an arbitrary date (yesterday) to release the build to Bryan and managed to stick to it. In order to test the new features, I entered many new Thai words, which you've seen uploaded to the site recently.

Now I'm off to Thailand for a week of relaxation. Thanks everyone for your continued support of thai-language.com.

Glenn


January 2, 2004 8:46 PM by Glenn Slayden   Happy New Year

Uploaded new entries, now 25857.


December 30, 2003 2:21 PM by Glenn Slayden   Posted Spacing Article, DBEdit Summary

Posted Bryan's article on spacing (from the message board) to the reference section.

Entered corrections and made many changes in the site editing tool (DBEdit). I am now optimistic that this work is converging so that the tool can once again be stable enough to be used by editors other than myself. The list of features that have been incorporated in this pass, which has been running over schedule by a couple of months, is too lengthy to completely detail, but some highlights are:

  • tab-based or "property sheet" interface
  • WMA playback from within the tool
  • all new prefix/suffix management
  • all new alternate spelling/pronunciation management
  • all new synonym & antonym support
  • centralized management of a single dynamic index which is always kept sorted (using binsearch reinsertion) speeds all operations
  • automatic TIS-620 conversion
  • rewrite of "multi-select" dialog
  • smart column resizing
  • tooltips
  • context-sensitive phrase item management; smart phrase-to-def conversion with prefixes
  • much more...
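The always-sorted index mentioned in the list above can be illustrated with binary-search insertion. This sketch reads "binsearch reinsertion" as: find the old position by binary search, remove the entry, and insert it again at its new sorted position. The class and method names are invented, and DBEdit itself is not written in Python.

```python
import bisect

class SortedIndex:
    """A list of keys that stays sorted after every change."""

    def __init__(self):
        self._keys = []

    def add(self, key):
        # insort finds the insertion point by binary search.
        bisect.insort(self._keys, key)

    def reinsert(self, old_key, new_key):
        """Move an entry whose sort key changed, keeping the list sorted."""
        i = bisect.bisect_left(self._keys, old_key)
        if i == len(self._keys) or self._keys[i] != old_key:
            raise KeyError(old_key)
        del self._keys[i]
        bisect.insort(self._keys, new_key)

    def __contains__(self, key):
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def keys(self):
        return list(self._keys)
```

Lookups cost O(log n) comparisons; insertion still shifts list elements, but it avoids re-sorting the whole index after each edit.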

For a while there, I was worried that it had crossed a forbidden complexity boundary beyond which stability could never be recovered. But it is finally starting to settle down and perform very well, performing complex splitting and management operations without any problems. Yay! Results are starting to trickle into the site, as posted in other news entries.


December 27, 2003 9:36 PM by Glenn Slayden   Phrase Transliteration

The first provision has been made for correcting the transliteration of phrases. Previously, a phrase's transliteration was irrevocably built from the simple transliterations of its component parts. Silly me.

Originally, a phrase was optionally able to draw upon a single "alternate spelling" for each of its components. This had recently been extended so that a phrase could specify which one of multiple alternate spellings was drawn upon. And now, the transliteration is also drawn from the specified alternate spelling, effectively extending this feature to "alternate spellings or pronunciations."
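As a rough model of this behavior, the sketch below builds a phrase's transliteration from its components, letting each component optionally select one of its entry's alternates. The data layout and names are assumptions, and the romanized strings are rough illustrative placeholders, not the site's transliteration scheme.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entry:
    translit: str                                         # default transliteration
    alternates: List[str] = field(default_factory=list)   # alternate pronunciations

def phrase_translit(components):
    """Join component transliterations.

    components is a list of (entry, alt) pairs, where alt is None for the
    default transliteration or an index into entry.alternates.
    """
    parts = []
    for entry, alt in components:
        parts.append(entry.translit if alt is None else entry.alternates[alt])
    return " ".join(parts)

# Illustrative placeholders only: a first component that gains a linking
# syllable when it appears inside a compound.
first = Entry("phon", alternates=["phon-la"])
second = Entry("mai")
```

Here `phrase_translit([(first, None), (second, None)])` gives the naive result built from the defaults, while `phrase_translit([(first, 0), (second, None)])` picks the alternate for the first component.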

Commemorative sample entry: ผลไม้


December 27, 2003 5:52 PM by Glenn Slayden   Tracking changes

Some people have asked me how they can tell when new content has been loaded into the site. This is easy; there has been a content revision number available in site info for some time; you can keep track of this number to know if changes have been made. There's no way to know exactly what the changes are. Major changes are documented in this blog.

Testing of the new suffix feature has been underway for a few days; one place you can see the test entries is in the entry for ใจ.


December 25, 2003 11:21 PM by Glenn Slayden   Partial Schema Diagram



Copyright © 2024 thai-language.com. Portions copyright © by original authors, rights reserved, used by permission; Portions 17 USC §107.