Internet resource for the Thai language
December 2003 News Archive

Posted Bryan's article on spacing (from the message board) to the reference section.

Entered corrections and made many changes in the site editing tool (DBEdit). I am now optimistic that this work is converging, so that the tool can once again be stable enough to be used by editors other than myself. The list of features incorporated in this pass, which has been running over schedule by a couple of months, is too lengthy to detail completely, but some highlights are:
For a while there, I was worried that it had crossed a forbidden complexity boundary beyond which stability could never be recovered. But it is finally starting to settle down and perform very well, carrying out complex splitting and management operations without any problems. Yay! Results are starting to trickle into the site, as posted in other news entries.

The first provision has been made for correcting the transliteration of phrases. Previously, a phrase's transliteration was irrevocably built from the simple transliterations of its component parts. Silly me. Originally, a phrase was optionally able to draw upon a single "alternate spelling" for each of its components. This had recently been extended so that a phrase could specify which one of multiple alternate spellings was drawn upon. And now, the transliteration is also drawn from the specified alternate spelling, effectively extending this feature to "alternate spellings or pronunciations." Commemorative sample entry: ผลไม้

Some people have asked me how they can tell when new content has been loaded into the site. This is easy: a content revision number has been available in site info for some time, and you can keep track of this number to know whether changes have been made. There's no way to know exactly what the changes are. Major changes are documented in this blog.

Testing of the new suffix feature has been underway for a few days; one place you can see the test entries is in the entry for ใจ.

I have not been idle these many days; I have been engineering changes to the database model which I now liken to neurosurgery. Perhaps a brief overview of the current priority stack is in order. I wanted to enhance the new translation feature so that it makes use of the numerous prefixes in our database. I knew that I was anticipating adding suffixes also, which might change the prefix accessing methods, so adding suffixes had to come first.
As long as I was in the prefix/suffix code, I figured I should also add the ability to enter multiple prefixes and suffixes. This was all coming along fine, including the extensive changes to DBEdit, when (last night) I recognized a design flaw with the new scheme. It's best explained by example. As it currently stands, เสียใจ is entered as a phrase. That page also shows the prefixed phrase ความเสียใจ. After creating the suffix feature, I went to convert เสียใจ to a suffixed word, in which case it would appear on the page for เสีย; but this strands the prefixed phrase ความเสียใจ in no-man's land. True, under the new system, I can convert that also to an item with both a prefix and a suffix, but it would lose the fact of its relationship with เสียใจ. Both would be seemingly unrelated peers on the page for เสีย. This means... it's time to go watch a movie, come back tomorrow. At times of recognizing error, I always find it best to disengage for a while.

I quickly realized I'd have to dig in even deeper and allow for the type of relationship the case studies are demanding. The question was, "how?" And was the original (2001) design distinction between definitions and phrases being called into question? My first idea was to extend the already laden prefix/suffix mechanism with a third data type, a "parent" item. I believe I'll reject this because, as mentioned, that mechanism is already overburdened. I'm now inclined to overload the mechanism which establishes one-to-many links between definitions and their Thai representation, to allow for a link to another definition in those cases which require it.

Previously, an item in our database could have a single prefix. Nesting could be accomplished by using that prefixed word as a prefix for another item, but this system did have some drawbacks. For example, sometimes it doesn't make sense to create an entry for a compound prefix itself.
I've now modified the database to allow for multiple prefixes and multiple suffixes (suffixes are entirely new). For some reason, this change turned out to be a lot more complicated than anticipated, and it's quite possible that many bugs have been introduced. Additionally, the actual upload just now is interim, and there are currently no entries which use suffixes. More work remains to expose the new features in the COM layer and also to provision the DBEdit tool. Part of the reason this was complicated is that I designed an efficient data structure for these new concepts; space in the database is not allocated if these new features are not used. There were some site crashes at first, but I hope I have them all sorted out now. If not, I apologize in advance.

I added two options to the control panel which can be used to limit the size of the download of a dictionary page. Some of the pages for basic words such as เป็น are quite large. They take time for our server to prepare and also to transmit to you. You can now selectively turn off the appearance of example phrases and/or complete sentence examples in your site control panel, if you don't need this information and want a quicker response.

I decided that "attributive verb" is the correct term for the part-of-speech that was incorrectly christened "stative verb" last week. Thai does have stative verbs, but they are not the empowered adjectives I was seeking to categorize. Here is the corrected link.

Those of you who have been able to discern something about the structure or internal layout of our database know that two fundamental and distinct elements in our system are the so-called "phrase" and "definition." In short, a phrase is built up by combining multiple definitions or (other) phrases in an arbitrary way. Since the sub-component of our "phrase" can be either a definition or a phrase, such an element has come to be known as a DOP, a "definition or phrase."
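For readers who want a concrete picture of the DOP idea, here is a minimal C++ sketch. It is not the site's actual code. It assumes that each item is named by a numeric ID, and that definitions and phrases occupy separate ID ranges (the site does use mutually-exclusive ID ranges, but the boundaries below are invented). The names `ItemId`, `DopKind`, `kindOf`, `Database`, and `thaiText` are all hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using ItemId = std::uint32_t;

// Hypothetical range boundaries; the real values are not given in this blog.
const ItemId kDefinitionBase = 100000;  // assumed: definitions in [100000, 200000)
const ItemId kPhraseBase     = 200000;  // assumed: phrases in [200000, 300000)
const ItemId kPhraseEnd      = 300000;

enum class DopKind { Definition, Phrase, Unknown };

// The ID alone says what kind of item it names, so no virtual call is needed.
inline DopKind kindOf(ItemId id) {
    if (id >= kDefinitionBase && id < kPhraseBase) return DopKind::Definition;
    if (id >= kPhraseBase && id < kPhraseEnd) return DopKind::Phrase;
    return DopKind::Unknown;
}

struct Database {
    std::unordered_map<ItemId, std::string> definitions;      // id -> Thai text
    std::unordered_map<ItemId, std::vector<ItemId>> phrases;  // id -> component DOPs

    // Builds the text of any DOP by concatenating its parts recursively.
    // Assumes no phrase contains itself, directly or indirectly.
    std::string thaiText(ItemId id) const {
        switch (kindOf(id)) {
        case DopKind::Definition: {
            auto it = definitions.find(id);
            return it == definitions.end() ? std::string() : it->second;
        }
        case DopKind::Phrase: {
            std::string out;
            auto it = phrases.find(id);
            if (it != phrases.end())
                for (ItemId part : it->second) out += thaiText(part);
            return out;
        }
        default:
            return std::string();
        }
    }
};
```

In this sketch, a phrase such as เสียใจ would be stored as a list of two component IDs, one for เสีย and one for ใจ, and a longer phrase could list the ID of เสียใจ itself as one of its components.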
Over the past few weeks and months, I have been consolidating (moving) a lot of the functionality which is common to a phrase component (DOP) from the separate definition and phrase entities, into a common structural encapsulation of this DOP concept. The definition and phrase classes now inherit from this "DopItem" base class, and I have been rolling more and more functionality into it. This new class is also turning out to be a convenient platform for polymorphism. Although it's not using the C++ language mechanism for this (because of the excessive memory the vtables would use), the home-grown polymorphism based on our unique IDs is fast, effective, and convenient. At the very beginning of this web site's design, a decision was made to use mutually-exclusive ID ranges for all database items. This has worked out extremely well for so many reasons, quick-n-easy polymorphism being one of the most important. The DopItem effort has been wildly successful in normalizing and reducing the amount of code in all our tools and components. You may recall from a recent blog entry that the DopItem entity has now also been COM-wrapped, so the benefits of this change reach into the site's ASP code as well.

I just completed the delicate operation of consolidating the phrase- and definition-bit flags, which required remapping all existing flags into a new system of DopItem-based flags. This work was prompted by the desire to introduce a "suffix" concept into our database. I'm considering overloading the existing "prefix" memory allocation for this feature. My tentative plan should allow for multiple prefixes and/or suffixes on a DopItem, with no new fixed memory usage. In fact, as more functionality is abstracted in the DopItem class, it should someday be possible to implement a higher-density memory representation, should the database become unwieldy.

Added a WMA file playback capability to DBEdit (see article).
Previously, DBEdit could only play back WAV files; once the audio clips were converted to WMA (which we must do in order to dramatically reduce the download time and bandwidth usage for our users), they couldn't be played from within the tool. This new capability in DBEdit works by playing the WMA files using an Internet URL, so Bryan and any other editor(s) around the world will be able to review our more than 10,000 existing audio clips conveniently from within the DBEdit program.

Wrote the first release of the reference page on Stative Verbs. [ed. now defunct.]

Implemented the IDops collection object.

Uploaded new entries, consolidated some duplicate entries which were revealed by new database validation code.

DBEdit: fixed numerous bugs; cleaned up and organized code; consolidated Thai-search function usage and deleted unwanted versions; replaced the informational log-window output functions with vararg versions.

Changed syllable-initial transliteration of ค from k- to kh-.

Defined the "stative verb" part-of-speech flag, the usage of which is under discussion.

I changed the way that alternate spellings are represented in the database. Instead of as a single free-form, comma-delimited string attached to a Thai word, they are now each independently represented with a (rather bare-bones) entry of their own. These special entries are linked to the master spelling. This was a very widespread change throughout all levels of the site's code, and I would expect to encounter some minor (or major) problems from this in the coming days. If you notice any, please let me know.
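To illustrate the alternate-spelling change, here is a rough C++ sketch of how a legacy comma-delimited string might be turned into separate entries that each point back at the master spelling. It is only a sketch under stated assumptions, not the migration code actually used on the site. The `AltSpellingEntry` layout, the `splitAlternates` function, and the sequential ID assignment are all hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using EntryId = std::uint32_t;

// Hypothetical bare-bones entry: its own ID, a link to the master, and the spelling.
struct AltSpellingEntry {
    EntryId id;
    EntryId masterId;
    std::string thai;
};

// Strips leading and trailing spaces and tabs.
inline std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return std::string();
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Splits the legacy string on commas and emits one linked entry per non-empty piece.
// New IDs are assigned sequentially from firstNewId (an assumption of this sketch).
inline std::vector<AltSpellingEntry> splitAlternates(EntryId masterId,
                                                     const std::string& legacy,
                                                     EntryId firstNewId) {
    std::vector<AltSpellingEntry> out;
    std::string::size_type start = 0;
    while (start <= legacy.size()) {
        std::string::size_type comma = legacy.find(',', start);
        std::string::size_type end =
            (comma == std::string::npos) ? legacy.size() : comma;
        std::string piece = trim(legacy.substr(start, end - start));
        if (!piece.empty()) {
            EntryId newId = static_cast<EntryId>(firstNewId + out.size());
            out.push_back(AltSpellingEntry{newId, masterId, piece});
        }
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return out;
}
```

One advantage of separate entries is that a phrase can refer to a particular alternate spelling by its ID rather than by its position in a free-form string.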