Internet resource for the Thai language
August 2003 News Archive

Marked -ุย as a live ending in the auto-transliteration program, which corrected the transliteration of 16 dictionary entries. Thanks to Jan Smits for pointing out this error.
สวัสดีครับผู้นิยมภาษาไทยทุกท่าน (Hello, all Thai language enthusiasts!)

While Glenn was busy getting the server issue squared away in Seattle, I spent the first 20 days of August working on the Dictionary database here in Minneapolis. Now it's all done. Here is what I did.

First, I recorded 1,048 new sound files, including all the Thai phrases contributed by Jan Smits. Everything can be found in the Fundamentals Category and its 10 sub-categories.

Second, I entered a total of 975 new entries, including all the submissions and corrections we received from magnoy (our champion of submissions :-), Barry Mckay, Jan Smits, James, Alastair Munro, Michael Low and David. Thank you. Thanks also to all those who submitted their corrections and suggestions anonymously (you know who you are). Keep submitting. Rest assured that Glenn and I have received them all, and that we've done our best to enter every single one of them into our Dictionary database.

Last, I created 10 new Categories and populated the existing ones with more entries, both old and new. The following are the new Categories:
Until next time.
นายไบรอัน (Bryan)

For those of you who have been experimenting with the "dynamic now online" feature, you may have noticed that you couldn't access the site during any 10-minute window in which a user with Thai characters in their message board username was online. This was because, when you read cookies through the ASP Request.Cookies collection, there is no way to tell ASP to interpret the inherently 8-bit cookies using a specific codepage, such as 874 in our case. The XML transfer then failed because the characters, presumed to be extended Latin, couldn't be represented in the TIS-620 XML file.

Fortunately, I discovered that this problem is not present when using the more fundamental API, Request.ServerVariables("HTTP_COOKIE"). This latter API fetches the entire unprocessed cookie string, with hex escapes (e.g. %A3) left unconverted. So I have to do a bit more string processing myself, but at least I control the results and can specify the Thai codepage when converting to Unicode. You should now see Thai usernames in the now online display, although you may still have problems with the UBB display of your username in the message board product.

For a long time, I've been wondering why Thai-language.com pages don't display Thai text properly in the Google results summary. We've always had the proper <meta> tag, but now I'm going to try adding "tis-620" to the actual HTTP response header, using the ASP Response object property:

Response.Charset = "tis-620"

which results in a header like this:

Content-Type: text/html; Charset=tis-620

I probably should have done this long ago. I'll keep the <meta> tag in the HTML code for now, although theoretically it should no longer be needed. As long as you use Response.Charset (as opposed to Response.AddHeader), IIS (our web server software) seems to know not to add the header twice. I've also added double quotes around "Content-Type" in the <meta> tag, and moved the <title> tag above the <meta> tag, as it is in many other sites which work properly.
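The "now online" cookie workaround described above implies a small amount of manual processing: find one cookie in the raw HTTP_COOKIE string, undo the %XX escapes to recover the original bytes, and map those TIS-620 bytes to UTF-16. The C++ below is a minimal sketch of that pipeline under stated assumptions, not the site's code. The function names (cookie_value, percent_decode, tis620_to_utf16) are hypothetical; it assumes a "name=value; name=value" layout and treats everything after the first '=' as the value; and it maps Thai with the fixed TIS-620 offset rather than calling MultiByteToWideChar with codepage 874, which would also cover the extra Windows-874 punctuation.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Value of one hex digit, or -1 if c is not [0-9A-Fa-f].
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Undo %XX escapes, yielding the raw 8-bit bytes (TIS-620 in this case).
// Malformed escapes are copied through unchanged.
std::string percent_decode(std::string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Find `key` in a raw "a=1; b=2" cookie header and return its decoded bytes.
std::optional<std::string> cookie_value(std::string_view header,
                                        std::string_view key) {
    std::size_t pos = 0;
    while (pos <= header.size()) {
        std::size_t end = header.find(';', pos);
        if (end == std::string_view::npos) end = header.size();
        std::string_view item = header.substr(pos, end - pos);
        while (!item.empty() && item.front() == ' ') item.remove_prefix(1);
        std::size_t eq = item.find('=');
        if (eq != std::string_view::npos && item.substr(0, eq) == key)
            return percent_decode(item.substr(eq + 1));
        pos = end + 1;
    }
    return std::nullopt;
}

// TIS-620 to UTF-16: ASCII passes through; bytes 0xA1-0xDA and 0xDF-0xFB
// map to U+0E01-U+0E3A and U+0E3F-U+0E5B (a fixed offset of 0x0D60).
// Bytes TIS-620 leaves undefined become U+FFFD.
std::u16string tis620_to_utf16(std::string_view bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b < 0x80)
            out.push_back(b);
        else if ((b >= 0xA1 && b <= 0xDA) || (b >= 0xDF && b <= 0xFB))
            out.push_back(static_cast<char16_t>(b + 0x0D60));
        else
            out.push_back(u'\uFFFD');
    }
    return out;
}
```

For example, a raw header containing user=%CA%C7 decodes to the bytes 0xCA 0xC7, which map to U+0E2A U+0E27 (สว). Doing the conversion by hand like this is what lets the caller choose the Thai codepage instead of whatever the framework assumes.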
I had previously tried to fix this by switching the charset specification from "tis-620" to "windows-874", but this didn't make a difference. We won't know whether this latest attempt fixes the Google problem for a few months; Google will need time to revisit and re-index our site.

I am done with the site refurbishing project for now. I've loaded the site with the latest "non-debugging" versions of the components, freshly rebuilt to include all recent bug fixes and performance improvements. I am confident enough in the new builds that I also set the IIS process isolation back to "low" in order to allow for the highest performance. I hope you haven't been too frustrated with the site's recent problems. I'm quite confident that they're all straightened out now, and that you can enjoy faster performance and much more reliability, effective immediately.

Pretty soon, I'll be receiving a huge upload of new content from Bryan, which will include more audio clips, definitions, phrases, corrections, and more. Combined with our improved reliability, this new content will give the site a huge facelift. I'll be using newly enhanced content verification routines in our offline DBEdit tool to ensure that the new content doesn't cause new technical problems.

If you're interested, you can now also see the app build time and latest process restart time at the top of the site news "info" sub-page. By way of explanation, the first time shown is the time of the most recent modification to the actual dictionary (database) content. The "build date" is the date and time at which the main database component was compiled (converted from its source code into an executable). The "restart" time is the date and time at which the currently running instance of the main database component started up.
We also show how long the site has been running since the last restart, reboot, or crash; with our reliability improvements, this number will hopefully grow large (although we do normally restart the site after uploading new content).

I also got an inquiry on the message board about the status of the thai-language chat project. It is still coming along, but I will be putting it on hold for a little while until the dust from the above-mentioned site cleanups settles. Thanks again, everyone, for being loyal site visitors.

Found and fixed yet another site-crashing bug, although this one is not so glamorously esoteric... just a simple undersized buffer. I modified the DBEdit ActiveX verification routine so that it will detect this buffer problem if it ever recurs.

I tracked down one of the memory leaks. It was due to my misunderstanding of the documentation for the Win32 API function VarBstrCat. I was trying to append to a BSTR by passing the same BSTR as the 1st and 3rd arguments:

BSTR bs1 = SysAllocString(L"foo");

This leaks the string "foo", as there is no longer a handle with which to release it. I guess I thought that an existing string passed in as the 3rd argument would be reallocated. In fact, that input is ignored and overwritten with a newly allocated BSTR. This bad code was present in the code that generates the part-of-speech labels for definitions and phrases. I rewrote those functions, eliminated the bug, changed the order in which multiple parts of speech are displayed on the site to be more logical, consolidated code, and improved the performance to boot.

Rewrote some critical, frequently used string processing routines in x86 assembly language. These functions reside in the dictionary database's ActiveX wrapper. The new versions now use more efficient algorithms too.
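Returning to the VarBstrCat leak described above: the general fix is to give the concatenation its own output variable and release the old string only after the call succeeds. The portable C++ sketch below models this with a hypothetical concat_alloc function that, like VarBstrCat, always allocates a fresh result and never frees its inputs; malloc/free stand in for SysAllocString/SysFreeString, and a counter tracks live allocations. In Win32 terms the same pattern would be VarBstrCat(bs1, bs2, &bsTmp), then SysFreeString(bs1) and bs1 = bsTmp, where bs2 is the string being appended and bsTmp is a hypothetical temporary.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Buffers currently allocated and not yet released (demo bookkeeping only).
static int g_live_allocations = 0;

// Allocate a copy of s (stand-in for SysAllocString).
char* dup_alloc(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p) {
        std::memcpy(p, s, n);
        ++g_live_allocations;
    }
    return p;
}

// Free a buffer from dup_alloc/concat_alloc (stand-in for SysFreeString).
void release(char* s) {
    if (s) {
        std::free(s);
        --g_live_allocations;
    }
}

// Hypothetical VarBstrCat-like call: stores a newly allocated left+right
// in *out. Whatever *out pointed to before is ignored and never freed.
bool concat_alloc(const char* left, const char* right, char** out) {
    std::size_t a = std::strlen(left), b = std::strlen(right);
    char* buf = static_cast<char*>(std::malloc(a + b + 1));
    if (!buf) return false;
    ++g_live_allocations;
    std::memcpy(buf, left, a);
    std::memcpy(buf + a, right, b + 1);  // also copies right's terminator
    *out = buf;
    return true;
}

// The leaky form is concat_alloc(*s, suffix, s): the old buffer's only
// pointer is overwritten, so it can never be released.
//
// Leak-free append: concatenate into a temporary, then release the old
// buffer and take ownership of the new one. On failure *s is unchanged.
bool append(char** s, const char* suffix) {
    char* result = nullptr;
    if (!concat_alloc(*s, suffix, &result)) return false;
    release(*s);
    *s = result;
    return true;
}
```

Releasing the old buffer only after a successful call also keeps the original string intact if the allocation fails.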
This isn't the first time I've used assembly language in this project; the most performance-critical part of the core database system (the binary-search dereferencing of an ID) was also reimplemented in hand-coded ASM about a year ago.

I also consolidated scattered copies of common code into a new function which converts a TIS620 string (our database is all TIS620 internally) into a Unicode BSTR which can be passed up to ActiveX/ASP. I'm surprised I had never separated this code out before: the new function ends up being called in twenty-five different places. In addition to the proper character mapping for Thai (performed by the Win32 API function MultiByteToWideChar), the new function can optionally perform the additional processing mentioned in the first paragraph above. And, of course, centralizing the common functionality in this way (code "normalization") makes the module less susceptible to programming errors for several reasons:

I am making progress on a major overhaul and performance improvements of the old ActiveX framework which acts as the skeleton of the site's technology. For starters, I implemented the optimization mentioned in the previous log entry whereby ITypeInfo pointers are now cached. I took it one step further by caching these pointers only once per application invocation, rather than once per object creation.

I am continually improving the thread-safety of the code. For example, the typical resource-releasing code changes from:

if (p_obj)

-or-

IUnknown *punk = (IUnknown *)p_obj;

to the safer:

if (IUnknown *punk = (IUnknown *)InterlockedExchangePointer(&p_obj,NULL))

I've been propagating these types of small changes, cleanups, and fixes throughout all the objects that make this site hum, such as the ThaiDictionary database object, SMTPMail sender, IPNameLookup object, Xchat coordinator object, Bangkok Time reporter object, online user coordination object, ExecCGI message board wrapper object, and invasive site monitor probe object.
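The InterlockedExchangePointer idiom above can be sketched portably with std::atomic: exchange() reads the old pointer and stores nullptr as one indivisible step, so only one thread can ever receive a given non-null pointer and release it. The RefCounted type is a hypothetical stand-in for a COM interface; this illustrates the pattern and is not the site's framework code.

```cpp
#include <atomic>

// Hypothetical reference-counted object standing in for a COM interface.
struct RefCounted {
    std::atomic<int> refs{1};

    void AddRef() { refs.fetch_add(1, std::memory_order_relaxed); }

    // Returns the remaining count; deletes the object when it reaches zero.
    int Release() {
        int left = refs.fetch_sub(1, std::memory_order_acq_rel) - 1;
        if (left == 0) delete this;
        return left;
    }
};

// Portable analogue of:
//   if (IUnknown *punk = (IUnknown *)InterlockedExchangePointer(&p_obj, NULL))
//       punk->Release();
// The check-then-clear versions leave a gap in which two threads can both
// see the same non-null pointer and release it twice; exchange() has no gap.
void safe_release(std::atomic<RefCounted*>& slot) {
    if (RefCounted* p = slot.exchange(nullptr))
        p->Release();
}
```

Calling safe_release a second time on the same slot is harmless, because the slot already holds nullptr.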
After a week of sleuthing, I'm prepared to officially announce the fixing of a programming bug which has been intermittently crashing this web site. I was amazed to find that the bug was in the oldest part of the code, which means it's been around since the beginning of this version of the site in 2001. The details that follow are fairly technical and shouldn't concern site visitors unless you're curious about our technological systems. This problem does not affect your client browser in any way: it was strictly a server-side problem in request processing that is "invisible" to you (except for the occasional server unavailability). Suffice it to say that the site will be more reliable now.

This bug had to do with a "race condition" in multithreaded code, where multiple simultaneously executing threads contend to make use of a single object. In this case, the contended object was a cached handle to an "ITypeInfo library", which the operating system uses to help ASP script call our custom-programmed COM/ActiveX functions. The problem was that the cached handle was not being used, even if it had previously been cached. Instead, a new handle was being created every time it was needed, regardless of the cache (this also caused a secondary problem: one copy of the handle was never released on each occurrence). The race condition comes into play because when one thread goes to fetch a new handle, it briefly zeroes out the target variable shortly before storing the new one (per standard COM programming practice, in case the fetching routine needs to return having failed). This opens a brief window in which the target variable is NULL while another thread may be trying to indirect through it (in this case, trying to call AddRef, which I later changed to the more correct QueryInterface).
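One common way to avoid the NULL window described above is to build the new handle in a local variable and publish it with a single compare-and-swap, so the shared variable only ever changes from empty to a valid pointer and is never zeroed while other threads may read it. This is a swapped-in lock-free technique shown as a minimal C++ sketch; it is not necessarily how the site's fix was written (the entry itself describes fetching the handle once and keeping it until the object is destroyed). Handle, fetch_handle, and release_handle are hypothetical stand-ins for the ITypeInfo pointer and the calls that obtain and release it.

```cpp
#include <atomic>

// Hypothetical handle type and fetch/release calls.
struct Handle { int id; };

static std::atomic<int> g_fetches{0};  // counts fetches, for the demo

Handle* fetch_handle() {
    g_fetches.fetch_add(1);
    return new Handle{42};
}

void release_handle(Handle* h) { delete h; }

// Return the cached handle, fetching it on first use. The fresh handle
// lives in a local until a single compare-exchange publishes it, so the
// slot goes straight from nullptr to a valid pointer.
Handle* get_cached(std::atomic<Handle*>& slot) {
    Handle* h = slot.load(std::memory_order_acquire);
    if (h) return h;  // fast path: already cached

    Handle* fresh = fetch_handle();
    if (!fresh) return nullptr;  // fetch failed; the slot was never touched

    Handle* expected = nullptr;
    if (slot.compare_exchange_strong(expected, fresh,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire))
        return fresh;  // we published our copy

    release_handle(fresh);  // another thread won the race; drop our copy
    return expected;        // on failure, expected holds the winner's handle
}
```

If two threads race on the first call, both may fetch, but only one handle is ever stored and the loser releases its own copy, so nothing leaks and no reader sees NULL after the handle is cached.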
Like most race conditions, this could happen only rarely, when the timing of the computer's internal task switching was just so. This explains why the bug did not surface for so long and why it took me a long time to debug it. I theorize that a recent change to the site, whereby the example phrases are now sorted into alphabetical order on the dictionary result pages, added enough server processing per request that multiple requests now overlapped more readily, exposing the race condition.

Fixing the problem was straightforward. The ITypeInfo handle is now fetched only on the first occurrence, and the cached value is then used until the object is destroyed, at which time the single handle is released. The handle itself should be (multi)thread-safe; the problem described above relates to the variable holding the handle value.

The elaborate debugging tools that I set up to investigate this problem allowed me to notice some significant pre-existing memory leaks, which remain even after fixing this bug. I will be looking into this problem next. I also identified another area for performance improvement: caching a single copy of the ITypeLibrary for the application lifetime.

Sorry that the site was unavailable for two days. I was away in New York all week and thus didn't know about the problem.
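The application-lifetime caching mentioned at the end of the ITypeInfo entry can be sketched in C++ with a function-local static: since C++11, its initialization runs exactly once even when several threads call the function concurrently, and its destructor releases the handle once when static objects are destroyed. TypeLib, load_type_lib, and release_type_lib are hypothetical stand-ins for the type-library handle and the calls that load and release it; this is an illustration, not the site's code.

```cpp
#include <memory>

// Hypothetical type-library handle and load/release calls.
struct TypeLib { int version; };

static int g_loads = 0;  // counts how often the "expensive" load runs

TypeLib* load_type_lib() {
    ++g_loads;  // only reached from the one-time static initializer below
    return new TypeLib{1};
}

void release_type_lib(TypeLib* lib) { delete lib; }

// One handle for the whole application lifetime. Concurrent first callers
// block until the single initialization finishes; later calls just return
// the cached pointer. The unique_ptr's deleter releases it exactly once.
TypeLib* app_type_lib() {
    static const std::unique_ptr<TypeLib, void (*)(TypeLib*)> lib(
        load_type_lib(), &release_type_lib);
    return lib.get();
}
```

Compared with caching per object, this trades one load per object creation for a single load per process, which is the step further the framework overhaul entry describes.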