Reading Thai Email

You received an email in Thai, but when you open it you don't see Thai letters. OK, you need to set the right coding. As everyone knows, computers represent everything with numbers, and the representation (code) used for Thai can be one of several systems.

To have any hope of reading Thai email in your email client or web browser, you need to be able to change the coding. For example, in Thunderbird select View/Character Coding/More Encodings/South East Asia/Thai and choose an appropriate coding, or pick a recently used one. Thunderbird will automatically switch to a font that uses that coding, and you will be able to view your email.

Figure: Thunderbird main window, selecting the view mode for Thai email.

If you can't change the coding then don't worry. You can copy and paste the text into Sontana and it will automatically convert it into something you can read. There are some situations that Sontana can't cope with and never will: they are far too convoluted to be worth supporting. I'll explain in more detail later.

Recognising the Coding

Email written in Thai is simply a text file with a special encoding to accommodate Thai letters. There are 3 types of coding in common use:

  1. Thai Industrial Standard TIS-620
  2. Unicode UTF-8
  3. Unicode in HTML

Below you can see what the 3 types look like on non-Thai systems before they are decoded properly:

  1. TIS-620: Thai characters appear as "Latin" characters and less commonly used punctuation.
  2. UTF-8: Due to Unicode multibyte coding, Thai characters appear as triplets of an a with a grave accent (à) plus two other characters of the style seen in TIS-620.
  3. Unicode in HTML: Thai letters are represented by ampersand-hash sequences (&#) followed by a 4 digit number starting with a 3.
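To make the three appearances concrete, here is a small sketch in Python. It assumes the non-Thai system displays the raw bytes as Latin-1, which is one common fallback but not the only one, and the sample word is just my choice of illustration.

```python
# Sample Thai word ("hello"); any Thai text would do.
thai = "สวัสดี"

# 1. TIS-620 bytes shown by a system that assumes Latin-1.
tis_view = thai.encode("tis-620").decode("latin-1")

# 2. UTF-8 bytes shown by a system that assumes Latin-1.
#    Each Thai letter becomes three characters, the first being "à".
utf8_view = thai.encode("utf-8").decode("latin-1")

# 3. Unicode in HTML: each letter written as a decimal character reference.
html_view = "".join(f"&#{ord(ch)};" for ch in thai)

print(tis_view)
print(utf8_view)
print(html_view)
```

The first line printed is accented Latin letters and punctuation, the second is triplets led by à, and the third is a run of &#3...; references.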

Identifying Thai Email

To be properly identified as Thai, an email should contain a line giving its MIME type. MIME is short for Multipurpose Internet Mail Extensions. There should be a line starting with Content-Type in the header, which is normally hidden from view; the header is the email equivalent of the address on the outside of the envelope.

Content-Type: text/plain; charset=tis-620

for TIS-620 email or

Content-Type: text/plain; charset=utf-8

for UTF-8 email (which could be in any language, not only Thai).

Using the MIME header should be the standard way of identifying Thai email. However, many systems assume that if you are sending an email in Thai then the recipient must understand Thai. Most emails don't have one of these headers, and webmail usually doesn't add one.
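If you have the raw message with its headers, you can check for the declared coding yourself. The sketch below uses Python's standard email package; the helper name declared_charset and the sample messages (including the example.com addresses) are made up for illustration.

```python
from email import message_from_string


def declared_charset(raw_message: str):
    """Return the charset named in Content-Type, or None if there isn't one."""
    return message_from_string(raw_message).get_content_charset()


# Hypothetical message that declares its coding.
with_header = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=tis-620\n"
    "\n"
    "message body\n"
)

# Hypothetical message with no Content-Type line at all.
without_header = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "\n"
    "message body\n"
)

print(declared_charset(with_header))
print(declared_charset(without_header))
```

When the result is None you are in the situation described above: the coding has to be guessed.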

Decoding the Email

If the email you received looks like one of the above, then you're in luck! You can switch your viewer to the correct coding, or you can copy the text into Sontana.
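When you have the raw bytes but no declared coding, a simple guess often works: try UTF-8 first and fall back to TIS-620. This is only a rough sketch of that idea, not how Thunderbird or Sontana actually decide, and the function name decode_thai is my own. Very short TIS-620 texts can occasionally pass as valid UTF-8, so treat the result as a guess.

```python
def decode_thai(raw: bytes) -> str:
    """Guess between UTF-8 and TIS-620 for undeclared Thai text."""
    try:
        # Strict UTF-8 decoding rejects most TIS-620 byte sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to TIS-620; unassigned bytes become U+FFFD.
        return raw.decode("tis-620", errors="replace")


sample = "สวัสดี"
print(decode_thai(sample.encode("utf-8")))
print(decode_thai(sample.encode("tis-620")))
```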

Now we have to talk about transcoded email, the curse of webmail. Most webmail systems don't specify the coding type and some people don't set it either. Without any information about the type of coding, the receiving end sometimes automatically converts it to the coding it thinks it received.

How does this work? If, for example, I send a TIS-620 encoded email from a computer in Japan without specifying the type, the receiving end will look up my IP address, see that I am in Japan, and convert the email to Unicode on the assumption that I sent it in a Japanese coding like ISO-2022-JP. Furthermore, webmail sometimes converts the email and sometimes doesn't, and when it does, it doesn't consistently choose the same coding.

This is bad, but it seems that both Hotmail and Yahoo fiddle with your email this way. Another example is that the TIS-620 text immediately gets converted to Unicode, but not as Thai: the accented characters it appears as are converted to their Unicode equivalents. Fortunately Sontana can automatically detect that (I call this "UTF-8 Latin"), but there are too many combinations of transcoding to deal with sensibly. What if I send an email in Thai to my friend in Korea while I am touring Laos? You can see that this quickly becomes a nightmare.
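The "UTF-8 Latin" case can be undone mechanically, at least in the simple form described here. The sketch below assumes the TIS-620 bytes were treated as Latin-1 when they were converted to Unicode; it is my own illustration, not Sontana's actual code, and repair_utf8_latin is a made-up name.

```python
def repair_utf8_latin(text: str) -> str:
    """Undo TIS-620 bytes that were wrongly converted to Unicode as Latin-1."""
    try:
        # Turn the accented characters back into the original bytes...
        raw = text.encode("latin-1")
        # ...and read those bytes as the Thai they were meant to be.
        return raw.decode("tis-620")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (for example, already proper Thai).
        return text


# Simulate the damage: Thai sent as TIS-620, received as Latin-1 text.
damaged = "สวัสดี".encode("tis-620").decode("latin-1")
print(repair_utf8_latin(damaged))
```

Text that is already proper Thai cannot be encoded as Latin-1, so it passes through unchanged.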

Figure: Hotmail transcoded my Thai email into Japanese!

By the way, Unicode in HTML will be quoted by Hotmail, so you see the &#... etc. Again, copy and paste it into Sontana, or paste it into an HTML file and view it with your browser.
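If you would rather not go through a browser, the numeric references can also be turned back into Thai with a couple of lines of Python. This sketch uses html.unescape from the standard library; the quoted string is a sample I wrote out by hand, not taken from a real Hotmail message.

```python
import html

# Sample of quoted "Unicode in HTML" text: one reference per Thai letter.
quoted = "&#3626;&#3623;&#3633;&#3626;&#3604;&#3637;"

# html.unescape converts each &#NNNN; reference to its character.
readable = html.unescape(quoted)
print(readable)
```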

Next, we look at writing Thai email.