Character Coding

This section covers the most boring, technical aspect of working with Thai on computers, which is also Sontana's most powerful feature.

All You Need to Know About Character Coding

There are three common types of coding, plus one cross coding that is often used accidentally. Using the wrong coding is the cause of 99% of user problems with reading and writing Thai email. Please see my Thai email How To for more background information on dealing with these problems.
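To illustrate what "using the wrong coding" looks like in practice, here is a minimal Python sketch. It is not taken from Sontana; the choice of Latin-1 as the mistaken coding is only an assumption made for the example. It uses Python's built-in `tis-620` codec.

```python
# Thai greeting "sawasdee", written with explicit code points.
text = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

# The sender encodes the message as TIS-620: one byte per Thai character.
raw = text.encode("tis-620")

# A reader that wrongly assumes Latin-1 (an assumption for this example)
# gets accented Latin letters instead of Thai.
wrong = raw.decode("latin-1")

# Decoding with the coding the sender actually used restores the text.
right = raw.decode("tis-620")

print(raw)
print(wrong)
print(right == text)
```

The bytes are identical in both cases; only the coding used to interpret them differs, which is why the fix is almost always to select the correct input coding rather than to retype the text.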

Automatic Coding Conversion

When you load a file, insert a file, or paste some text into Sontana, it can recognise the format automatically.

This is the default setting. If Sontana fails to recognise the text, you will need to set the input coding manually from this dialogue.

Coding Settings
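The automatic recognition described above can be approximated with a simple heuristic. The sketch below is an assumption about how such detection might work, not Sontana's actual algorithm, and the function name `guess_and_decode` is hypothetical. It tries strict UTF-8 first and falls back to TIS-620.

```python
def guess_and_decode(raw):
    """Return (coding_name, text) for bytes of unknown coding.

    Hypothetical helper: a two-step guess, not Sontana's real logic.
    """
    try:
        # Strict UTF-8 decoding rejects most byte sequences that are
        # not genuine UTF-8, so success is a strong hint.
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Otherwise assume TIS-620; undefined bytes become U+FFFD.
    return "tis-620", raw.decode("tis-620", errors="replace")


sample = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"  # Thai greeting "sawasdee"
print(guess_and_decode(sample.encode("utf-8"))[0])
print(guess_and_decode(sample.encode("tis-620"))[0])
```

A heuristic like this can guess wrongly on very short texts, because a few TIS-620 byte pairs happen to form valid UTF-8. That is one reason a manual input coding setting remains necessary.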

If the problem occurred when loading or inserting a file, select Current File Info from the File menu to see Sontana's analysis of your file. If you still cannot decode your file, send me a copy so I can analyse the format.

File Information

At this stage, I just need to remind you that Sontana is a text editor and not a word processor. It will not read documents created by "office" software unless they are saved in plain text format (sometimes identified by a .txt extension).

Output Format

Setting the output coding is a choice that Sontana cannot make automatically. Most Thai people use TIS-620 for email. UTF-8 Thai is a more universal format, as is HTML Unicode. UTF-8 Latin is only for experts.
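To show how the same text differs between these output formats, here is a short Python sketch. It assumes that "HTML Unicode" means HTML numeric character references, which is an interpretation rather than something stated here. UTF-8 Latin is left out because its exact definition is not given.

```python
text = "\u0e44\u0e17\u0e22"  # the word "Thai" in Thai script

# TIS-620: one byte per Thai character.
as_tis620 = text.encode("tis-620")

# UTF-8: three bytes per Thai character.
as_utf8 = text.encode("utf-8")

# Assumed meaning of "HTML Unicode": pure ASCII with numeric
# character references such as &#3652;
as_html = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(len(as_tis620), as_tis620)
print(len(as_utf8), as_utf8)
print(as_html)
```

TIS-620 is the most compact but can only be read correctly by a recipient who expects it. UTF-8 and numeric references carry enough information to be decoded without knowing the text is Thai.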

X11 users: The selection clipboard (accessed by the middle mouse button) is always copied from Sontana as UTF-8 Thai.

Controlling the Clipboard

The input coding also applies to anything pasted from the clipboard. The output coding also applies to the clipboard if you use Coded Copy, which recodes the text as it is placed on the clipboard:

  1. Select an output coding format in the Character Coding dialogue above.
  2. Select the text.
  3. Choose Coded Copy from the Edit menu.