This page is a tutorial on how to use the Microsoft global IME, with appropriate pictures as part of an example to demonstrate how to send an email message with simplified Chinese characters.

At first glance, entering Chinese characters from a standard keyboard would seem like an impossible task. There are literally thousands of Chinese characters which you might choose to enter, and there simply are not enough keys or combinations of keys to make it possible to easily enter Chinese characters.

An input method editor (IME) is essentially a means by which you can enter Chinese characters while using a plain U.S. keyboard. IMEs also exist for other Asian languages, and while the general principles are the same, the details differ.

The IME for Simplified Chinese that I have used essentially takes input in Pinyin. As you type, it tries to figure out what Chinese characters you really had in mind, and if you start to type a complete sentence, it may revise some of the choices from before based upon what you have subsequently typed.

I should also mention that there are other IMEs for Chinese which take other forms of input (other than Pinyin), and some of these do require special keyboards. In particular, IMEs that are used with traditional characters do not use Pinyin. They use different systems, some of which are phonetic, and some of which use the keyboard to describe strokes/radicals. Given that foreign students of Chinese typically learn Pinyin, using a simplified Chinese IME tends to be fairly easy, but using a traditional characters IME requires considerable training. For those interested in other input methods (including those used with traditional characters), has a good introduction.

In general terms, all IMEs (even those for languages such as Japanese) will take the keystrokes and process them in some way as you type. Depending upon what type of IME you are using, there might be what is called a 'pre-edit' window, often near the actual cursor position in the document you are working on, and in this window you can see the characters that you have typed so far. Once the IME thinks that it knows which Chinese characters should be used, these are then inserted in your document.

The major problem is that a Pinyin-based IME will sometimes get the wrong character. The basic theory is that the IME itself does a dictionary lookup to translate the Pinyin to the Chinese characters, and sometimes it has to guess. Newer IMEs tend to do a better job of it, apparently because they include a built-in phrase dictionary so that certain set phrases are more likely to come out correctly. Nonetheless there are often circumstances where it still gets it wrong. For this reason, when the IME inserts the Chinese characters in your document it usually marks them as a tentative choice by faintly underlining them. In the event that it has chosen correctly, hitting the return key essentially tells the IME that everything up to that point is OK, and it should move on.
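To make the dictionary lookup idea concrete, here is a minimal sketch in Python. It is not how the Microsoft Global IME is actually implemented; the tiny dictionaries and the function name convert are invented for illustration. It tries a two-syllable phrase first, and otherwise guesses the first candidate for a single syllable.

```python
# Toy data invented for this sketch; a real IME ships far larger dictionaries.
PHRASES = {
    ("ni", "hao"): "你好",
    ("zhong", "wen"): "中文",
}
SYLLABLES = {
    "ni": ["你", "尼", "泥"],
    "hao": ["好", "号", "毫"],
    "zhong": ["中", "种", "重"],
    "wen": ["文", "问", "闻"],
    "wo": ["我", "握", "窝"],
}


def convert(syllables):
    """Convert a list of Pinyin syllables, preferring known phrases."""
    out = []
    i = 0
    while i < len(syllables):
        pair = tuple(syllables[i:i + 2])
        if len(pair) == 2 and pair in PHRASES:
            # The phrase dictionary knows this pair, so use it as a unit.
            out.append(PHRASES[pair])
            i += 2
        else:
            # No phrase match: guess the first candidate for this syllable,
            # or pass the input through unchanged if it is unknown.
            candidates = SYLLABLES.get(syllables[i], [syllables[i]])
            out.append(candidates[0])
            i += 1
    return "".join(out)


print(convert(["ni", "hao"]))
print(convert(["wo", "zhong", "wen"]))
```

The guess in the fallback branch is exactly where a simple IME goes wrong, which is why the result is shown as tentative until you confirm it.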

The Microsoft Global IME tends to learn phrases as you enter them, so while you may get incorrect characters the first time you enter a phrase, it will tend to do a better job of it the next time you enter the same phrase.
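I don't know how Microsoft implements this learning, but one simple way to get the same effect is to count how often each candidate has been picked and to offer the most-picked candidate first. The class name LearningCandidates and the sample data below are assumptions made up for this sketch.

```python
from collections import Counter


class LearningCandidates:
    """Rank candidates for a Pinyin syllable by how often they were chosen."""

    def __init__(self, candidates):
        # Maps a Pinyin syllable to its candidates in default order.
        self.candidates = dict(candidates)
        # Counts (pinyin, character) pairs the user has confirmed.
        self.picks = Counter()

    def ranked(self, pinyin):
        default = self.candidates.get(pinyin, [])
        # sorted() is stable, so unpicked candidates keep their default order.
        return sorted(default, key=lambda ch: -self.picks[(pinyin, ch)])

    def choose(self, pinyin, ch):
        self.picks[(pinyin, ch)] += 1


ime = LearningCandidates({"shi": ["是", "十", "时"]})
print(ime.ranked("shi"))   # default order before any learning
ime.choose("shi", "时")
print(ime.ranked("shi"))   # the chosen character now comes first
```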

In the event that the IME has chosen incorrectly, you have to get the IME to show you the other choices. Here the exact method depends a bit on which IME you are using - with the IME from Microsoft for Chinese, the "Home" key on your keyboard displays the choices.

For the case of tonal languages, such as Chinese, there typically isn't an easy way to enter the diacritical markings that indicate the tone. If the IME is smart, it can work out which character you intended from the context (and the phrase dictionaries that are built into some IMEs seem to help here), but sometimes it helps to give the IME a hint. This can be done by appending a digit from 1 to 5 at the end of the word to indicate tone number. Thus you would type "wo3" for the Chinese word for "I".
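The effect of the tone digit can be sketched as follows. This is only an illustration of the idea, not the IME's real logic; the table TONED and the function names are invented here, and tone 5 stands for the neutral tone.

```python
import re

# Toy table invented for this sketch: syllable -> (character, tone number).
TONED = {
    "ma": [("妈", 1), ("麻", 2), ("马", 3), ("骂", 4), ("吗", 5)],
    "wo": [("我", 3), ("握", 4), ("窝", 1)],
}


def split_tone(token):
    """Split input such as 'wo3' into ('wo', 3); the tone may be absent."""
    match = re.fullmatch(r"([a-z]+)([1-5])?", token)
    if match is None:
        raise ValueError(f"not a Pinyin syllable: {token!r}")
    syllable, tone = match.groups()
    return syllable, int(tone) if tone else None


def candidates(token):
    """Return candidate characters, filtered by tone when one is given."""
    syllable, tone = split_tone(token)
    return [ch for ch, t in TONED.get(syllable, []) if tone is None or t == tone]


print(candidates("ma"))    # all five candidates
print(candidates("ma3"))   # only the third-tone candidate
print(candidates("wo3"))
```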

In the event that you are dealing with an exceptionally stupid IME (of which I have used some), it rarely gets the correct character as you are typing, and even worse it can offer you as many as 50 possible choices. This can make entering Chinese characters tedious at best, and indicating the tone number seems to help to reduce the number of choices. I have also seen cases, however, where the IME doesn't find the correct character when you do indicate the tone number. I should add that the IME that you add on to Internet Explorer seems to be one of the better ones. If you have the opportunity to install the Chinese version of Windows-NT, there is a built-in IME that comes with the system that can be quite frustrating to use due to its extreme stupidity.

Now let's get into the specifics of how you actually use an IME. First let's cover the basics, so that you know how to turn it on and off again.

If you are running Windows (95, 98 or NT), there is a little spot on the right hand side of the task bar (next to the clock). Once you have support for more than one locale installed on your machine, you will see a little blue box with "En" in it - this indicates that the current input locale is English. Here is what it normally looks like:

Let us assume that you have an IME installed, and you are going to use Microsoft Outlook Express (which comes with Internet Explorer) to write a Chinese email. Start out as you would when you send any other piece of email, filling in the addresses. When you reach a point where you want to enter Chinese text, go down to the blue box with the "En" on it, and click on it. This should show you all of the other choices you have, and you should have a box that says something like: "Chinese(Simplified) IME". Simply select this. You may have other blue boxes with other languages - in particular, you may have a blue box that says "Zh - Chinese". Don't pick this one - it won't work as it doesn't have an IME. Here is a picture that shows what I am talking about:

I should mention that the Global IME only works with a handful of programs. These include Microsoft Internet Explorer, Microsoft Outlook, Microsoft Outlook Express, and Microsoft Word (Word 2000 and later). It is only when one of those programs is active (the title bar for the program is blue, not gray) that you will even see the option of choosing the Chinese IME. Other programs in the Office suite (such as PowerPoint and Excel) are not capable of using the IME. However, you can cut-and-paste Chinese characters into these programs from something such as Word, or even Outlook Express.

With recent versions of the IME, Microsoft has provided a hotkey that you can use to toggle the input locale without having to click on the blue box. Simply hold down the "Alt" key and press the '`' key. This doesn't work on all machines - it is a configuration setting that you have to turn on if you want to use it.

When you are done with Chinese input, simply click on the IME icon on the task bar, and select the blue box with "En" in it. This will bring you back to English input.

When your input locale is set for Chinese, a little window will appear somewhere on the screen, and this window has several buttons. This is what could be considered the control panel for the IME. Here is what the IME control panel looks like:

Unfortunately I see some fairly significant differences in appearance between the Windows-2000 version of the IME, and the IME that gets installed on Windows-NT/Windows-9x. Perhaps the most significant difference is that the NT/95 version has online documentation in English, and the Windows-2000 version doesn't. To get to the help, put the mouse cursor over the IME control window, and right click, and then pick the first option. A help window should appear that should explain everything.

The leftmost button is usually used to select whether you want English or Chinese input. It should have either the "zhong" or a "ying" character in it. You just click it to toggle back and forth. In English mode, it looks like this:

Before we start entering Chinese characters, you probably want to adjust things so that the characters appear larger on the screen. The reason for this is that while English text is quite readable in a 10 point font, Chinese can be a bit hard to read. I prefer to bump the size up to either a 12 or 14 point font to make it more readable. Changing the size is really easy - just make this change:

At this point, let's say that you have the IME in Chinese mode, and you are ready to start entering Chinese characters. All you really need to do is start typing in pinyin. It looks something like this:

When you press the spacebar, the pre-edit window will disappear, and you will have just the tentative choices displayed. Just press return if the tentative choices were correct, and press the "Home" key on the keyboard if the tentative choices are incorrect and you need to pick the right character by hand. This process looks something like:

Once you have finished entering the text of your message, there are a couple of things you must do before you send it. The first thing you must do is to make sure the email message is sent in HTML format. Here is a picture which shows how you change this setting:

Finally, you need to make sure that the email message is marked as being something that should be displayed in Chinese. I realize that this seems a bit redundant - you just got through entering Chinese characters, but there are technical reasons why you need to do this. Here is a picture that shows how you change the locale:

The major reason for doing this is so that when the message is received, the recipient's mail reader program will realize that the message contains Chinese characters and display them correctly automatically (assuming the person who is reading the mail message is using a mail program that is aware of how to display Chinese).
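For the curious, here is roughly what those two settings (HTML format and the Chinese locale) amount to inside the message itself. This sketch uses Python's standard email package rather than Outlook Express, and the addresses and the helper name build_chinese_mail are made up. The point is that the message carries a header naming both the format and the character set, which is what the receiving mail program reads.

```python
from email.message import EmailMessage


def build_chinese_mail(sender, recipient, subject, html_body, charset="gb2312"):
    """Build an HTML message whose header declares the given charset."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # subtype="html" marks the body as HTML; charset labels the encoding.
    msg.set_content(html_body, subtype="html", charset=charset)
    return msg


# Placeholder addresses, for illustration only.
msg = build_chinese_mail(
    "me@example.com", "you@example.com", "Greetings", "<p>你好</p>"
)
print(msg.get_content_type())      # the declared format
print(msg.get_content_charset())   # the declared character set
```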

One common problem area with the Microsoft Global IME is entering characters which would be spelled in Pinyin using a "u" with an umlaut. It took many tries, and a search of the web, before I turned up the answer. Simply use a "v" instead. Many thanks to Betsy (Luebbe) Garrett and Zev Handel for making this tip available on the web.
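Since standard Pinyin never uses the letter "v", the substitution is unambiguous in both directions. A small sketch of the mapping, with function names invented for illustration:

```python
U_UMLAUT = "\u00fc"  # the letter u with an umlaut


def to_typed(pinyin):
    """Return what to type for a Pinyin syllable: u-umlaut becomes v."""
    return pinyin.replace(U_UMLAUT, "v")


def to_display(typed):
    """Return the conventional spelling of typed input: v becomes u-umlaut."""
    return typed.replace("v", U_UMLAUT)


print(to_typed("n\u00fc3"))   # what you type for the word for "woman"
print(to_display("lv4"))      # how the typed input is conventionally spelled
```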

A couple of points in closing. Taiwan and the People's Republic of China use different ways of encoding Chinese characters. In Taiwan (and in Hong Kong, for that matter) a system known as "Big5" is used, and it is assumed that you want traditional characters when Big5 is used. On the mainland, they use a different standard called GB2312 (GB is short for Guojia Biaozhun - National Standard), and here the simplified characters are assumed. Why two different standards? From what I gather, politics is the major reason.
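To see that the two standards really do assign different codes to the same character, you can encode a short string both ways. This sketch uses the "gb2312" and "big5" codecs that ship with Python; the sample text is just the word for "Chinese language", which happens to be written the same way in simplified and traditional characters.

```python
text = "中文"

gb_bytes = text.encode("gb2312")
big5_bytes = text.encode("big5")

# Same characters, different byte values under each standard.
print(gb_bytes.hex())
print(big5_bytes.hex())

# Reading the bytes back only works with the encoding that produced them.
print(gb_bytes.decode("gb2312"))
print(big5_bytes.decode("big5"))
```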

This point is important in a couple of places. First of all, if you start to browse Chinese web sites, you might see a place where you can choose between English, GB and Big5. One example that comes to mind is where you are offered the choice of GB, Big5 or English right at the start.

The second place where this may come into play is email. If you were using Traditional characters along with a Traditional character IME, and you were preparing an email message to send to someone in Taiwan, you would want to choose Big5 as the encoding, and not GB2312.

In the event that you wanted to send an email message that contained *both* Traditional and Simplified characters, things are a little trickier. The main difference is that for an encoding you would select "Unicode (UTF-8)" instead of GB2312 or Big5.
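The reason is that each of the two national encodings lacks characters found only in the other character set. A quick check in Python, using its built-in codecs, shows this; the helper name encodable and the sample string (the word "hanzi" written in simplified and then in traditional form) are my own choices for the illustration.

```python
def encodable(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


mixed = "汉字 漢字"  # simplified form, then traditional form

for encoding in ("gb2312", "big5", "utf-8"):
    print(encoding, encodable(mixed, encoding))
```

Only UTF-8 can hold the whole string, since GB2312 has no code for the traditional first character and Big5 has none for the simplified one.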

For those of you that care, Unicode is intended to be an international standard that encompasses all of the different character sets that might be used throughout the world. Thus Unicode contains not only Chinese characters (both simplified and traditional) and Japanese characters, but also Egyptian hieroglyphs. It even contains more unusual things like the runic symbols found in old Germanic and Norse inscriptions.

There are cases on web sites where people wish to display Chinese characters without requiring that the people viewing the website have Chinese fonts installed. In such cases, people typically include a picture of the Chinese character in the web page. The major disadvantage of this is that viewing web pages with many pictures is considerably slower than viewing web pages that just have the Chinese text in GB2312 or Big5.

In the above example, I walked you through how you send a Chinese email. If you wanted to use Microsoft Word to write a Chinese document the principles are similar. I can come up with a short tutorial on that one too if people are interested.

Finally, you may run across Chinese web pages that don't display correctly even though you may have the correct fonts installed. There is a possibility that the author of the web page didn't set the correct attribute for the page, and thus your browser doesn't think that it has Chinese characters. In the browser you can also set the encoding to GB2312, in the same way you did when putting together the email message.
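What goes wrong in this situation can be shown in a few lines. The sketch below assumes that a browser with no encoding information falls back to a Western encoding (latin-1 here, which is my assumption, as is the helper name decode_page); choosing GB2312 by hand corresponds to the override argument.

```python
def decode_page(raw_bytes, declared=None, override=None):
    """Decode page bytes: a manual override wins over the declared encoding."""
    # Assumed fallback when the page declares nothing: a Western encoding.
    encoding = override or declared or "latin-1"
    return raw_bytes.decode(encoding, errors="replace")


raw = "中文".encode("gb2312")   # page bytes as the author saved them

garbled = decode_page(raw)                    # no encoding declared
fixed = decode_page(raw, override="gb2312")   # encoding chosen by hand

print(garbled)
print(fixed)
```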

For people more interested in technical details (and for details of how to use non-U.S. keyboards, or a traditional character IME) I can recommend the book "CJKV Information Processing", by Ken Lunde, published by O'Reilly, ISBN 1-56592-224-7. This book is actually a reference on Asian typography, and half the book is just tables that show the encoding of all Asian characters. In this case, CJKV stands for "Chinese, Japanese, Korean and Vietnamese".