Fonts To Simulate Charsets

I want to know as little about fonts as I can get away with, but I recently saw a modified font being used to obtain DOS-style box drawing characters like ┬──┘. The text was in an old PC code page, probably IBM437, also called "OEM United States." Rather than being converted to Unicode, it was left in the IBM437 single-byte encoding and viewed with a font that replaced certain characters with box drawing characters.

Let's take the character ┘ and follow it through this process. The byte value d9 (217) represents this character in IBM437. When the browser or the OS treats this byte as if it were in Windows-1252, it converts it to Unicode U+00d9 LATIN CAPITAL LETTER U WITH GRAVE, and it would normally be displayed as Ù. But in our special hacked font, the glyph mapped to U+00d9 actually looks like the box drawing character ┘.
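To make the byte's journey concrete, here is a minimal Python sketch (my own illustration, not something from the page I was looking at). It decodes the single byte d9 once as IBM437 and once as Windows-1252, using Python's built-in cp437 and cp1252 codecs. The variable names are made up for the example.

```python
import unicodedata

raw = bytes([0xD9])  # the single byte as stored in the IBM437 text

# What the author of the text meant: decode as IBM437 (Python codec "cp437").
intended = raw.decode("cp437")

# What a Windows-1252 system sees: the same byte, a different character.
seen = raw.decode("cp1252")

for label, ch in (("IBM437", intended), ("Windows-1252", seen)):
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The hacked font never changes either decoding. It only draws the glyph for the second result so that it looks like the first.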

This font hack sounds simpler than it actually is. First of all, it depends on having the special font installed and specified for the relevant text. But less obviously, it relies on an intermediate code page, Windows-1252 or Latin-1, to ensure that a character code such as d9 translates to Unicode U+00d9 for the sake of the font mapping. So if you are using Cyrillic or Greek Windows and you do not somehow specify Windows-1252 or Latin-1 encoding, this font hack will not work (or at least you'll need a different font for every possible system locale code page). In Greek Windows-1253, d9 translates to Unicode U+03a9, not U+00d9, so the font hack would be broken.
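The locale dependence is easy to check. The sketch below, again only an illustration, decodes the same d9 byte under a few code pages via Python's standard codecs. The choice of code pages in the tuple is my own, and I am assuming Python's cp1252, cp1253 and cp1251 tables match what Windows does for this byte.

```python
raw = bytes([0xD9])

# Hypothetical selection of system code pages to compare.
code_pages = ("cp1252", "latin-1", "cp1253", "cp1251")

# Map each code page to the Unicode code point it produces for byte d9.
mapping = {cp: ord(raw.decode(cp)) for cp in code_pages}

for cp, code_point in mapping.items():
    print(f"{cp}: U+{code_point:04X} {chr(code_point)}")
```

Only the code pages that yield U+00D9 would hit the hacked glyph. The Greek and Cyrillic ones land on entirely different code points, so the font would need a separate remapping for each.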

In my case, the font hack I was looking at was in an HTML page. The charset was set to Windows-1252, and in a normal text editor the box drawing characters looked like ÂÄÄÙ. The special font face was specified in the style for the part of the page containing the box drawing characters to make it appear as ┬──┘. To do this in an edit box or other Windows control in your program, you could specify the font, but again it would only work when the Windows system locale code page is 1252.

In an HTML page there are a number of options that would not require a font hack. You might be able to simply set the charset to IBM437, if that charset is sufficient for any other characters in the page and all of your browser clients support it. You can also convert the text to UTF-8 and specify the UTF-8 charset. Or you can use numeric character references like &#x2518; for ┘. In Unicode, the Box Drawing block covers U+2500 to U+257F (the IBM437 box drawing characters all fall within U+2500 to U+256C), and the d9 character in IBM437 is known as U+2518 BOX DRAWINGS LIGHT UP AND LEFT, etc.
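Here is a rough sketch of the two conversion options in Python. It takes the four IBM437 bytes behind the ┬──┘ example, decodes them properly, and then produces both a UTF-8 byte string and an ASCII-only string of numeric character references. The variable names are hypothetical, and the xmlcharrefreplace error handler emits decimal references rather than hexadecimal ones.

```python
# The IBM437 bytes for the four box drawing characters in the example.
raw = bytes([0xC2, 0xC4, 0xC4, 0xD9])

text = raw.decode("cp437")      # the intended Unicode string
misread = raw.decode("cp1252")  # what a Windows-1252 text editor shows

# Option 1: serve the page as UTF-8.
utf8_bytes = text.encode("utf-8")

# Option 2: keep the page ASCII-safe with numeric character references.
ncr = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(text, misread)
print(utf8_bytes.hex(" "))
print(ncr)
```

Either output displays correctly with any ordinary font that has the box drawing glyphs, regardless of the viewer's system locale.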

In a non-Unicode program that is not using an HTML view, the options for displaying multiple languages from multiple character sets are more limited. You can use a UTF-8 control in a non-Unicode program and then convert your text to Unicode for viewing in the UTF-8 control. You should also look into options for changing your program to Unicode. A font hack like the one used for box drawing characters is an option, though not recommended. Someone asked about this on JoS in "Displaying non-english character-based text without unicode?":

If I want to display Greek or Russian ... is it sufficient to just set the font script to the appropriate language for that control?

First of all, be clear about whether you need to display Cyrillic characters at the same time as Greek characters. If you only need to display Greek while on the Greek computer and Cyrillic while on the Russian computer, then the Windows OS already supports your needs without any font hacking or special tricks. You generally have a separate set of resources for each language install, and select it at installation time.

For multiple languages from multiple character sets, Unicode is the way to go.