That Ol' OEM Code Page

If you have a regular U.S. or Western European (Windows-1252) system locale code page, try this:

  1. copy and paste these 4 characters ÂÄÒÙ into notepad
  2. save it as oem.txt
  3. open a DOS window and cd to the directory where you saved it
  4. enter: type oem.txt

Do you see ┬─╥┘ instead of ÂÄÒÙ? Why? Because your system locale "ANSI" code page is different from your DOS "OEM" code page.

So what is going on? It is actually quite simple. When you copied and pasted those characters from your browser to notepad, they were encoded in the Windows clipboard Unicode text format, UTF-16, i.e. four 16-bit values: 00c2 00c4 00d2 00d9. Internally, notepad kept the text in Unicode, but when it saved the file, it saved it in your default system locale ("ANSI") code page, Windows-1252, i.e. four 8-bit values (bytes): c2 c4 d2 d9. The byte values are the same as the Unicode values here only because many Windows-1252 codes coincide with the Unicode code points. That is not always true (n.b. the euro is 80 in Windows-1252 but U+20AC in Unicode; U+0080 is not the euro!).
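The two encodings described above can be reproduced with Python's standard codecs. This is only a sketch for illustration: it assumes Python's "cp1252" codec stands in for the Windows "ANSI" code page and "utf-16-be" for the 16-bit values, and the variable names are made up for this example.

```python
# The four characters from the example (written as escapes so the
# snippet does not depend on how this file itself is encoded).
text = "\u00c2\u00c4\u00d2\u00d9"

# UTF-16 code units, big-endian so they read like the values in the text.
utf16_units = text.encode("utf-16-be").hex(" ", 2)

# What notepad would write under a Windows-1252 "ANSI" code page.
ansi_bytes = text.encode("cp1252")
ansi_hex = ansi_bytes.hex(" ")

# The euro sign shows that the byte value and code point can differ.
euro = "\u20ac"
euro_byte = euro.encode("cp1252").hex()
euro_code_point = f"{ord(euro):04x}"

print("UTF-16 units:       ", utf16_units)
print("Windows-1252 bytes: ", ansi_hex)
print("euro byte:          ", euro_byte)
print("euro code point:    ", euro_code_point)
```

For the four letters the byte values match the low byte of each code unit, while the euro comes out as a single byte that has nothing to do with its code point.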

Now you've got a file called oem.txt containing these four bytes: c2 c4 d2 d9, and any program can interpret them however it likes. The DOS type command puts the bytes into the screen memory unchanged. The DOS console then displays them according to the default system OEM code page, which on a U.S. system is OEM 437 (also known as "IBM437"):

  • C2 = U+252C : BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
  • C4 = U+2500 : BOX DRAWINGS LIGHT HORIZONTAL
  • D2 = U+2565 : BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
  • D9 = U+2518 : BOX DRAWINGS LIGHT UP AND LEFT

For comparison, here is Windows-1252 for the same character codes:

  • C2 = U+00C2 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
  • C4 = U+00C4 : LATIN CAPITAL LETTER A WITH DIAERESIS
  • D2 = U+00D2 : LATIN CAPITAL LETTER O WITH GRAVE
  • D9 = U+00D9 : LATIN CAPITAL LETTER U WITH GRAVE
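Both tables can be regenerated by decoding the same four bytes with two different codecs. The sketch below assumes Python's "cp437" and "cp1252" codecs match the OEM and ANSI code pages discussed here; describe() is a hypothetical helper written for this example, and it only makes sense for single-byte code pages.

```python
import unicodedata

# The four bytes stored in oem.txt.
raw = bytes.fromhex("c2 c4 d2 d9")

def describe(data, codec):
    """Return (byte, code point, Unicode name) for each byte in data.

    Assumes a single-byte code page, so bytes and characters pair up 1:1.
    """
    chars = data.decode(codec)
    return [
        (f"{b:02X}", f"U+{ord(ch):04X}", unicodedata.name(ch))
        for b, ch in zip(data, chars)
    ]

oem_rows = describe(raw, "cp437")    # how the console interprets the bytes
ansi_rows = describe(raw, "cp1252")  # how notepad interprets the bytes

for label, rows in (("OEM 437", oem_rows), ("Windows-1252", ansi_rows)):
    print(label)
    for byte, code_point, name in rows:
        print(f"  {byte} = {code_point} : {name}")
```

The bytes never change; only the table used to look them up does.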

Back in the DOS days we used the box drawing characters to draw borders around menus and message boxes, even to draw "graphical" things in text-based games. So for DOS software to keep working, that same character set needs to be in place. But in Windows, support for Western European languages was much more important because lines and boxes could be drawn with pixel graphics. Windows-1252 (and the Latin-1 it is based on) supports about 20 languages from Swedish to Afrikaans.

But the box drawing characters were so important in DOS that most of them were kept across all the OEM code pages. However, in the Multilingual code page 850, d2 was used for Ê instead of ╥, which was probably considered less essential because it joined single and double lines. Here is OEM 850 showing the same characters as OEM 437 except d2:

  • C2 = U+252C : BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
  • C4 = U+2500 : BOX DRAWINGS LIGHT HORIZONTAL
  • D2 = U+00CA : LATIN CAPITAL LETTER E WITH CIRCUMFLEX
  • D9 = U+2518 : BOX DRAWINGS LIGHT UP AND LEFT

So if your OEM code page was 850 instead of 437, our oem.txt example would type out to the screen as ┬─Ê┘.
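To check which of the four bytes differ between the two OEM code pages, the same bytes can be decoded both ways and compared position by position. This is a small sketch that assumes Python's "cp437" and "cp850" codecs correspond to the OEM code pages named above; the variable names are illustrative only.

```python
# The four bytes stored in oem.txt.
raw = bytes.fromhex("c2c4d2d9")

as_437 = raw.decode("cp437")
as_850 = raw.decode("cp850")

# Collect the byte values whose meaning changes between the two code pages.
differing = [
    f"{b:02x}"
    for b, ch_437, ch_850 in zip(raw, as_437, as_850)
    if ch_437 != ch_850
]

# Print code points rather than glyphs so the output does not depend
# on the code page of the console running this script.
print("OEM 437:", " ".join(f"U+{ord(c):04X}" for c in as_437))
print("OEM 850:", " ".join(f"U+{ord(c):04X}" for c in as_850))
print("bytes that differ:", differing)
```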

The command line program chcp will display the OEM code page. You can also specify a code page number and change the OEM code page, but read the remarks in the documentation about that. I found that I could only see changes between single byte code pages if the console was set to Full Screen. The Japanese double byte code page 932 (only available when you have installed support for East Asian languages) did not require Full Screen but chcp clears the console and changes the font when switching.