The Euro Sign Predicament

To this day, you will rarely see the euro sign (€) in newsprint or on news sites; the currency is written out or abbreviated as EUR instead. This is one of those interesting cases where computer text encoding systems have influenced the way information is presented in the media.

The ASCII factor

One reason the symbol is not used is that it is not ASCII, and whenever you depart from ASCII you risk ending up with corrupted characters. The pound and yen are similarly written out rather than shown as £ and ¥, although those cases have other reasons as well, and longer histories.
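A newsroom tool could enforce an ASCII-only convention mechanically. The sketch below is hypothetical: the `ascii_safe` helper and its symbol-to-code table (including treating ¥ as JPY, although the same sign is also used for the Chinese yuan) are assumptions for illustration, not any agency's actual pipeline.

```python
# Hypothetical table mapping currency symbols to ISO 4217 codes.
# Assumption: ¥ is treated as JPY, although the sign is also used for CNY.
CURRENCY_CODES = {"€": "EUR ", "£": "GBP ", "¥": "JPY "}


def ascii_safe(text: str) -> str:
    """Spell out known currency symbols, then refuse anything still non-ASCII."""
    for symbol, code in CURRENCY_CODES.items():
        text = text.replace(symbol, code)
    if not text.isascii():  # str.isascii() needs Python 3.7+
        raise ValueError(f"non-ASCII characters remain: {text!r}")
    return text


print(ascii_safe("€100 or £85"))  # EUR 100 or GBP 85
```

Failing loudly on leftover non-ASCII characters, rather than silently passing them along, is the point of the final check.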

When the Associated Press releases an article, it is instantly shared across the globe in numerous ways. The top tier of AP customers receive media (text, photos, videos) via equipment and/or software provided by AP. Up to that point the text encoding is still under the AP's quality control, assuming the AP has good control over all of its sources.

However, as soon as the text passes into external systems (documents, messages, databases), subtle corruption problems can start happening, and they are much more likely if the text is not ASCII. This applies to Reuters and all the major news agencies operating primarily in English.

Well, of course it shouldn't be that way, but the reality on the ground is that encoding corruption is very, very common. Nowadays, text gets passed through many hands on its way to being published in various media, and a single software bug or configuration mismatch along the way is likely to screw up the non-ASCII characters.
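The classic symptom is easy to reproduce. This Python sketch (the language is our choice; nothing here comes from the AP's actual systems) encodes the euro sign as UTF-8 and then decodes the bytes the way a misconfigured downstream system might.

```python
# The euro sign as three UTF-8 bytes: E2 82 AC.
euro_bytes = "€".encode("utf-8")

# A system along the way wrongly assumes Windows-1252...
as_cp1252 = euro_bytes.decode("cp1252")
# ...or Latin-1, where 0x82 becomes an invisible C1 control character.
as_latin1 = euro_bytes.decode("latin_1")

print(euro_bytes.hex())  # e282ac
print(as_cp1252)         # â‚¬
print(repr(as_latin1))   # 'â\x82¬'
```

Neither decode raises an error, which is exactly why this kind of corruption slips through unnoticed.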

The Latin-1 factor

There is another reason the euro is particularly prone to corruption: it doesn't exist in Latin-1 (ISO 8859-1), which happens to be the most common single-byte character set other than ASCII and Windows-1252 (not to mention the basis for the first 256 code points of Unicode, U+0000 to U+00FF). Oddly enough, Windows-1252 (which was originally based on Latin-1) does have the euro symbol!

You'll recall that the European Union was switching to the euro currency in the late 1990s. A new encoding, ISO 8859-15 (aka "Latin-9" and, informally, "Latin-0"), was created to support the euro along with some other improvements. The Windows-125x code pages added the euro in the late 1990s (I cannot find a definitive source on exactly when). Why could Windows-1252 add it when Latin-1 couldn't? Because Windows-1252 is proprietary (Microsoft controls it), and because it had convenient unused values.
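Python's standard codecs show where each of these character sets puts the euro sign, if anywhere. The helper name `euro_byte` is ours, for illustration.

```python
def euro_byte(codec: str):
    """Return the encoded euro sign in a codec, or None if the codec lacks it."""
    try:
        return "€".encode(codec)
    except UnicodeEncodeError:
        return None


for codec in ("cp1252", "iso8859_15", "latin_1"):
    print(codec, euro_byte(codec))
# cp1252 b'\x80'
# iso8859_15 b'\xa4'
# latin_1 None

# Latin-9 put the euro where Latin-1 has the generic currency sign,
# so Latin-9 text misread as Latin-1 shows ¤ instead of €.
print(b"\xa4".decode("latin_1"))  # ¤
```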

The ISO 8859 "Latin" character sets cordon off the byte values 0x80 to 0x9F for (C1) control codes. As far as I can tell this was a useless thing to do, but it probably matters now for backward compatibility, except that some people call Windows-1252 a "superset" of Latin-1, and MySQL has gone so far as to use "latin1" to mean Windows-1252 |-{. Windows-1252, however, had nothing against using the values from 0x80 to 0x9F, and happily assigned the unused value 0x80 to the euro symbol.
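Walking the 0x80 to 0x9F range in both codecs makes the difference concrete. This is a sketch using Python's built-in codecs; other libraries may treat the five bytes that Windows-1252 leaves unassigned differently.

```python
import unicodedata

latin1_controls = 0
cp1252_printable = []
for value in range(0x80, 0xA0):  # the C1 range, 0x80 through 0x9F
    raw = bytes([value])
    # In Latin-1 every byte here maps to a C1 control character (category "Cc").
    if unicodedata.category(raw.decode("latin_1")) == "Cc":
        latin1_controls += 1
    # Windows-1252 assigns printable characters to most of the same bytes.
    try:
        cp1252_printable.append(raw.decode("cp1252"))
    except UnicodeDecodeError:
        pass  # Python's codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined

print(latin1_controls)        # 32
print(len(cp1252_printable))  # 27
print(cp1252_printable[0])    # € (byte 0x80)
```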

Anyway, Latin-1 and Windows-1252 agree on almost all of the accented letters, such as å, ë, and ñ, used in Western European languages. Because of this partial compatibility, a lot of text can be mistakenly handled as either encoding without any corruption. But since the euro symbol is not in Latin-1, it is more problematic than those other characters.
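A quick round-trip test shows why accented text hides the mismatch while the euro exposes it. The `survives_misread` helper is hypothetical: it writes text in one encoding and reads it back in another.

```python
def survives_misread(text: str, written_as: str = "cp1252",
                     read_as: str = "latin_1") -> bool:
    """True if text written in one encoding reads back unchanged in another."""
    try:
        return text.encode(written_as).decode(read_as) == text
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


print(survives_misread("año, Käse, så"))              # True: same bytes in both
print(survives_misread("€5"))                         # False: 0x80 is a control in Latin-1
print(survives_misread("€5", written_as="iso8859_15"))  # False: reads back as ¤5
```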

The references factor

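Now you might think that escaping the euro symbol with a numeric character reference (&#8364; or &#x20AC;) or the HTML entity reference &euro; would solve this problem wherever the markup standard is supported (XML and HTML), since the escaped form is pure ASCII. Note that numeric references work in both XML and HTML, while &euro; is predefined only in HTML.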
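Python's standard library can produce and consume these references. This sketch uses only `str.encode` with the `xmlcharrefreplace` error handler, `html.unescape`, and `xml.etree.ElementTree`; it illustrates the markup rules rather than any particular publishing system.

```python
import html
import xml.etree.ElementTree as ET

# Replace anything ASCII can't hold with a decimal numeric character reference.
print("Price: €5".encode("ascii", "xmlcharrefreplace"))  # b'Price: &#8364;5'

# An HTML-aware decoder accepts all three spellings.
for ref in ("&#8364;", "&#x20AC;", "&euro;"):
    print(ref, "->", html.unescape(ref))

# An XML parser accepts the numeric reference...
print(ET.fromstring("<p>&#8364;</p>").text)  # €
# ...but rejects &euro;, which XML does not predefine.
try:
    ET.fromstring("<p>&euro;</p>")
except ET.ParseError as err:
    print("XML parse error:", err)
```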

But the reality is that even within homogeneous systems like an ASP.NET server, programs mess these up all the time. For example, earlier this year Larry Osterman's WebLog had a rash of problems with double-escaped ampersands in its blogging system (based on Community Server, I think).

This is even more to be expected across loosely integrated systems, because escaping and unescaping special characters is prone to bugs. Compilers and other development platforms do not know whether a string of text is escaped or not; it depends entirely on programmers not making a mistake, and you know what that means.
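Here is what a double-escaping bug looks like in practice. The scenario (an upstream system that already escaped the text, followed by a layer that escapes it again) is an assumption for illustration, built on Python's `html.escape` and `html.unescape`.

```python
import html

# An upstream system has already escaped the euro sign once.
stored = "Price: &#8364;5"

# A second layer that doesn't know the text is already escaped escapes it again.
double_escaped = html.escape(stored)
print(double_escaped)                 # Price: &amp;#8364;5

# A browser unescapes only once, so the reader sees the raw reference.
print(html.unescape(double_escaped))  # Price: &#8364;5

# Only a single, correct round of unescaping yields the intended text.
print(html.unescape(stored))          # Price: €5
```

Nothing in the string itself records how many times it has been escaped, which is the core of the problem described above.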

The Unicode factor

The euro sign was only added in Unicode 2.1 (in 1998, though the symbol itself predates that). It is U+20AC EURO SIGN, the sign for the single currency of the member countries of the European Monetary Union (EMU), and it is not to be confused with the pre-existing U+20A0 EURO-CURRENCY SIGN, which is of only minor historical interest.
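Python's `unicodedata` module can confirm the official names of the two look-alike code points.

```python
import unicodedata

for ch in ("\u20ac", "\u20a0"):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
# U+20AC € EURO SIGN
# U+20A0 ₠ EURO-CURRENCY SIGN

# Looking a character up by name avoids typing the wrong one.
print(unicodedata.lookup("EURO SIGN") == "\u20ac")  # True
```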

As Unicode gains broader usage, its most common encodings, UTF-8 and UTF-16, can coexist more easily than non-Unicode encodings, where partial compatibility lets bugs go unnoticed at first: conversion between UTF-8 and UTF-16 is lossless, and mixing them up tends to garble text visibly. Still, the most reliable range of Unicode will be the characters representable in UCS-2 (the Basic Multilingual Plane, which UTF-8 encodes in at most 3 bytes), used without some of Unicode's more advanced features.
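The BMP limit is easy to check in code: UCS-2 covers code points up to U+FFFF, and every such code point takes at most three bytes in UTF-8. The `in_bmp` helper is hypothetical, and the G clef character is just a convenient example from outside the BMP.

```python
def in_bmp(text: str) -> bool:
    """True if every character is at most U+FFFF, i.e. representable in UCS-2."""
    return all(ord(ch) <= 0xFFFF for ch in text)


for ch in ("€", "\U0001D11E"):  # EURO SIGN, MUSICAL SYMBOL G CLEF
    print(f"U+{ord(ch):04X}", in_bmp(ch),
          len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
# U+20AC True 3 2
# U+1D11E False 4 4

# In this case, mixing up the two Unicode encodings fails loudly:
try:
    "€".encode("utf-16-le").decode("utf-8")
except UnicodeDecodeError:
    print("UTF-16 bytes are not valid UTF-8")
```

The euro sign sits comfortably inside the BMP, so it is not the characters beyond U+FFFF that hold it back.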

Until some lowest common denominator of Unicode is supported everywhere, news agencies will keep depending on ASCII as the only reliable way to get the news out globally. So it will be a long time before the euro sign starts casually slipping in here and there.