How-To

Fonts To Simulate Charsets

I want to know as little about fonts as I can get away with, but I recently saw a modified font being used to obtain DOS-style box drawing characters like ┬──┘. The text was in an old PC code page, probably IBM437, also called "OEM United States." Rather than being converted to Unicode, it was left in the IBM437 single-byte encoding and viewed with a font that replaced certain characters with box drawing characters.

Let's take the character ┘ and follow it through this process. The byte value d9 (217) represents this character in IBM437. When the browser or the OS treats this byte as if it were in Windows-1252, it converts it to Unicode U+00d9 LATIN CAPITAL LETTER U WITH GRAVE, which would normally be displayed as Ù. But in our special hacked font, the glyph mapped to $00d9 actually looks like the box drawing character ┘.
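The two readings of that one byte can be checked from Ruby. This is just a sketch, assuming Ruby 2.0 or later with its standard IBM437 and Windows-1252 transcoders; the variable names are mine and not part of any font tooling.

```ruby
# One raw byte, 0xD9, with no text encoding attached yet.
raw = "\xD9".b

# Read as IBM437, the code page the text was really written in.
as_cp437 = raw.dup.force_encoding("IBM437").encode("UTF-8")

# Read as Windows-1252, which is what the browser or OS assumes.
as_cp1252 = raw.dup.force_encoding("Windows-1252").encode("UTF-8")

puts format("IBM437:       U+%04X %s", as_cp437.ord, as_cp437)
puts format("Windows-1252: U+%04X %s", as_cp1252.ord, as_cp1252)
```

The first line should show the box drawing corner and the second the accented U. The hacked font simply draws the first shape at the second code point.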

Poking around in Ruby

Ruby for Windows (I'm not sure if other operating systems have something similar, but I would think so) has a program called "fxri - Interactive Ruby Console and Help." This little application consists of a frameset with one panel as a documentation browser, one panel to display the help for the currently selected item, and an interactive console. This can be a great aid in learning Ruby and for testing out functions or code. Here I'll talk about a few things I've learned with this.

You get a prompt, as you would with an operating system shell, that looks something like this:
irb(main):001:0>
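At that prompt you type one Ruby expression per line, and the console echoes the value of each one. The lines below are only an illustration of the kind of thing you might try; the variable names are made up.

```ruby
# Each line below is something you might type at the irb prompt.
greeting = "hello"
shouted  = greeting.upcase                    # the console would echo "HELLO"
total    = [1, 2, 3].inject { |a, b| a + b }  # the console would echo 6

puts "#{shouted} #{total}"
```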

Phantom Currency Signs in Japan and Korea

If you're not from Japan or Korea, you might be surprised that when you reboot Windows with Japanese as the language for non-Unicode programs (the system locale), your backslashes are no longer backslashes; they are yen signs. Well, don't worry, they are still backslashes; they are just displayed and printed differently by many of the fonts in the Japanese locale. But there is a more troubling internationalization issue: Unicode text coming out of Japanese and Korean sources may have a backslash where you would expect a yen sign ¥ or won sign ₩. This whole subject has been discussed elsewhere (references below), but here I talk about the need to repair the Unicode text.

Where the backslash issue gets interesting is in the encoding conversion between the locale code pages and Unicode. While 0x5c is clearly displayed as the yen sign in the Japanese code page 932 (Shift-JIS), it is converted to the Unicode U+005c REVERSE SOLIDUS (backslash) rather than the U+00a5 YEN SIGN. Similarly, in the Korean code page 949, 0x5c is clearly displayed as the won sign, but it is converted to the Unicode backslash rather than the U+20a9 WON SIGN.
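Here is a small Ruby sketch of that conversion, followed by one possible repair. It assumes Ruby 2.0 or later, where code page 932 goes by the name Windows-31J. The `repair_currency` helper and its "backslash right before a digit" rule are my own hypothetical heuristic, not an established fix.

```ruby
# The single byte 0x5C, as it would appear in a legacy Japanese or Korean file.
raw = "\x5C".b

# Windows-31J is Ruby's name for code page 932; CP949 is the Korean code page.
from_japanese = raw.dup.force_encoding("Windows-31J").encode("UTF-8")
from_korean   = raw.dup.force_encoding("CP949").encode("UTF-8")

puts format("cp932 0x5C -> U+%04X", from_japanese.ord)
puts format("cp949 0x5C -> U+%04X", from_korean.ord)

# Hypothetical repair: treat a backslash directly before a digit as a currency sign.
def repair_currency(text, sign)
  text.gsub(/\\(?=\d)/, sign)
end

puts repair_currency("price: \\1,200", "\u00A5")
```

A rule this simple would also rewrite a Windows path such as C:\2024, so a real repair needs to know which fields hold prices.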

Simple AJAX or How I learned to start simple

I've been using the iframe data loading trick for a few years now. I first saw XMLHttpRequest used about a year ago in a simple search engine that worked like Google Suggest. I thought that was cool, but I was unsure whether this was a feature that would remain in the browser or just a hack someone had figured out that might be gone in the next release. When this methodology was finally named "AJAX" this year and everybody got all excited about it, I thought I should try some of this stuff. I followed this tutorial over at O'Reilly and saw this example over at Apple. I downloaded a few "packages" for PHP, JavaScript and AJAX and found them somewhat complex, so I thought I should make my own, at least at first. I immediately started refactoring in my head how to make this a neat, packaged JavaScript class. I got lost because I started out too complex. I cut and pasted a bunch of code and added my own. I tried running it, got strange errors and kinda gave up (at that point).

Excel Compatible HTML

For the last couple of weeks I’ve been talking about adding an Excel export feature to one of my C#-based products. If you missed the previous posts, you might want to read the introduction and last week’s What I did, an account of the options I chose combined with an honest assessment of the decision process.

The bulk of the export feature was eventually provided by a third party component, xPort Tools, but I also used Excel compatible HTML files for one of the exports. Today, I’m going to finish off the series by explaining how to use HTML to output formatted data to Microsoft Excel.
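The basic idea can be sketched in a few lines: build an ordinary HTML table and hand it to Excel. The sketch below is in Ruby rather than the product's C#, and the sample rows, the `escape_html` helper and the `html_table` function are all hypothetical names of mine.

```ruby
# Hypothetical sample data; the first row is the header.
rows = [
  ["Item", "Qty", "Price"],
  ["Widget <small>", 3, 4.5],
]

# Minimal escaping so cell text cannot break the markup.
def escape_html(value)
  value.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Build a plain HTML table: one <th> row, then one <td> row per record.
def html_table(rows)
  header, *body = rows
  out = +"<html><body><table border=\"1\">\n"
  out << "<tr>" << header.map { |h| "<th>#{escape_html(h)}</th>" }.join << "</tr>\n"
  body.each do |row|
    out << "<tr>" << row.map { |c| "<td>#{escape_html(c)}</td>" }.join << "</tr>\n"
  end
  out << "</table></body></html>\n"
end

html = html_table(rows)
puts html
```

Saving that output under an .xls name, or serving it with an Excel content type, is the usual way to get Excel to open it. Recent Excel versions may warn that the file format and extension do not match.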

MySQL Encoding and Mojibake

As a follow-up to my last post about Mojibake character encoding corruption I want to distinguish "intermediate encoding corruption."

In a post on the JoelOnSoftware Discussion forum someone asked why about 50% of the characters in his UTF-8 strings inserted into a MySQL database were getting corrupted (and 50% weren't). This is very typical intermediate encoding corruption where some characters are corrupted while others survive.
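One way this partial damage can come about is sketched below in Ruby. I'm assuming an intermediate step that squeezes the text through ISO-8859-1 and substitutes "?" for anything it cannot hold. That is an illustration of the pattern, not a diagnosis of the forum poster's actual setup.

```ruby
# Hypothetical input: Latin letters, one accented letter, two Japanese characters.
original = "caf\u00E9 \u6587\u5B57"

# Force the text through ISO-8859-1, as a misconfigured connection or column might.
# Characters the intermediate encoding cannot represent are replaced with "?".
squeezed = original.encode("ISO-8859-1", undef: :replace, replace: "?")

# Convert back to UTF-8 for display; the damage is already done.
round_tripped = squeezed.encode("UTF-8")

puts original
puts round_tripped
```

The accented letter survives because ISO-8859-1 has a slot for it, while the Japanese characters come back as question marks.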

The Power of the Lambda

In one of the (non-Ruby) applications I maintain, there is a function that is responsible for handling unit conversions. It looks something like this:

double UnitConvert(double value, string from_unit, string to_unit)

So that I can do this:

double value = UnitConvert(5.0, "feet", "inches");

The underlying part of this code has to figure out exactly how to convert between the two units. In a nutshell, there's a big hash of known unit conversions that gets loaded when the program starts up, and it can interpolate, trace paths, and figure out how to fill in any gaps that may exist. In all actuality, it's a pretty smart piece of code.
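To give a feel for the idea, here is a rough Ruby sketch of a conversion table built from lambdas, with a breadth-first search to chain known conversions together. The `CONVERSIONS` table, the `unit_convert` name and the sample units are my own stand-ins; the real application's code is not shown here and surely differs.

```ruby
# Hypothetical table: each known conversion is a lambda keyed by [from, to].
CONVERSIONS = {
  ["feet", "inches"] => lambda { |v| v * 12.0 },
  ["inches", "feet"] => lambda { |v| v / 12.0 },
  ["yards", "feet"]  => lambda { |v| v * 3.0 },
  ["feet", "yards"]  => lambda { |v| v / 3.0 },
}

# Breadth-first search over the table, applying each lambda along the way,
# so a missing direct conversion can be filled in by chaining known ones.
def unit_convert(value, from_unit, to_unit)
  return value if from_unit == to_unit

  queue = [[from_unit, value]]
  seen  = { from_unit => true }
  until queue.empty?
    unit, current = queue.shift
    CONVERSIONS.each do |(src, dst), fn|
      next unless src == unit && !seen[dst]
      converted = fn.call(current)
      return converted if dst == to_unit
      seen[dst] = true
      queue << [dst, converted]
    end
  end
  raise ArgumentError, "no conversion path from #{from_unit} to #{to_unit}"
end

puts unit_convert(5.0, "feet", "inches")
puts unit_convert(2.0, "yards", "inches")
```

There is no yards-to-inches entry in the table, so the second call goes yards to feet to inches.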

Oh No! Mojibake!

Say "moajee bockay!" (MOH-JEE-BAH-KAY) when you've got a string of characters displaying incorrectly all scrambled and corrupted. It is a great exclamation word like Eureka! and Geronimo! (but likely to never gain broad usage outside of the programmer community of course!). The Wikipedia entry for Mojibake 文字化け gives a good definition:

Mojibake is a Japanese loanword which refers to the incorrect, unreadable characters shown when a piece of computer software fails to render a text correctly according to its character encoding.
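A classic case is easy to reproduce in Ruby: take UTF-8 bytes and read them as if they were ISO-8859-1. This is a minimal sketch with a sample word of my own choosing.

```ruby
# A correctly encoded UTF-8 string: "cafe" with an acute accent on the e.
original = "caf\u00E9"

# Misread the same bytes as ISO-8859-1, then convert that misreading to UTF-8.
garbled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Undo it: convert back to ISO-8859-1 bytes and relabel them as UTF-8.
repaired = garbled.encode("ISO-8859-1").force_encoding("UTF-8")

puts original
puts garbled
puts repaired
```

The two-byte UTF-8 sequence for the accented letter shows up as two separate Latin-1 characters. As long as nothing else has touched the text, the mistake is reversible.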

got something done

A meeting in a stuffy conference room, over lunch, no less. Issues were raised, some fingers were pointed, solutions were discussed. Nothing too unusual really, until the end, when the host cleared his throat and inquired: Alright, so what's the next action?

This, right then and there made my day.

Attribute Accessors - Ruby VS. PHP

Typically in objects you don't set or get the attributes directly; you use a method. This allows you to do type checking when setting the data or to format it a certain way when getting the data. In PHP I would use something like this:

    function setAttack( $atk ) { $this->attack = $atk; }
    function getAttack() { return $this->attack; }

And used like this:

$testcard->setAttack(4);
echo $testcard->getAttack();
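For comparison, here is a sketch of how the same pair might look in Ruby. `Card` is a hypothetical class name, and the integer check is only there to show where validation would go.

```ruby
class Card
  # Generates the getter. attr_accessor :attack would generate the setter too,
  # but the setter is written by hand here so it can validate its argument.
  attr_reader :attack

  def attack=(atk)
    raise ArgumentError, "attack must be an Integer" unless atk.is_a?(Integer)
    @attack = atk
  end
end

testcard = Card.new
testcard.attack = 4
puts testcard.attack
```

From the caller's side it still reads like plain attribute access, even though a method runs each time.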