Lesson Learned

CDATA Section Delimitosis

Delimitosis = disease pertaining to delimiter

I don't know if it is just because I am a parser-minded person, but the first time I learned about CDATA Sections a warning buzzer went off in my head and has been ringing ever since. It is saying: What if ]]> happens to be in the data you put into a CDATA Section?

Well obviously it is not allowed. Hmmm. But that is not very helpful, is it? Does that mean I am supposed to check whether my text contains ]]> every time I want to use a CDATA Section? And what should I do if it does?

I want to settle some of the unsettling issues about CDATA Sections here.
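
One common workaround, shown here as a minimal Python sketch rather than as the definitive answer, is to split every ]]> across two adjacent CDATA Sections so that the delimiter never appears inside a single section. The names to_cdata and roundtrip are made up for this illustration.

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each ']]>' across two sections.

    Hypothetical helper: the ']]' closes one section and the '>'
    opens the next, so no single section contains the delimiter.
    """
    safe = text.replace(CDATA_END, "]]" + CDATA_END + CDATA_START + ">")
    return CDATA_START + safe + CDATA_END


def roundtrip(text: str) -> str:
    """Parse the wrapped text back out of a tiny document."""
    doc = "<r>" + to_cdata(text) + "</r>"
    return ET.fromstring(doc).text


print(to_cdata("a ]]> b"))   # <![CDATA[a ]]]]><![CDATA[> b]]>
print(roundtrip("a ]]> b"))  # a ]]> b
```

The parser joins the adjacent sections back into one text node, so the reader of the document never sees the split.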

A Productivity Lesson from The Shining

Joel on Software readers will be well aware of the private office idea. Joel believes developers should be given private, quiet working conditions, and I'm sure that anyone who's worked in a shared office will have no difficulty understanding why.

The basic concept is that knowledge workers produce their best work "in the zone", and that getting there is difficult and takes time. Each interruption, no matter how minor, will pull you out of the zone, and it will take you time to get back there afterwards.

Moving Forward

In the Micro ISV Mistake articles, I outlined the main areas where I went wrong when starting my business and developing my products. I wasn’t just wallowing; by writing about and exploring the problems, I was trying to identify and plan a way forward.

Obviously, I haven’t managed a complete turnaround in just the last couple of months. I realised when writing the original articles that there were some things I was going to have to learn to live with, and that I would need to take a long-term view on others. For example, there’s no way to get back time and momentum squandered during the early stages, and a product with too general a market can’t be tailored to a niche overnight.

But, I think enough time has passed, and I’ve made enough changes, to start writing those updates I promised.

Fonts To Simulate Charsets

I want to know as little about fonts as I can get away with, but I recently saw a modified font being used to obtain DOS-style box drawing characters like ┬──┘. The text was in an old PC Code Page, probably IBM437, also called "OEM United States." So rather than being converted to Unicode, it was left in the IBM437 single-byte encoding and viewed with a font that replaced certain characters with box drawing characters.

Let's take the ┘ character and follow it through this process. The byte value d9 (217) represents this character in IBM437. When the browser or the OS treats this byte as if it were in Windows-1252, it converts it to Unicode U+00d9 LATIN CAPITAL LETTER U WITH GRAVE, which would normally be displayed as Ù. But in our special hacked font, the character mapped to $00d9 actually looks like the box drawing character ┘.
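
To see the two interpretations side by side, here is a small Python sketch that decodes the same bytes both ways. It assumes Python's built-in cp437 and cp1252 codecs as stand-ins for IBM437 and Windows-1252; the variable names are only for illustration.

```python
# The four bytes behind the ┬──┘ example, as stored in IBM437
raw = bytes([0xC2, 0xC4, 0xC4, 0xD9])

# Decoded as IBM437: the box drawing characters the author intended
as_cp437 = raw.decode("cp437")

# Decoded as Windows-1252: what the browser or OS actually produces
as_cp1252 = raw.decode("cp1252")

print(as_cp437)   # ┬──┘
print(as_cp1252)  # ÂÄÄÙ

# The last byte, d9, lands on two different Unicode code points
print(f"U+{ord(as_cp437[-1]):04x}")   # U+2518
print(f"U+{ord(as_cp1252[-1]):04x}")  # U+00d9
```

The hacked font works on the second result: the text really is ÂÄÄÙ in Unicode, and only the glyphs make it look like box drawing.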

Phantom Currency Signs in Japan and Korea

If you're not from Japan or Korea, you might be surprised that when you reboot your Windows OS with Japanese as the language for non-Unicode programs (system locale), your backslashes are no longer backslashes; they are yen signs. Well, don't worry, they are still backslashes; they are just displayed and printed differently by many of the fonts in the Japanese locale. But there is a more troubling internationalization issue: Unicode text coming out of Japanese and Korean sources may have a backslash where you would expect a yen sign ¥ or won sign ₩. This whole subject has been discussed elsewhere (references below), but here I talk about the need to repair the Unicode text.

Where the backslash issue gets interesting is in the encoding conversion between the locale code pages and Unicode. While 0x5c is clearly the yen sign in the Japanese code page 932 (Shift-JIS), it is converted to the Unicode U+005c REVERSE SOLIDUS (backslash) rather than the U+00a5 YEN SIGN. Similarly in the Korean code page 949, 0x5c is clearly the won sign but it is converted to the Unicode backslash rather than the U+20a9 WON SIGN.
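
The conversion is easy to reproduce. The Python sketch below assumes the built-in cp932 and cp949 codecs (Python's names for the Japanese and Korean Windows code pages). The repair_currency function is a hypothetical heuristic invented for this example, not an established fix: it only swaps a backslash that sits directly before a digit, and it will still guess wrong on something like a path segment that starts with a number.

```python
import re

# Byte 0x5c comes out as U+005c REVERSE SOLIDUS from both code pages
from_japanese = b"\x5c" + b"1,000"
from_korean = b"\x5c" + b"5,000"

jp_text = from_japanese.decode("cp932")
kr_text = from_korean.decode("cp949")

print(jp_text)  # \1,000
print(kr_text)  # \5,000

YEN_SIGN = "\u00a5"
WON_SIGN = "\u20a9"


def repair_currency(text: str, sign: str) -> str:
    """Hypothetical repair: backslash followed by a digit becomes `sign`.

    A rough heuristic for illustration only; real text needs more
    context to tell a currency amount from a path or an escape.
    """
    return re.sub(r"\\(?=\d)", lambda match: sign, text)


print(repair_currency(jp_text, YEN_SIGN))        # ¥1,000
print(repair_currency(kr_text, WON_SIGN))        # ₩5,000
print(repair_currency("C:\\temp", YEN_SIGN))     # C:\temp
```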

Importing Legacy Data

I came across a great post recently about Legacy Data: Import Early, Import Often and it really struck a chord.

The author is completely correct that importing old data is normally considered a last step in the process of implementing a new project. Most developers love to start with a fresh clean codebase, whiteboard, database, etc. and build their projects from the ground up. It is a wonderful feeling starting with a blank slate and actually making something from nothing. Generally, it is much less satisfying to take a (mostly) functional codebase, learn about it, dig through its oddities, and expand or fix the features. I've talked about this tendency before in Scrapping It All vs A Salvage Operation, but I thought it needed some expansion.

Simple AJAX or How I learned to start simple

I've been using the iframe data loading trick for a few years now. I first saw a use of XMLHttpRequest about a year ago for a simple search engine which worked like Google Suggest. I thought that was cool, but was unsure if this was a feature that would remain in the browser or just a hack someone figured out which might be gone in the next release. When this methodology was finally named "AJAX" this year and everybody got all excited about it, I thought I should try some of this stuff. I followed this tutorial over at O'Reilly and saw this example over at Apple. I downloaded a few "packages" for PHP, JavaScript and AJAX. I found them somewhat complex. Then I thought I should make my own, at least at first. I immediately started refactoring in my head how to make this a neat, packaged JavaScript class. I got lost because I started out too complex. I cut and pasted a bunch of code and added my own. I tried running it, got strange errors and kinda gave up (at that point).

The Euro Sign Predicament

To this day, you will rarely see the euro sign in newsprint or on news sites; it will be written out or abbreviated as EUR. This is one of those interesting cases where computer text encoding systems have influenced the way information is presented in the media.

The ASCII factor

One reason the symbol is not used is that it is not ASCII, and whenever you depart from ASCII you risk ending up with corrupted characters. Similarly, the pound and yen appear written out rather than using the £ and ¥ symbols, although those have different reasons too, and longer histories.
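
A quick Python sketch of what that corruption looks like in practice. It assumes the common failure where UTF-8 bytes get read back as Windows-1252; is_ascii_safe is a hypothetical helper named for this example.

```python
EURO = "\u20ac"


def is_ascii_safe(text: str) -> bool:
    """Return True if text survives a trip through plain ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(is_ascii_safe("EUR 5"))     # True
print(is_ascii_safe(EURO + "5"))  # False
print(is_ascii_safe("\u00a35"))   # False (pound sign)

# The euro sign is three bytes in UTF-8
utf8_bytes = EURO.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x82\xac'

# Read those bytes back as Windows-1252 and you get the classic mess
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # â‚¬
```

Writing EUR sidesteps the whole problem, because every system along the way agrees on what those three ASCII letters are.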

How I saved my company one penny

Some days as a developer you really know that your work affects the bottom line, and other days you are putting lipstick on a pig. This is my story of how I once saved my company a whole penny through a day of in-the-trenches code sleuthing. Money machine or hog hairdresser? You decide.

How to work for free and keep your sanity

If you make websites, surely you've had some friend or relative say "Hey, can you make a site for my non-profit group?" … you think, I'm a nice person and this cause is just. And here's some extra practice and something to put on my portfolio, SURE! Not a problem. I have recently had a frustrating experience with something like this and I've learned a few things…