How-To

How to Determine Text File Encoding

With the explosion of international text resources brought by the Internet, the standards for determining file encodings have become more important. This is my attempt at making the text file encoding issues digestible by leaving out some of the unimportant anecdotal stuff. I'm also calling attention to blunders in the MSDN docs.

For Unicode files, the BOM ("Byte Order Mark", also called the signature or preamble) is a short sequence of 2 to 4 bytes at the beginning of the file that indicates which Unicode encoding is in use. The key to the BOM is that it is generally not included with the content of the file when the file's text is loaded into memory, but it may be used to decide how the file is loaded into memory. Here are the most important BOMs and the encodings they indicate:
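As a rough illustration of checking for a signature, here is a minimal Ruby sketch. The byte sequences are the standard Unicode signatures; the `BOMS` table and the `detect_bom` helper are names made up for this example, and a file with no BOM simply comes back as unknown.

```ruby
# Standard Unicode signatures. Longer ones come first so that
# UTF-32LE (FF FE 00 00) is not mistaken for UTF-16LE (FF FE).
BOMS = [
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\xEF\xBB\xBF".b,     "UTF-8"],
  ["\xFE\xFF".b,         "UTF-16BE"],
  ["\xFF\xFE".b,         "UTF-16LE"]
].freeze

# Hypothetical helper: returns [encoding_name, bom_length],
# or [nil, 0] when the data does not start with a known BOM.
def detect_bom(data)
  head = data.b # compare as raw bytes, ignoring any string encoding
  BOMS.each do |signature, name|
    return [name, signature.bytesize] if head.start_with?(signature)
  end
  [nil, 0]
end

name, skip = detect_bom("\xEF\xBB\xBFhello".b)
puts "#{name}, skip #{skip} bytes" # prints "UTF-8, skip 3 bytes"
```

The returned length is how many bytes to skip before decoding, which matches the point above that the BOM is not part of the loaded text. Note that the first bytes alone cannot tell UTF-32LE apart from UTF-16LE text that begins with a NUL character; this sketch assumes UTF-32LE in that case.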

Objects - Ruby vs. PHP

I was teaching my husband some programming in PHP (meanwhile, I’m learning Ruby) and wanted to find something he was interested in to use for examples. He is an avid player of the game VS, and I have resigned myself to an "if you can't beat 'em, join 'em" mentality and started to play too (Geez, I was tired of being ignored!). It’s a pretty complex game, but kind of fun once you learn all the rules.

I found it to be a pretty good problem area for teaching. The rules map nicely onto if/else statements and operators, and there are plenty of entities that can serve as objects (card, player, opponent, etc.) along with the interactions between them.

I’ll keep it simple. There are a few more attributes I could put in this class, but this suffices for illustrating Ruby and PHP.

A basic card has a name, cost, text and id string.
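On the Ruby side, a minimal sketch of such a class might look like the following. The four attributes come from the description above; the keyword-argument constructor, the `to_s` format, and the sample values are assumptions made for illustration, not actual game data.

```ruby
# A basic card: name, cost, text and an id string.
class Card
  attr_reader :id, :name, :cost, :text

  def initialize(id:, name:, cost:, text:)
    @id = id
    @name = name
    @cost = cost
    @text = text
  end

  # Human-readable summary (the format is just an example).
  def to_s
    "#{name} (cost #{cost}): #{text}"
  end
end

# Made-up sample card, only to show usage.
card = Card.new(id: "EX-001", name: "Example Hero", cost: 3, text: "Sample rules text.")
puts card # prints "Example Hero (cost 3): Sample rules text."
```

`attr_reader` generates the getters, so the attributes are readable from outside but can only be set through the constructor.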

My office needs an ice cream truck!

Ah yes, summer is coming to an end, but those ice cream trucks still come around sometime in the early afternoon. Yeah, I typically notice them on a weekend, at home, just when I am trying to read a good book. They are loud, their music annoying at best and they tend to hang around for too long, waiting for kids to bring their parents' cash in exchange for some over-priced ice cream. The music keeps going while they wait. It's really just about impossible to focus on anything till that truck has moved on.

I want one of those to stop at the office. Every day!

Alright, alright, someone is clearly losing it here. This is obviously a ridiculous idea. We cannot possibly even think about taking this suggestion seriously. The noise is disruptive and loud, and nobody will be able to get any work done. Meetings will get interrupted, and people will goof off taking their ice cream breaks. I just really don't get why anyone would even think about - Wait, stop right there.

I realize it sounds crazy.

Double-Byte Safety Primer

A lot of incorrect string processing goes unnoticed when your software only deals with single-byte character sets. Once your software gets used in a Far Eastern locale, numerous new problems can show up, though they may still be rare enough to slip past testing. Understanding the underlying multi-byte string function issues goes a long way toward getting your software ship-shape.

I'm not talking about localization. I am talking about simply having your English program work on Japanese or Chinese Windows. In other words, even though all of your menus, dialogs and message boxes are still in English, you might have bugs because of Far Eastern pathnames and other text that affects your program.
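To make that concrete, here is a small Ruby sketch of one classic pathname bug. It is my own illustration and assumes the Japanese Windows code page, which Ruby calls Windows-31J. In that code page the kanji 表 (U+8868) is encoded as the two bytes 0x95 0x5C, and the second byte has the same value as an ASCII backslash, so a byte-by-byte scan for path separators finds one that is not really there.

```ruby
# "C:\<kanji>\file.txt", where the kanji is U+8868 (0x95 0x5C in Windows-31J).
path = "C:\\\u8868\\file.txt".encode("Windows-31J")

# Naive scan: treats every 0x5C byte as a backslash, including the
# trail byte of the double-byte character.
naive_count = path.bytes.count(0x5C)

# Character-aware scan: walks whole characters, so the trail byte
# is never examined on its own.
aware_count = path.each_char.count { |ch| ch.bytes == [0x5C] }

puts "naive: #{naive_count}, character-aware: #{aware_count}"
# prints "naive: 3, character-aware: 2"
```

The same idea applies in C/C++ on Windows: step through the string with a function that understands lead bytes instead of incrementing a `char` pointer one byte at a time.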

Dynamically Generated SQL Stored Procedures

One of my favorite MSDN articles has been the key to saving countless tedious hours manually creating the select, insert, update, and delete stored procedures in SQL Server 2000 for web applications: Peter W. DeBetta and J. Byer Hill, MSDN April 2003, Automate the Generation of Stored Procedures for Your Database.

Some developers prefer not to use stored procedures for various reasons, but I agree with Douglas Reilly who essentially concludes that if you don't need to worry about switching from SQL Server to another RDBMS, and if some of your procs have complicated processing in them, then generally it is advantageous to use SQL stored procedures over ad hoc SQL.

The secret family split in Windows code page functions

My earlier post "Strange case of two system locale ANSI charsets" discussed the confusion between the default system locale (GetACP, Language for non-Unicode Programs) and the default user locale (setlocale, Standards and Formats). There I mentioned a problem with setting the system code page in C/C++ using setlocale, but that is only the first clue in what reveals a secret split in the family of locale-based charset functions.

Data Access Objects - Rails Style

With most other DAOs you have to create some kind of configuration file, whether XML or ini, generated or written by hand. For example, to set up PEAR's DB_DataObject, you must create an ini file and run a script, "createTables.php", each time your database changes. Not so with RoR! It's automagically created and updated for you. It does ALL the basic CRUD for you. You just have to specify the relationships (if any).

Say you have tables:

CREATE TABLE people (
id int(11) NOT NULL auto_increment,
name varchar(255),
PRIMARY KEY (id)
);


CREATE TABLE companies (
id int(11) NOT NULL auto_increment,
name varchar(255),
PRIMARY KEY (id)
);

Naming Convention
By the way, RoR has a naming convention for tables. You name the table the plural version of the word. The primary key has to be id in each table, and foreign keys are the singular name followed by _id (for example, company_id). EGADS! It almost looks like English, huh? An average half-way intelligent person could even read it and have a clue of what’s going on. Even pointy-haired bosses. Though, in traditional fashion, some people have to complain about having to name their tables a certain way. (violin music playing) There is a way to turn off "plural/singular" in the settings. So if you just have to be difficult, here ya go.
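Just to spell the convention out, here is a toy sketch in plain Ruby. This is not how Rails does it (Rails has its own, much richer inflector); `IRREGULAR_PLURALS`, `table_name_for` and `foreign_key_for` are hypothetical helpers that cover only the two tables above.

```ruby
# Toy illustration of the convention only. NOT the Rails inflector.
IRREGULAR_PLURALS = { "person" => "people" }.freeze

# Model name -> table name (plural).
def table_name_for(model)
  word = model.downcase
  return IRREGULAR_PLURALS[word] if IRREGULAR_PLURALS.key?(word)
  # consonant + y becomes "ies": company -> companies
  return word.sub(/y\z/, "ies") if word.match?(/[^aeiou]y\z/)
  "#{word}s"
end

# Model name -> foreign key column (singular + _id).
def foreign_key_for(model)
  "#{model.downcase}_id"
end

puts table_name_for("Person")   # prints "people"
puts table_name_for("Company")  # prints "companies"
puts foreign_key_for("Company") # prints "company_id"
```

So if a person belonged to a company, the people table would carry a company_id column pointing at companies.id.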