Testing

Your Own Test Lab, Minus the Lab and the PCs

Ever need to see if your web site or application works properly under Firefox on Linux? IE on Macintosh? Wide-screen resolution on Windows XP?

If you're a small development shop, you probably can't afford to buy all of the hardware and software configurations you'd need. While there are virtualization solutions like Virtual PC or VMware, you'd still need to buy and install each OS before you can even start. Oh, and you'll need 2GB of RAM to run these at better than a crawl.

Enter BrowserCam, a web-based service that offers several ways to test your web apps or site with a minimum of fuss.

The basic service automatically takes snapshots of your website on a variety of platforms (from Windows 98 to Mac OS X to Linux) and in a variety of browsers (Firefox, Netscape, IE, Safari, Konqueror, etc.). Just enter a URL, and the service does the rest.

If you need more hands-on testing, a Remote Access service lets you work with your application on all of these platforms. It requires a VNC client: either your own or the service's built-in Java VNC client, which runs in your web browser.

Testing with Selenium TestRunner

I know, I know, it's been two weeks since I posted... but I hope this makes it worth the wait!

I've looked at Selenium a few times but just couldn't fathom how it could possibly work -- and all in JavaScript? I went to a Ruby "HackFest" here in my beautiful city of Chicago and met one of the developers of Selenium, Jason. He gave a demonstration of Selenium testing itself and I saw how it worked, but how would I get it to work for me? As it turned out, in my traditional fashion, I was making it harder than it really was.

Splitting Surrogate Pairs

Microsoft chose UCS-2 for its Unicode encoding system when it seemed like a nice and simple fixed size per character; then Unicode promptly outgrew UCS-2. As I said in UTF-8 Versus Windows UNICODE, the early impression of simplicity, in comparison to UTF-8 multibyte encoding, backfired.

What are surrogate pairs?

Surrogate pairs are UTF-16's answer to multibyte encoding. Basically, in the UTF-16 encoding system a Unicode character can be encoded in either one or two 16-bit values; if it is two 16-bit values it is utilizing a "surrogate pair". Surrogate pairs are simple yet they inevitably lead to a great deal of confusion.
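
To make the "one or two 16-bit values" rule concrete, here is a minimal Python sketch of the standard UTF-16 arithmetic. The function names to_surrogate_pair and utf16_units are made up for this illustration and are not taken from the original post.

```python
from typing import List, Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the low surrogate
    return high, low


def utf16_units(text: str) -> List[int]:
    """Return the 16-bit code units produced by Python's own UTF-16 encoder."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# "A" fits in one 16-bit value; U+1D11E (musical G clef) needs two.
print([hex(u) for u in utf16_units("A")])             # ['0x41']
print([hex(u) for u in utf16_units("\U0001D11E")])    # ['0xd834', '0xdd1e']
print([hex(u) for u in to_surrogate_pair(0x1D11E)])   # ['0xd834', '0xdd1e']
```

Code that counts or splits 16-bit units as if each were a whole character will cut a pair like this one in half.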

The Enigma of Encoding Versions

The Enigma Machine was used to encrypt wireless messages by the German regime before and during the Second World War. It was very significant in the Allied victory that they were able not only to decipher the German Enigma's encryption, but also to mostly keep it a secret that they had deciphered it. Many soldiers and ships were sacrificed to keep that secret, because the Allies did not want to act on their knowledge unless there was an alternate source that the Germans could ascribe the leak to.

Why didn't they want the Germans to know that they knew the system? Because the Germans would have changed it! And it was always a huge challenge to break the new code. The Enigma actually changed many times from when the first cipher machine, Enigma A, came on the market in 1923. The early work at cryptanalysis (to "break the code" of the Enigma) was done in Poland, and then during the war was centered in a very secret English organization that employed 7000 people at its peak.

Test, please.

On quite a few of my projects, I need people other than myself to do the testing. These are projects more complex than simply displaying a list of phone numbers. Perhaps I have not looked in the right places, but I have not found much in the way of methods for getting people to test applications. I'll explain some of the ways I've tried and their results.

Interview: Tom Copeland of PMD & Rubyforge

This is the first in a series of interviews we're making available to the CodeSnipers community. We have been working to track down people who we thought had something valuable to say about the software development community, tools, practices, or direction. Some of the names you will recognize immediately, others you've probably never heard of, but all of them have made an impact in one way or another. Without further delay... our first victim... er... is Tom Copeland of the PMD project.

Some of our community may be familiar with the Java tool PMD, but many are not. Could you tell us a bit about PMD and your role in the project?

Sure! PMD is a utility for finding various "opportunities for improvement" in Java source code. It uses static analysis, meaning that it parses and analyses the source code without actually running the program, to find unused code, unnecessary object creation, and bad practices. You can run it using Ant/Maven/command line/various IDEs and generate text, HTML or XML reports.

Caveman PHP Debugging

Certainly it's possible to have nice debugging tools for PHP (Zend has one). Unfortunately I have no experience with any of them, so I will tell you how I debug PHP - caveman style!

Poke it with a stick
Sometimes, you aren’t sure if the code is getting to a certain line. For quick checks I put

print "here";

That’s fine until I get interrupted or go home, only to come back the next day and wonder: huh, where did I put that line?

Phantom Currency Signs in Japan and Korea

If you're not from Japan or Korea, you might be surprised that when you reboot your Windows OS with Japanese as the language for non-Unicode programs (system locale), your backslashes are no longer backslashes; they are yen signs. Well, don't worry, they are still backslashes; they are just displayed and printed differently by many of the fonts in the Japanese locale. But there is a more troubling internationalization issue: Unicode text coming from Japanese and Korean sources may have a backslash where you would expect a yen sign ¥ or won sign ₩. This whole subject has been discussed elsewhere (references below), but here I talk about the need to repair the Unicode text.

Where the backslash issue gets interesting is in the encoding conversion between the locale code pages and Unicode. While 0x5c is clearly the yen sign in the Japanese code page 932 (Shift-JIS), it is converted to the Unicode U+005C REVERSE SOLIDUS (backslash) rather than the U+00A5 YEN SIGN. Similarly in the Korean code page 949, 0x5c is clearly the won sign but it is converted to the Unicode backslash rather than the U+20A9 WON SIGN.
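
The conversion behavior can be checked with a short Python sketch. It assumes Python's built-in cp932 and cp949 codecs as stand-ins for the Windows Japanese and Korean code pages; the describe helper is a name invented for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format one character as 'U+XXXX NAME' for easy comparison."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


raw = b"\x5c"  # the byte that Japanese and Korean fonts draw as a currency sign

japanese = raw.decode("cp932")  # Windows Japanese code page
korean = raw.decode("cp949")    # Windows Korean code page

print(describe(japanese))   # U+005C REVERSE SOLIDUS
print(describe(korean))     # U+005C REVERSE SOLIDUS

# The characters a reader of the original text would have expected instead:
print(describe("\u00a5"))   # U+00A5 YEN SIGN
print(describe("\u20a9"))   # U+20A9 WON SIGN
```

Both decodes produce the same backslash, so once the text is Unicode nothing in it records that the byte was ever meant as a currency sign.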

Importing Legacy Data

I came across a great post recently about Legacy Data: Import Early, Import Often and it really struck a chord.

The author is completely correct that importing old data is normally considered a last step in the process of implementing a new project. Most developers love to start with a fresh clean codebase, whiteboard, database, etc. and build their projects from the ground up. It is a wonderful feeling starting with a blank slate and actually making something from nothing. Generally, it is much less satisfying to take a (mostly) functional codebase, learn about it, dig through its oddities, and expand or fix the features. I've talked about this tendency before in Scrapping It All vs A Salvage Operation, but I thought it needed some expansion.

The Euro Sign Predicament

To this day, you will rarely see the euro sign in newsprint or on news sites; it will be written out or abbreviated as EUR. This is one of those interesting cases where computer text encoding systems have influenced the way information is presented in the media.

The ASCII factor

One reason the symbol is not used is that it is not ASCII, and whenever you depart from ASCII you risk ending up with corrupted characters. Similarly, the pound and yen appear written out rather than using the £ and ¥ symbols, although those have different reasons too, and longer histories.
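
As a rough illustration of that risk, the Python sketch below encodes the euro sign in a few encodings and then decodes the UTF-8 bytes with the wrong one. The choice of encodings and the try_encode helper are assumptions made for this example, not something taken from the original post.

```python
from typing import Optional

EURO = "\u20ac"  # the euro sign


def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoding has no such character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


ascii_bytes = try_encode(EURO, "ascii")      # None: not an ASCII character
latin1_bytes = try_encode(EURO, "latin-1")   # None: ISO 8859-1 has no euro sign
cp1252_bytes = try_encode(EURO, "cp1252")    # b'\x80'
utf8_bytes = try_encode(EURO, "utf-8")       # b'\xe2\x82\xac'

# A reader that guesses the wrong encoding turns one symbol into three.
garbled = utf8_bytes.decode("cp1252")
print(garbled)   # â‚¬
```

Writing "EUR" sidesteps all of this, because those three letters are the same bytes in every one of these encodings.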