C / C++

Source Code Browsing Tools?

Marco Sanvido asks: "I often look at source code (especially C, but this question is valid for other languages as well) and I have a really hard time understanding how it works. Documentation is often missing or quite outdated, and the only way to see how the program works is to try to understand the source code. Which tools do you prefer to use for browsing and studying source code? So far I have used LXR for Linux, Eclipse for Java, and CScope, but I'm not sure that these tools are the best solution." It's tempting to flood this question with answers for your IDE, but the key thing here is _browsing_, not _development_. What decent, lightweight programs would work well as source code viewers?

Strings are a Domain-Specific Language

Question: Isn't a domain-specific language just the same thing as a library? (Source: Pretty much everyone the first time they hear of DSLs.)

Answer: No, a DSL is much more than a library, and I have an example that won't make you say, "Well, sure, if you're doing something that esoteric..."

Sharing my cool toys

I was berated the other day by Keith. He told me about PLEAC and I said, "Yeah, I know!" He said, "No fair, you didn't share your cool toys!" So for all you remaining coders out there, I'm sharing! Here are a few handy code snippet sites that I'll review for you today.

PLEAC
This site uses the Perl Cookbook as the basis (which has the Perl source freely available) and volunteers rewrite the snippets in other languages where possible. Very handy, if you know one language and wonder how you would do it in another language.

Emptiness in COleDateTime Format VAR_TIMEVALUEONLY

Funny how you can have trouble using a function, and search the web looking for an answer to no avail, and finally figure out how to do it only to discover that if you had read between the lines in the documentation you would have gotten it in the first place. Well, it is half true here. There turned out to be two big underlying issues, one of which was hinted at in the documentation and the other goes back to a problem I've talked about before. What's amazing about this one is that I thought I wouldn't have much to write about but I kept peeling back layers to find worse stuff.

The MFC COleDateTime class has a Format method which returns a string with the formatted date/time value. As explained in the MSDN documentation this method is overloaded so you can pass a strftime style format string or you can use:

The Enigma of Encoding Versions

The Enigma Machine was used to encrypt wireless messages by the German regime before and during the Second World War. It was very significant in the Allied victory that they were able not only to decipher the German Enigma's encryption, but to mostly keep it a secret that they had deciphered it. Many soldiers and ships were sacrificed to keep it a secret because the Allies did not want to act on their knowledge unless there was an alternate source that the Germans could ascribe the leak to.

Why didn't they want the Germans to know that they knew the system? Because the Germans would have changed it! And it was always a huge challenge to break the new code. The Enigma actually changed many times from when the first cipher machine, Enigma A, came on the market in 1923. The early work at cryptanalysis (to "break the code" of the Enigma) was done in Poland, and then during the war was centered in a very secret English organization that employed 7000 people at its peak.

CDATA Section Delimitosis

Delimitosis = disease pertaining to delimiter

I don't know if it is just because I am a parser-minded person, but the first time I learned about CDATA Sections a warning buzzer went off in my head and has been ringing ever since. It is saying: What if ]]> happens to be in the data you put into a CDATA Section?

Well, obviously it is not allowed. Hmmm. But that is not very helpful, is it? Does that mean I am supposed to check to see if my text contains ]]> every time I want to use a CDATA Section? And what should I do if it does?

I want to settle some of the unsettling issues about CDATA Sections here.
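One commonly used way to cope with ]]> in the data (not necessarily the one this post settles on) is to end the CDATA Section after the ]] and start a new one before the >. Here is a minimal sketch of that idea in C. The names cdata_wrap and append are made up for illustration, and the caller is assumed to supply a big enough output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append len bytes of s to out, keeping room for '\0'. */
static int append(char *out, size_t out_size, size_t *n,
                  const char *s, size_t len)
{
    if (*n + len >= out_size)
        return -1;
    memcpy(out + *n, s, len);
    *n += len;
    return 0;
}

/* Hypothetical function: wrap text in CDATA, splitting every "]]>" so that
   "]]" ends one section and ">" begins the next one.
   Returns the length written, or -1 if out is too small. */
int cdata_wrap(const char *text, char *out, size_t out_size)
{
    size_t n = 0;
    const char *p = text;
    const char *hit;

    if (append(out, out_size, &n, "<![CDATA[", 9) != 0)
        return -1;
    while ((hit = strstr(p, "]]>")) != NULL) {
        /* copy everything up to and including the "]]" */
        if (append(out, out_size, &n, p, (size_t)(hit - p) + 2) != 0)
            return -1;
        /* close this section and open the next one */
        if (append(out, out_size, &n, "]]><![CDATA[", 12) != 0)
            return -1;
        p = hit + 2; /* the ">" is left over for the new section */
    }
    if (append(out, out_size, &n, p, strlen(p)) != 0)
        return -1;
    if (append(out, out_size, &n, "]]>", 3) != 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

With this sketch the text a]]>b comes out as two adjacent sections, which a parser reads back as the original text.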

How I Invented Base64

Base64 is a way of storing any data as plain ASCII text. It looks like this:

LZPVtzlndhYFJQIDAQABMA0GCSqGSIb3DQEBAgUAA1kACKr0PqphJYw1j+YPtcIq
iWlFPuN5jJ79Khfg7ASFxskYkEMjRNZV/HZDZQEhtVaU7Jxfzs2wfX5byMp2X3U/
5XUXGx7qusDgHQGs7Jk9W8CW1fuSWUgN4w==

Look familiar? You'll see it in your e-mail source when your e-mail has attachments. How did I invent it? Well I didn't really, but before I knew base64 I came up with an encoding system I called "6-bit rollover" that turned out to be nearly identical to base64. It turns out that was not a momentous achievement because the beauty of base64 is how natural and simple it is. Here I am going to show how sensible base64 is by describing my discovery process, and giving you the quick round-up of everything you need to know to use base64.
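To give a feel for how simple it is, here is a rough sketch of a standard base64 encoder in C (this is plain base64, not my "6-bit rollover"). Every 3 bytes are treated as 24 bits and cut into four 6-bit values, each of which picks one character from a 64-character alphabet. The function name base64_encode is just for illustration, and the caller is assumed to provide a buffer of at least 4*((len+2)/3)+1 bytes.

```c
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical function: encode len bytes from in as base64 text in out.
   Returns the number of characters written (not counting the '\0'). */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, n = 0;

    /* full groups: 3 bytes in, 4 characters out */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | (unsigned long)in[i + 2];
        out[n++] = B64[(v >> 18) & 63];
        out[n++] = B64[(v >> 12) & 63];
        out[n++] = B64[(v >> 6) & 63];
        out[n++] = B64[v & 63];
    }

    /* 1 or 2 leftover bytes: pad the group out with '=' */
    if (i < len) {
        int two = (i + 1 < len);
        unsigned long v = (unsigned long)in[i] << 16;
        if (two)
            v |= (unsigned long)in[i + 1] << 8;
        out[n++] = B64[(v >> 18) & 63];
        out[n++] = B64[(v >> 12) & 63];
        out[n++] = two ? B64[(v >> 6) & 63] : '=';
        out[n++] = '=';
    }

    out[n] = '\0';
    return n;
}
```

For example, the three bytes of "Man" encode to "TWFu", while "Ma" needs one pad character and becomes "TWE=".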

EBCDIC to ASCII (and SBCS) Conversion

The first task I had when I got a C programming job in 1991 straight out of college was a small two-week project writing a program to convert EBCDIC to ASCII. The software company I joined had about 10 employees and a consultant named Sam. The owner of our company wanted to do this cheaply as a favor to the customer and hoping for a bigger contract down the road, so I think mostly only my hours were charged on the contract even though it was really just Sam mentoring me. Sam took me over to meet the customers at their site and ask some more questions about the data we were converting. My memory is that they were very nice but they could not give us any more information or sample data!

It can be hard to figure out the encoding (and the variant of the encoding), but once you get the mapping right, implementing the conversion efficiently is easy for single-byte character sets. Here I take the EBCDIC to ASCII example through these stages and finish by trying to emphasize that it is a crying shame when charset conversion is not extremely fast.
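As a rough illustration of why a single-byte conversion can be so fast, here is a sketch of the usual approach: a 256-entry lookup table, one array index per byte. This is not the program from 1991. The names e2a_init and ebcdic_to_ascii are made up, the table covers only the space, letters, and digits (positions that the common EBCDIC variants agree on), and everything else becomes '?' as a placeholder. It also assumes the C compiler itself uses ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Lookup table indexed by EBCDIC byte value. */
static unsigned char e2a[256];

/* Hypothetical setup: unmapped bytes become '?'; a real table would
   fill in all 256 entries for one specific EBCDIC variant. */
void e2a_init(void)
{
    int i;

    memset(e2a, '?', sizeof e2a);
    e2a[0x40] = ' ';
    for (i = 0; i < 9; i++) {            /* A-I, J-R, a-i, j-r */
        e2a[0xC1 + i] = (unsigned char)('A' + i);
        e2a[0xD1 + i] = (unsigned char)('J' + i);
        e2a[0x81 + i] = (unsigned char)('a' + i);
        e2a[0x91 + i] = (unsigned char)('j' + i);
    }
    for (i = 0; i < 8; i++) {            /* S-Z, s-z */
        e2a[0xE2 + i] = (unsigned char)('S' + i);
        e2a[0xA2 + i] = (unsigned char)('s' + i);
    }
    for (i = 0; i < 10; i++)             /* 0-9 */
        e2a[0xF0 + i] = (unsigned char)('0' + i);
}

/* Convert a buffer in place: one table lookup per byte. */
void ebcdic_to_ascii(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = e2a[buf[i]];
}
```

Call e2a_init once, then ebcdic_to_ascii on each buffer. The inner loop has no branches on the character value, which is why there is little excuse for this kind of conversion being slow.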

That Ol' OEM Code Page

If you have a regular U.S. or Western European (Windows-1252) system locale code page, try this:

  1. copy and paste these 4 characters ÂÄÒÙ into notepad
  2. save it as oem.txt
  3. open a DOS window and cd to the directory where you saved it
  4. enter: type oem.txt

Do you see ┬─╥┘ instead of ÂÄÒÙ? Why? That is because your system locale "ANSI" code page is different from your DOS "OEM" code page.
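To make the mismatch concrete, here is a tiny sketch in C covering only the four bytes from the experiment. Notepad saves ÂÄÒÙ as the Windows-1252 bytes C2 C4 D2 D9, and the DOS window reads those same bytes as code page 437 (assuming that is your OEM code page). The function names are made up for illustration, and a real converter would use a full 256-entry table for each code page.

```c
/* Hypothetical helper: Unicode code point for a Windows-1252 byte.
   Only 0xA0..0xFF is handled, where the byte value equals the code point. */
unsigned cp1252_sample_to_unicode(unsigned char b)
{
    return (b >= 0xA0) ? (unsigned)b : 0;
}

/* Hypothetical helper: Unicode code point for a code page 437 byte,
   covering just the four bytes in this example (0 for anything else). */
unsigned cp437_sample_to_unicode(unsigned char b)
{
    switch (b) {
    case 0xC2: return 0x252C; /* box drawing: down and horizontal */
    case 0xC4: return 0x2500; /* box drawing: horizontal */
    case 0xD2: return 0x2565; /* box drawing: down double and horizontal single */
    case 0xD9: return 0x2518; /* box drawing: up and left */
    default:   return 0;
    }
}
```

The same byte D9 is U+00D9 (Ù) through the first function and U+2518 (┘) through the second, which is exactly what you see on screen.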

Fonts To Simulate Charsets

I want to know as little about fonts as I can get away with, but I recently saw a modified font being used to obtain DOS-style box drawing characters like ┬──┘. The text was in an old PC Code Page, probably IBM437 also called "OEM United States." So rather than convert it to Unicode, it was left in the IBM437 single byte encoding and viewed with a font that replaced certain characters with box drawing characters.

Let's take the character ┘ and follow it through this process. The byte value d9 (217) represents this character in IBM437. When the browser or the OS treats this byte as if it were in Windows-1252, it converts it to Unicode U+00d9 LATIN CAPITAL LETTER U WITH GRAVE and it would normally be displayed as Ù. But in our special hacked font, the character mapped to $00d9 actually looks like the box drawing character ┘.