Choose the Right Tool for the job

Over the years, I've worked with a number of systems which aggregate data in some way. These have included simple RSS aggregators, the importation of multiple XML formats, and even interacting with a multiple of bug tracking systems. Regardless of the domain, there is a simple concept here: There are a particular set of actions we wish to perform on the data, but quite often the data sources have a variety of data structures.

For some reason, most people immediately jump to the "simple" solution of building a inheritance heirarchy, each with a distinct class/structure beneath it to a specific data structure. This can work fine for quite a few situations and ends up being quite elegant in patterns such as the Data Access Object and others where you can completely encapsulate the data structure and simply forget about it. Unfortunately, this doesn't work in all scenarios.

For example, I'm involved in a large scale Java system where we are retriving XML data structures from a series of different sources (think aggregation). When we started there were twelve different sources in two different formats and the solution looked simple, we would simply import each of the structures and be done with it.

Since I've been bitten by this before, I proposed a different solution. Instead of doing the import in a single step, we would use a Two-Step Pattern to first convert each data structure into our custom structure and simply import that. We've been able to implement this by building a single importer class and creating a new XML-to-XML XSL transform for each new data source.

Just in time too, within 2 months, we were up to 7 different formats from 237 different sources...