Establishing a Baseline

Whenever my company takes on a new project, we are often asked to establish a baseline set of data, which serves as a comparison point to make sure nothing is amiss with our setup.

The customer can use this information to conclude that we are either matching their own data well (meaning our setup must be fairly similar to theirs), or we are totally off and need to find the problem.

I can tell you that the latter is no fun at all. But it's even LESS fun to get three months into a project only to find that one very small installation error has completely ruined all of the generated data.

In software testing, it's important to start with a baseline. Pick some basic functionality that you know works and write tests around it. Establish your baseline of tests around the basic, known criteria. When that's proven to work, begin branching out with the finer grained test points.
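The idea above can be sketched in code. This is a minimal baseline suite, assuming a hypothetical `Cart` class that stands in for whatever basic functionality you already trust; the names and behavior here are illustrative, not from any real project:

```python
# A tiny, known-good piece of functionality to baseline against.
class Cart:
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


# Baseline tests: only the basic, known criteria.
def test_empty_cart_totals_zero():
    assert Cart().total() == 0


def test_single_item_total():
    cart = Cart()
    cart.add("widget", 5.0)
    assert cart.total() == 5.0


# Only once the baseline passes would you branch out to
# finer-grained points: discounts, rounding, edge cases.
```

The point is the ordering, not the tests themselves: the baseline proves the setup works before any effort goes into the fine-grained cases.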

Your baseline should not be all-encompassing. It should, however, cover a good spread of the points you are trying to achieve. Too often, our customers want to use the baseline data time to capture more than just baseline data. They say things like: "Oh, since we're taking data anyway, can you go ahead and do X, Y, and Z for me at the same time? I need that information anyway for some other work I'm doing." This is bad news, because it takes the effort off of the baseline work and puts it into an "overall data capture" mode.

Recently, while we were acquiring some baseline data for a customer, they requested around 200 data points for their analysis. From experience with this particular application, I can tell you that 20 data points would have been more appropriate. Each data point contains around 100 variables. With 20, it's easy to put each data point and its comparison point into a spreadsheet and look at the results side by side. With 200, it's much harder to reach a general verdict of "yes, this data matches" or "no, it doesn't".
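The side-by-side comparison described above can also be automated with a small script. This is just a sketch; the variable names, sample values, and the 5% tolerance are assumptions for illustration, not the customer's actual data or acceptance criteria:

```python
import math


def compare_point(baseline: dict, reference: dict, rel_tol=0.05):
    """Return the names of variables that disagree beyond rel_tol."""
    return [
        name for name in reference
        if not math.isclose(baseline.get(name, float("nan")),
                            reference[name], rel_tol=rel_tol)
    ]


# Illustrative data: two data points, a few variables each
# (a real point would carry ~100 variables).
baseline = [{"temp": 20.1, "pressure": 101.2},
            {"temp": 25.0, "pressure": 99.0}]
reference = [{"temp": 20.0, "pressure": 101.3},
             {"temp": 25.1, "pressure": 90.0}]

for i, (b, r) in enumerate(zip(baseline, reference)):
    mismatches = compare_point(b, r)
    print(f"point {i}: {'OK' if not mismatches else 'MISMATCH ' + str(mismatches)}")
```

With 20 data points this kind of check is quick to write and quick to read; with 200, even the automated report becomes a wall of output that nobody reviews carefully, which is exactly what happened here.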

What happened? After we spent ten times the effort acquiring the baseline data, the person responsible for the analysis came back and said that they didn't really have time to look at all of the data, but what they did have a chance to look at looked okay, and we could proceed with the next step of development.

I wish we could have saved them the time. The unfortunate reality is that the person in charge of the project as a whole AND the person in charge of okaying the baseline data are two different people with two different agendas. The person in charge of the project assigned the baseline data person to that role because of their familiarity with the particular data we were obtaining. But, people who crunch data aren't typically people who do testing - instead they are people who want as much data as possible to do the crunching with.

So remember, when performing your software testing, establish a baseline and branch out only after you (or your customer) are happy with the results. Don't use it as a time to generate lots and lots of data (do that AFTER the baseline). Keeping this in mind gives you the chance to make sure everyone is happy without wasting extra effort.