Importing Legacy Data
Submitted by Keith Casey on Wed, 2005-10-12 14:32.
I came across a great post recently, "Legacy Data: Import Early, Import Often", and it really struck a chord.
The author is completely correct that importing old data is normally treated as the last step in implementing a new project. Most developers love to start with a fresh, clean codebase, whiteboard, database, etc. and build their projects from the ground up. It is a wonderful feeling to start with a blank slate and make something from nothing. It is generally much less satisfying to take a (mostly) functional codebase, learn about it, dig through its oddities, and expand or fix its features. I've talked about this tendency before in "Scrapping It All vs A Salvage Operation", but I thought it needed some expansion.
On a large project for one of our current customers, we have a huge amount of data coming into the database at regular intervals. Although we're not going directly from database to database (there is a transmission format in between), it is the same concept. The data comes from seven different repositories, each with a different data structure. The incoming formats are mostly fixed, but they tend to shift steadily over time.
Regardless, the first thing we did was prove out the import as a proof of concept. The data is received in the various XML formats, transformed via XSL to an internal standard, and then imported. A different transform is applied depending on the source, but the output is always the same. The mapping was highly detailed and some tweaks have been required, but because this was our FIRST step in the process, it has been quite solid. It allowed us to work out some oddities in the other processes much earlier and much more easily. Also, by confirming the data model and the import code early, we were able to confirm more of the backend processes long before the UI, the user flows, etc. were completely developed.
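To make the shape of that pipeline concrete, here is a minimal sketch, not the project's actual code. It assumes two made-up source formats, uses plain Python functions in place of the XSL transforms, and uses a dict in place of the database. Every name in it (`transform_source_a`, `transform_source_b`, `import_feed`, the field names) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal standard: every record ends up with these keys.
INTERNAL_FIELDS = ("id", "name", "updated")


def transform_source_a(xml_text):
    """Source A (hypothetical) keeps its values in attributes on <item>."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": item.get("id"),
            "name": item.get("title"),
            "updated": item.get("modified"),
        }
        for item in root.iter("item")
    ]


def transform_source_b(xml_text):
    """Source B (hypothetical) keeps its values in child elements of <record>."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": rec.findtext("key"),
            "name": rec.findtext("label"),
            "updated": rec.findtext("lastChanged"),
        }
        for rec in root.iter("record")
    ]


# One transform per source; the output shape is the same for all of them.
TRANSFORMS = {"a": transform_source_a, "b": transform_source_b}


def import_feed(source, xml_text, store):
    """Normalize one feed, validate it, and load it into `store`."""
    records = TRANSFORMS[source](xml_text)
    # Validate everything first so a bad feed loads nothing at all.
    for record in records:
        missing = [f for f in INTERNAL_FIELDS if not record.get(f)]
        if missing:
            # Fail loudly: a missing field usually means the source format shifted.
            raise ValueError(f"source {source!r}: record missing {missing}")
    for record in records:
        store[record["id"]] = record
    return len(records)


store = {}
import_feed("a", '<feed><item id="1" title="Widget" modified="2005-10-01"/></feed>', store)
import_feed(
    "b",
    "<export><record><key>2</key><label>Gadget</label>"
    "<lastChanged>2005-10-02</lastChanged></record></export>",
    store,
)
```

Because only the transform differs per source, a change in one incoming format touches one function and nothing downstream. Validating against the internal standard at import time is also what surfaces a shifted format early.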
For all of our projects, the first thing we do is try to get hold of the existing data model. Not only does it help with the import process, but it can also reveal the little oddities and lessons learned by the previous team. Instead of asking "Why in the world did they do THAT?", learn to ask "What is the benefit of doing that?" This slight change of mindset will help you avoid the development landmines the previous group already found for you...